gpodder / podcastparser

Simplified, fast RSS parsing library in Python
ISC License

Add rss/channel/itunes:summary handling #49

Open CJxD opened 6 months ago

CJxD commented 6 months ago

We noticed that some podcasts don't have descriptions but do have itunes summaries.

Here's a file that can be used in a test: http://netstorage.discovery.com/ahc/podcasts/2016/ahc-ato-podcastrss.xml

auouymous commented 6 months ago

@thp Could we add an optional or secondary flag to set_podcast_attr() so it only sets the value if None? Then description could be the primary field and summary a secondary, no matter which order they are found in the feed.
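A minimal sketch of the suggested flag, assuming the parser keeps channel attributes in a plain dict. `FeedData`, `apply_tags` and the `only_if_unset` parameter are hypothetical names used for illustration here; they are not podcastparser's actual API.

```python
class FeedData:
    """Hypothetical stand-in for the parser's handler state."""

    def __init__(self):
        self.data = {}

    def set_podcast_attr(self, key, value, only_if_unset=False):
        # Secondary sources pass only_if_unset=True so they never
        # overwrite a value that a primary source has already stored.
        if only_if_unset and self.data.get(key) is not None:
            return
        self.data[key] = value


def apply_tags(tags):
    """Feed (path, text) pairs in document order and return the result."""
    feed = FeedData()
    for path, text in tags:
        if path == 'rss/channel/description':
            # Primary field: always wins, wherever it appears in the feed.
            feed.set_podcast_attr('description', text)
        elif path == 'rss/channel/itunes:summary':
            # Secondary field: used only when no description is stored yet.
            feed.set_podcast_attr('description', text, only_if_unset=True)
    return feed.data
```

With this shape the description wins whether it appears before or after the summary, and the summary is used only when the feed has no description at all.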

CJxD commented 5 months ago

Just got back to this! Sorry for the delay. Yes, I can look into a suitable fallback behaviour :)

CJxD commented 5 months ago

Here's a question: the episode description behaviour is similar to what I have suggested here, so I wonder whether this change-if-none behaviour should also be applied in that case.

I also noticed that, according to the documentation excerpt below, HTML is handled in description but not in the summary. Is that accurate? Should the same happen for channel/description?

**rss/channel/item/description**
    Episode description.
    If it contains html, it's returned as description_html.
    Otherwise it's returned as description (whitespace is squashed).
    See Mozilla's article `Why RSS Content Module is Popular`

**rss/channel/item/itunes:summary**
    Episode description (whitespace is squashed).

auouymous commented 4 months ago

> Here's a question, the episode description behaviour is similar to what I have suggested here so I wonder if this change-if-none behaviour should also be applied in this case.

Yes. It would have the same issue, unless it already only changes if none.

> I also noted how HTML is handled in description but not in the summary according to this. Is it accurate? Should the same happen for channel/description?

It could if the RSS spec supports HTML in channel descriptions. gPodder only supports HTML descriptions for episodes and would need to detect an HTML channel description and strip it.
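A rough sketch of what "detect an HTML channel description and strip it" could look like, using only the standard library. The helper names `looks_like_html` and `plain_channel_description` are made up for this example, and the tag-detection regex is a crude heuristic rather than anything podcastparser or gPodder is known to use.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def looks_like_html(text):
    # Heuristic: an opening or closing tag such as <p> or </b>.
    # A bare "<" followed by a space (as in "a < b") does not match.
    return re.search(r'</?[a-zA-Z][^>]*>', text) is not None


def plain_channel_description(text):
    """Return a plain-text description with whitespace squashed."""
    if looks_like_html(text):
        extractor = _TextExtractor()
        extractor.feed(text)
        extractor.close()
        # Joining with a space keeps block elements apart, at the cost
        # of splitting a word that has inline markup in the middle.
        text = ' '.join(extractor.parts)
    return ' '.join(text.split())
```

Plain-text descriptions pass through with only their whitespace squashed, which mirrors the behaviour documented for episode descriptions.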