kurtmckee / feedparser

Parse feeds in Python
https://feedparser.readthedocs.io

not able to retrieve categories #264

Open cyril36 opened 3 years ago

cyril36 commented 3 years ago

Hi,

I am using Python 3.8.3 and feedparser==6.0.2. I have the following feed:

<channel>
                <atom:link href="http://joeroganexp.joerogan.libsynpro.com/rss" rel="self" type="application/rss+xml"/>
        <title>The Joe Rogan Experience</title>
        <pubDate>Fri, 27 Nov 2020 18:00:00 +0000</pubDate>
        .....
        <itunes:author>Joe Rogan</itunes:author>
        <itunes:keywords>comedian,joe,monkey,redban,rogan,talking,ufc</itunes:keywords>
        <itunes:category text="Comedy"/>
        <itunes:category text="Society &amp; Culture"/>
        <itunes:category text="Technology"/>
        <itunes:explicit>yes</itunes:explicit>
        <itunes:owner>
            <itunes:name><![CDATA[Joe Rogan]]></itunes:name>
            <itunes:email>joe@joerogan.net</itunes:email>
        </itunes:owner>
        <description><![CDATA[Conduit to the Gaian Mind]]></description>
        <itunes:subtitle><![CDATA[Joe Rogan's Weekly Podcast]]></itunes:subtitle>
        <itunes:type>episodic</itunes:type>
       <items>
          ...............

1) I want to retrieve the "categories" independently of "keywords"
2) I want to retrieve "author" independently of "itunes:owner"

First 1)

When I do the following to get one category, it works, but I cannot retrieve the other categories:

            import feedparser

            NewsFeed = feedparser.parse("http://rss_ul")
            print(NewsFeed.feed.category)  # returns only a single category

When I do the following to get several categories, I get an error:

            NewsFeed = feedparser.parse("http://rss_ul")
            print(NewsFeed.feed.categories)

ERROR :

return dict.__getitem__(self, key)
KeyError: 'categories'

I can retrieve the categories through the "tags" key, e.g. NewsFeed.feed.get("tags"), but "tags" contains both the "keywords" and the "categories", such as:

{
        "term": "comedian",
       ...
    },
    {
        "term": "joe",
        ...
    },
   ...
    {
        "term": "Comedy",  
        ...
    },
    {
        "term": "Society & Culture",
        ...
    },
    {
        "term": "Technology",
        ...
}

I want "categories" separated from "keywords".
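One possible workaround, sketched under assumptions: since the keywords and the categories both end up in "tags", the two can be kept apart by reading the raw feed XML directly with the standard library's xml.etree.ElementTree instead of going through feedparser for these fields. The sample string below is a trimmed copy of the channel above. The names split_categories_and_keywords and ITUNES_NS are hypothetical, and the namespace URI is assumed to be the one the feed declares for the itunes: prefix.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the itunes: prefix; check the xmlns:itunes
# attribute of the actual feed.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Trimmed-down sample modeled on the channel quoted in this issue.
SAMPLE_XML = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>The Joe Rogan Experience</title>
    <itunes:keywords>comedian,joe,monkey,redban,rogan,talking,ufc</itunes:keywords>
    <itunes:category text="Comedy"/>
    <itunes:category text="Society &amp; Culture"/>
    <itunes:category text="Technology"/>
  </channel>
</rss>"""


def split_categories_and_keywords(xml_text):
    """Return (categories, keywords) for the channel as two separate lists."""
    channel = ET.fromstring(xml_text).find("channel")
    # Only direct children of <channel> are read; nested subcategories
    # are not collected in this sketch.
    categories = [
        el.get("text") for el in channel.findall("itunes:category", ITUNES_NS)
    ]
    raw_keywords = channel.findtext(
        "itunes:keywords", default="", namespaces=ITUNES_NS
    )
    # Keywords are a single comma-separated string in the feed.
    keywords = [kw.strip() for kw in raw_keywords.split(",") if kw.strip()]
    return categories, keywords


categories, keywords = split_categories_and_keywords(SAMPLE_XML)
print(categories)
print(keywords)
```

For a live feed, the XML text would have to be downloaded first (for example with urllib.request) and passed to the same function.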

2) The same goes for "author": I want to retrieve it independently of "itunes:owner".

Is there a feature that lets me parse a single tag?

Thank you
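For the second question (author versus itunes:owner, and reading one specific tag), a minimal sketch along the same lines: the individual elements can be looked up by name in the raw XML with xml.etree.ElementTree. This does not use feedparser at all. The names read_author_and_owner and ITUNES_NS are hypothetical, the namespace URI is an assumption about what the feed declares, and the sample is trimmed from the channel quoted above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the itunes: prefix.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Trimmed-down sample modeled on the channel quoted in this issue.
SAMPLE_XML = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>The Joe Rogan Experience</title>
    <itunes:author>Joe Rogan</itunes:author>
    <itunes:owner>
      <itunes:name><![CDATA[Joe Rogan]]></itunes:name>
      <itunes:email>joe@joerogan.net</itunes:email>
    </itunes:owner>
  </channel>
</rss>"""


def read_author_and_owner(xml_text):
    """Return (author, owner) where owner is a dict with name and email."""
    channel = ET.fromstring(xml_text).find("channel")
    # findtext returns None when the element is missing.
    author = channel.findtext("itunes:author", namespaces=ITUNES_NS)
    owner = {
        "name": channel.findtext("itunes:owner/itunes:name", namespaces=ITUNES_NS),
        "email": channel.findtext("itunes:owner/itunes:email", namespaces=ITUNES_NS),
    }
    return author, owner


author, owner = read_author_and_owner(SAMPLE_XML)
print(author)
print(owner)
```

The same findtext pattern works for any other single element, as long as its namespace prefix is listed in the mapping.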

marianoquevedo commented 3 months ago

Same here, I can't get the categories and subcategories correctly from the tags dict.

Sample RSS feed: https://allinchamathjason.libsyn.com/rss

I get:

{'term': 'Chamath', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'Covid19', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'Entrepreneurship', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'Friedberg', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'IQ', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'Startups', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'allin', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'alllin', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'business', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'calacanis', 'scheme': 'http://www.itunes.com/', 'label': None}
{'term': 'coronavirus', 'scheme': 'http://www.itunes.com/', 'label': None}