pombreda / feedparser

Automatically exported from code.google.com/p/feedparser

RSS media:content is not available for items in Flickr RSS feed #230

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Load a flickr RSS feed

     myfeed = feedparser.parse( 'http://api.flickr.com/services/feeds/photos_public.gne?tags=architecture&lang=en-us&format=rss_200' )

What is the expected output? What do you see instead?

       Viewing the feed's source in a browser shows media:content
       elements on each item.  The media:content elements look
       like
          <media:content url="http://farm2.static.flickr.com/1275/5178814367_687c0489c2_m.jpg" .... />

       But a search of the parsed feed's entry list finds no
       entries whose url data matches what's in the source
       feed's media:content elements.

What version of the product are you using? On what operating system?

       Version 4.1 on Linux and Mac

Original issue reported on code.google.com by rmela02...@gmail.com on 15 Nov 2010 at 7:16
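
One way to confirm what the raw feed contains, independently of feedparser, is to read the media:content elements with the standard library. The following is a minimal sketch, not part of the original report: the sample document, the example.com URL, and the helper name `media_content_urls` are made up for illustration, and it assumes the elements live in the Media RSS namespace `http://search.yahoo.com/mrss/`.

```python
import xml.etree.ElementTree as ET

# Media RSS namespace URI that the media: prefix is assumed to be bound to.
MRSS = "http://search.yahoo.com/mrss/"

# Hypothetical, trimmed-down sample standing in for the Flickr feed.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>sample</title>
    <item>
      <title>photo one</title>
      <media:content url="http://example.com/one_m.jpg" type="image/jpeg"/>
    </item>
    <item>
      <title>photo two</title>
    </item>
  </channel>
</rss>
"""


def media_content_urls(rss_text):
    """Return one list of media:content URLs per <item>, in document order."""
    root = ET.fromstring(rss_text)
    result = []
    for item in root.iter("item"):
        # ElementTree spells namespaced tags as "{uri}localname".
        elements = item.findall("{%s}content" % MRSS)
        result.append([el.get("url") for el in elements])
    return result


print(media_content_urls(SAMPLE_RSS))
```

If this lists URLs for an item while the parsed entry has none, the data is being dropped by the parser rather than missing from the feed.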

GoogleCodeExporter commented 9 years ago
More info...

   import feedparser
   feed=feedparser.parse('http://api.flickr.com/services/feeds/photos_public.gne?tags=architecture&lang=en-us&format=rss_200')
   entry=feed.entries[0]

   # Based on the doc on namespace handling, I would expect
   # media_content to be one of the keys printed out in the next statement...

   print entry.keys()
['summary_detail', 'dc_date.taken', 'updated_parsed', 'links', 'title', 
'credit', 'author', 'thumbnail', 'updated', 'summary', 'content', 'guidislink', 
'title_detail', 'link', 'author_detail', 'id', 'tags']

Original comment by rmela02...@gmail.com on 15 Nov 2010 at 7:30

GoogleCodeExporter commented 9 years ago
More info...

   import feedparser
   feed=feedparser.parse('http://api.flickr.com/services/feeds/photos_public.gne?tags=architecture&lang=en-us&format=rss_200')
   entry=feed.entries[0]

   # Based on http://www.feedparser.org/docs/namespace-handling.html, 
   # I would expect 'media_content' to be one of the keys printed out
   # in the next statement...

   print entry.keys()
['summary_detail', 'dc_date.taken', 'updated_parsed', 'links', 'title', 
'credit', 'author', 'thumbnail', 'updated', 'summary', 'content', 'guidislink', 
'title_detail', 'link', 'author_detail', 'id', 'tags']

Original comment by rmela02...@gmail.com on 15 Nov 2010 at 7:31

GoogleCodeExporter commented 9 years ago
Here's the fix -

   1) Add http://search.yahoo.com/mrss/ to the _FeedParserMixin namespaces
   2) Add a _start_media_content handler to _FeedParserMixin

Here's the patch, and I've attached a test file.

Thanks,
   - Robert Mela

326a327
>                   'http://search.yahoo.com/mrss/':                        'media',
834a836,839
>     def _start_media_content(self, attrsD):
>         context = self._getContext()
>         context['media_content']=attrsD
> 

Original comment by rmela02...@gmail.com on 15 Nov 2010 at 8:58
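
The two steps in the patch above (register the namespace, then add a handler named after the prefix and element) can be illustrated with a toy dispatcher. This is only a sketch of the general pattern and is not feedparser's actual code: `ToyParser`, its `feed` method, and the sample document are hypothetical, and unlike the patch it collects attributes in a list rather than overwriting a single value.

```python
import xml.etree.ElementTree as ET


class ToyParser:
    """Toy stand-in for a handler-dispatching parser (not feedparser's code)."""

    # Namespace URI -> prefix used when building handler names.
    namespaces = {"http://search.yahoo.com/mrss/": "media"}

    def __init__(self):
        self.context = {}

    def _start_media_content(self, attrs):
        # Collect the attributes of every media:content element.
        self.context.setdefault("media_content", []).append(dict(attrs))

    def feed(self, xml_text):
        for element in ET.fromstring(xml_text).iter():
            tag = element.tag
            if not tag.startswith("{"):
                continue  # element is not namespaced
            uri, local = tag[1:].split("}", 1)
            prefix = self.namespaces.get(uri)
            if prefix is None:
                continue  # unregistered namespace: element is ignored
            handler = getattr(self, "_start_%s_%s" % (prefix, local), None)
            if handler is not None:
                handler(element.attrib)
        return self.context


# Hypothetical one-item document for demonstration.
DOC = (
    '<rss xmlns:media="http://search.yahoo.com/mrss/"><channel><item>'
    '<media:content url="http://example.com/a.jpg"/>'
    "</item></channel></rss>"
)
print(ToyParser().feed(DOC))
```

In this sketch, leaving the namespace out of the mapping or omitting the handler both result in the element being skipped without any error, which matches the symptom of a key silently missing from the parsed entry.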

GoogleCodeExporter commented 9 years ago
Submitting a slightly better solution -- use FeedParserDict for element 
attributes.

It also implements the same fix for media_thumbnail.

Original comment by rmela02...@gmail.com on 15 Nov 2010 at 9:53

GoogleCodeExporter commented 9 years ago
Never mind -- similar to issue 192, which was fixed well enough in revision 296.

     http://code.google.com/p/feedparser/source/detail?r=296

Original comment by rmela02...@gmail.com on 15 Nov 2010 at 10:19

GoogleCodeExporter commented 9 years ago
Weird, I'm using feedparser (__version__ is set to '4.1') to parse a Flickr 
feed, and while my entries do get some of the media_* fields, the important 
ones are empty, which is a bummer. Here's what I get:

from feedparser import parse

f = parse("http://api.flickr.com/services/feeds/photos_public.gne?id=37343463@N08&lang=en-us&format=rss_200")

e = f.entries[0]

In [79]: sorted(e.keys())
Out[79]: 
['author',
 'author_detail',
 'dc_date.taken',
 'guidislink',
 'id',
 'license',
 'link',
 'links',
 'media_category',
 'media_content',
 'media_credit',
 'media_thumbnail',
 'summary',
 'summary_detail',
 'title',
 'title_detail',
 'updated',
 'updated_parsed']

In [80]: (e.media_category, e.media_content, e.media_credit, e.media_thumbnail)
Out[80]: (u'barcamp unconference barcamplondon bcl8', u'', u'bfirsh', u'')

As you can see both 'media_content' and 'media_thumbnail' are empty, and those 
are *exactly* the fields I need, which renders feedparser completely useless in 
this particular case.

Any ideas on how to fix this?

Original comment by dguarag...@gmail.com on 16 Nov 2010 at 11:47

GoogleCodeExporter commented 9 years ago
This is what I get:
>>> from feedparser import parse
>>> 
>>> f = parse("http://api.flickr.com/services/feeds/photos_public.gne?id=37343463@N08&lang=en-us&format=rss_200")
>>> 
>>> e = f.entries[0]
>>> sorted(e.keys())
['author', 'author_detail', 'dc_date.taken', 'guidislink', 'href', 'id', 
'link', 'links', 'media_content', 'media_credit', 'media_thumbnail', 'summary', 
'summary_detail', 'title', 'title_detail', 'updated', 'updated_parsed']
>>> e.media_content
[{'url': u'http://farm5.static.flickr.com/4151/5196207597_80cef881c4_o.jpg', 
'width': u'5184', 'type': u'image/jpeg', 'height': u'3456'}]
>>> e.media_thumbnail
[{'url': u'http://farm5.static.flickr.com/4151/5196207597_37cb589e1f_s.jpg', 
'width': u'75', 'height': u'75'}]

The problem appears to be that you are using version 4.1 rather than the HEAD 
version from Subversion. If you switch to the latest code you should see that 
this has been fixed. I'm marking this as fixed but please update this thread if 
it still doesn't work using the latest code.

Original comment by adewale on 1 Dec 2010 at 12:16
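
Code that has to run against both versions could normalize the two shapes shown in this thread: an empty string under 4.1 and a list of attribute dicts under the HEAD code. The helper below is a minimal sketch under that assumption; `media_urls` is a hypothetical name, not part of feedparser, and the plain dicts stand in for parsed entries.

```python
def media_urls(entry, key="media_content"):
    """Return the URLs stored under entry[key], or [] if none are usable."""
    value = entry.get(key)
    if not isinstance(value, (list, tuple)):
        # Covers a missing key and the empty-string value reported for 4.1.
        return []
    return [
        item["url"]
        for item in value
        if isinstance(item, dict) and item.get("url")
    ]


# Plain dicts standing in for entries, using the values quoted above.
entry_4_1 = {"media_content": "", "media_thumbnail": ""}
entry_head = {
    "media_content": [
        {
            "url": "http://farm5.static.flickr.com/4151/5196207597_80cef881c4_o.jpg",
            "width": "5184",
            "type": "image/jpeg",
            "height": "3456",
        }
    ],
    "media_thumbnail": [
        {
            "url": "http://farm5.static.flickr.com/4151/5196207597_37cb589e1f_s.jpg",
            "width": "75",
            "height": "75",
        }
    ],
}

print(media_urls(entry_4_1))
print(media_urls(entry_head))
print(media_urls(entry_head, key="media_thumbnail"))
```

An empty result from this helper on a feed that visibly contains media:content is a hint that the installed feedparser predates the fix.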