libo26 / feedparser

Automatically exported from code.google.com/p/feedparser

KeyError: 'image' exception #206

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Python 2.5.5 (r255:77872, Feb  1 2010, 19:53:42) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.__version__
'4.2-pre-308-svn'
>>> feedparser.parse('http://www.vogue.de/rss/all/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 3587, in parse
    feedparser.feed(data)
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 1717, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 138, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 315, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.5/sgmllib.py", line 355, in finish_endtag
    self.unknown_endtag(tag)
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 580, in unknown_endtag
    method()
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 1410, in _end_title
    context = self._getContext()
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 1135, in _getContext
    context = self.feeddata['image']
  File "/home/niksite/python-env/lib/python2.5/site-packages/feedparser.py", line 251, in __getitem__
    return UserDict.__getitem__(self, realkey)
KeyError: 'image'

Original issue reported on code.google.com by nikolay....@gmail.com on 27 Feb 2010 at 8:00

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 28 Feb 2010 at 1:42

GoogleCodeExporter commented 9 years ago
According to this:
http://beta.feedvalidator.org/check.cgi?url=http://feedparser.googlecode.com/issues/attachment%3Faid%3D6943052100890401728%26name%3Dvogue.xml#l12
and this: http://cyber.law.harvard.edu/rss/rss.html, that feed is seriously broken.

It has multiple image elements at the item level, whereas the spec only permits one
image at the channel level. If you have any influence over the provider, ask them
to produce something more useful, since most tools will fall over when presented
with that feed.
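
To make the structural problem concrete, here is a minimal sketch that counts image elements at the channel level and at the item level using only the standard library. The SAMPLE document and the image_counts helper are hypothetical and written for illustration; the sample is not the actual vogue.de feed, it only imitates the shape described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample (not the real feed): an RSS 2.0 document that
# places <image> inside an <item> and has no channel-level <image>.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <title>First entry</title>
      <image>
        <title>Entry picture</title>
        <url>http://example.com/a.png</url>
      </image>
    </item>
  </channel>
</rss>"""


def image_counts(xml_text):
    """Return (channel_level, item_level) counts of <image> elements."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return (0, 0)
    # Direct children of <channel> only.
    channel_level = len(channel.findall("image"))
    # <image> elements that are direct children of an <item>.
    item_level = len(channel.findall("item/image"))
    return (channel_level, item_level)


print(image_counts(SAMPLE))
```

A nonzero second count flags the item-level images the validator complains about. Note that this strict XML parse would itself reject a feed that is not well-formed, so it is a diagnostic aid rather than a replacement for a tolerant parser.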

Having said that, Feedparser shouldn't be throwing an exception here.
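
As an illustration of why the lookup fails and how it could be made tolerant, here is a small sketch. TinyParser and both of its methods are hypothetical stand-ins, not feedparser's real code; the only detail taken from the traceback is that the context lookup indexes feeddata['image'] directly. Whether the eventual fix took this approach is not stated in this thread.

```python
class TinyParser:
    """Hypothetical stand-in, not feedparser's real implementation."""

    def __init__(self):
        self.feeddata = {}      # channel-level data
        self.in_image = False   # True while inside an <image> element

    def get_context_strict(self):
        # Same pattern as the last feedparser frame in the traceback:
        # plain indexing, which raises KeyError if no 'image' entry
        # was ever created.
        if self.in_image:
            return self.feeddata["image"]
        return self.feeddata

    def get_context_tolerant(self):
        # Create the 'image' entry on demand instead of assuming it exists.
        if self.in_image:
            return self.feeddata.setdefault("image", {})
        return self.feeddata


parser = TinyParser()
parser.in_image = True  # e.g. an <image> seen where none was expected

try:
    parser.get_context_strict()
except KeyError as exc:
    print("strict lookup failed:", exc)

# The tolerant lookup returns a usable dict instead of raising.
parser.get_context_tolerant()["title"] = "Entry picture"
print(parser.feeddata)
```

The tolerant variate trades strictness for robustness: malformed input ends up stored somewhere plausible rather than aborting the whole parse.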

Do you have the URL you got the feed from?

Original comment by adewale on 28 Feb 2010 at 1:53

GoogleCodeExporter commented 9 years ago
> Do you have the URL you got the feed from?

Yes, it is http://www.vogue.de/rss/all/

Original comment by nikolay....@gmail.com on 28 Feb 2010 at 5:19

GoogleCodeExporter commented 9 years ago
Also, http://www.hemidemi.com/rss/bookmark/recent.xml has the same issue, as do
some other feeds.

I have attached a hack which fixes the issue. It is not an ideal solution, but it
removes the parsing error and does not affect the tests.

Original comment by nikolay....@gmail.com on 15 Apr 2010 at 1:34

Attachments:

GoogleCodeExporter commented 9 years ago
This seems like a reasonable solution.
Could you add a test so that we can eliminate the risk that this bug gets
reintroduced by future changes?

Original comment by adewale on 15 Apr 2010 at 2:46

GoogleCodeExporter commented 9 years ago
@Adewale: please close this report as fixed.

Tested using svn trunk, the attached feed document, and both URLs provided. 
This appears to have been fixed in r312:

https://code.google.com/p/feedparser/source/detail?r=312

Original comment by kurtmckee on 5 Dec 2010 at 11:48

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 13 Dec 2010 at 1:19