Closed GoogleCodeExporter closed 9 years ago
I seem to be able to parse that file using the latest version of the code and Python 2.5.1 on OS X.
What versions and platforms are you using?
Original comment by adewale
on 15 Apr 2010 at 4:22
I have just reproduced it. Here is the session, copied with the interpreter header:
Python 2.5.5 (r255:77872, Feb 2 2010, 00:25:36)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.__version__
'4.2-pre-308-svn'
>>> f = feedparser.parse('/tmp/advertka.xml')
>>> f.bozo_exception
SAXException('Read failed (no details available)',)
>>>
Original comment by nikolay....@gmail.com
on 15 Apr 2010 at 4:47
Well, with Python 2.6 it is a different error:
Python 2.6.5 (r265:79063, Mar 18 2010, 23:38:15)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> f = feedparser.parse('/tmp/advertka.xml')
>>> f.bozo_exception
SAXParseException('not well-formed (invalid token)',)
Original comment by nikolay....@gmail.com
on 15 Apr 2010 at 4:49
Please attach the file you are attempting to parse to this bug.
Original comment by adewale
on 16 Apr 2010 at 1:02
The file is attached to the first message, but I can attach it to this comment too if that helps.
Original comment by nikolay....@gmail.com
on 16 Apr 2010 at 1:17
Here's what I get with Python 2.5.2 on Linux:
>>> import feedparser
>>> f = feedparser.parse("advertka.xml")
>>> f.bozo_exception
SAXParseException("EntityRef: expecting ';'\n",)
>>> f.feed
{'lastbuilddate': u'Sat, 03 Apr 2010 16:45:01 +0400', 'subtitle': u'',
 'language': u'ru',
 'links': [{'href': u'http://www.advertka.ru/', 'type': 'text/html', 'rel': 'alternate'}],
 'title': u'Advertka.ru',
 'image': {'links': [{'href': u'http://www.advertka.ru/', 'type': 'text/html', 'rel': 'alternate'}],
           'title': u'advertka.ru', 'height': 17, 'width': 17,
           'title_detail': {'base': u'', 'type': 'text/plain', 'value': u'advertka.ru', 'language': None},
           'href': u'http://www.advertka.ru/img/ico2.gif', 'link': u'http://www.advertka.ru/'},
 'generator': u'advertka / advertka.ru',
 'generator_detail': {'name': u'advertka / advertka.ru'},
 'subtitle_detail': {'base': u'', 'type': 'text/html', 'value': u'', 'language': None},
 'title_detail': {'base': u'', 'type': 'text/plain', 'value': u'Advertka.ru', 'language': None},
 'link': u'http://www.advertka.ru/'}
>>> len(f.entries)
21
As you can see, the feed contains illegal tokens, such as a bare & that should be written as &amp;. However, Feedparser has still processed the content, so you can still work with it. Can you check whether you get valid data for f.feed and len(f.entries)? If so, I'm going to mark this as Invalid, since Feedparser is doing what it's supposed to do.
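To illustrate why a strict XML parser rejects this feed, here is a minimal sketch using Python 3's standard-library xml.sax (not feedparser itself, and not necessarily the same SAX driver that produced the errors in the sessions above). The two one-line feed fragments are made-up examples. The exact error message differs between parsers (expat and libxml2 word it differently), but a SAXParseException also exposes a line and column, which helps locate the offending character.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical one-line RSS fragments: BAD contains a bare "&",
# GOOD escapes it as "&amp;" the way XML requires.
BAD = b"<rss><channel><title>Tom & Jerry</title></channel></rss>"
GOOD = b"<rss><channel><title>Tom &amp; Jerry</title></channel></rss>"


def check_well_formed(data):
    """Parse `data` with the standard library's strict SAX parser.

    Returns None if the document is well-formed, otherwise a tuple of
    (message, line, column) taken from the SAXParseException.
    """
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage(), exc.getLineNumber(), exc.getColumnNumber()
    return None


print("GOOD:", check_well_formed(GOOD))  # None: no error
print("BAD: ", check_well_formed(BAD))   # (message, line, column)
```

A tolerant consumer such as feedparser can still extract data from the BAD case; the strict check only tells you that the document is not well-formed XML and where the problem is.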
Original comment by adewale
on 28 Apr 2010 at 3:16
> Can you check if you have valid data for f.feed and len(f.entries)?
Yes, I do. I was just a bit confused by the "no details available" message in my case. But it seems that this is a SAX parser issue, not a feedparser one.
Original comment by nikolay....@gmail.com
on 28 Apr 2010 at 5:10
Please close this bug as invalid.
The "no details available" exception isn't occurring in feedparser, and both
the attached document and the URL provided parse fine using svn trunk.
@nikolay: I wish I knew where to direct you regarding the "no details
available" exception, but happily it looks like advertka.ru has fixed the
SAX-related exception on their end. You might consider downloading the very
latest feedparser code:
https://feedparser.googlecode.com/svn/trunk/feedparser/feedparser.py
If you run into any feeds that aren't parsing properly, check whether it's a known issue and report it if it isn't!
Original comment by kurtmckee
on 5 Dec 2010 at 12:01
Original comment by adewale
on 12 Dec 2010 at 11:33
It'd be nice if the code actually *CAUGHT* SAXException, but it no longer appears to do so. This problem now doesn't even give you a feed; it raises an exception, so there are no entries. Which is extremely helpful. It's really annoying because I can see in the code that it's using the strict parser on it, but there's no way of telling feedparser not to do that.
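One possible workaround, while the strict parser cannot be switched off, is to repair the most common defect (unescaped ampersands) before the text reaches feedparser. The sketch below is a hypothetical helper: escape_bare_ampersands is not part of feedparser, and it assumes the document has already been decoded to a str. It escapes only an & that does not begin a syntactically valid entity or character reference.

```python
import re
import xml.sax
from xml.sax.handler import ContentHandler

# Matches an "&" that does NOT start a syntactically valid entity or
# character reference such as &amp;, &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace stray '&' characters with '&amp;', leaving real references alone."""
    return _BARE_AMP.sub("&amp;", text)


raw = "<rss><channel><title>Q&A &amp; more</title></channel></rss>"
cleaned = escape_bare_ampersands(raw)
print(cleaned)
# -> <rss><channel><title>Q&amp;A &amp; more</title></channel></rss>

# The cleaned text now passes the standard library's strict SAX parser.
xml.sax.parseString(cleaned.encode("utf-8"), ContentHandler())
```

feedparser.parse() accepts a string containing the document as well as a URL or filename, so the cleaned text can be passed straight to it. Note the limitation: HTML named entities such as &nbsp; are left untouched and are still undefined in XML, so a strict parser will continue to reject them.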
Original comment by vwood....@gmail.com
on 6 Dec 2013 at 6:44
Original issue reported on code.google.com by
nikolay....@gmail.com
on 4 Apr 2010 at 4:27