SAXException('Read failed (no details available)',)

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

>>> import feedparser
>>> feedparser.__version__
'4.2-pre-308-svn'
>>> f = feedparser.parse('/tmp/advertka.xml')
>>> f.bozo_exception
SAXException('Read failed (no details available)',)

Is the problem in this feed (http://www.advertka.ru/?rss) or in
feedparser?

Original issue reported on code.google.com by nikolay....@gmail.com on 4 Apr 2010 at 4:27

Attachments:

advertka.xml

GoogleCodeExporter commented 9 years ago

I seem to be able to parse that file using the latest version of the code and Python 2.5.1 on OS X.

What versions and platforms are you using?

Original comment by adewale on 15 Apr 2010 at 4:22

GoogleCodeExporter commented 9 years ago

Have just reproduced. Copied with header:

Python 2.5.5 (r255:77872, Feb  2 2010, 00:25:36) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.__version__
'4.2-pre-308-svn'
>>> f = feedparser.parse('/tmp/advertka.xml')
>>> f.bozo_exception
SAXException('Read failed (no details available)',)
>>>

Original comment by nikolay....@gmail.com on 15 Apr 2010 at 4:47

GoogleCodeExporter commented 9 years ago

Well, with Python 2.6 I get a different error:

Python 2.6.5 (r265:79063, Mar 18 2010, 23:38:15) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> f = feedparser.parse('/tmp/advertka.xml')
>>> f.bozo_exception
SAXParseException('not well-formed (invalid token)',)

Original comment by nikolay....@gmail.com on 15 Apr 2010 at 4:49

GoogleCodeExporter commented 9 years ago

Please attach the file you are attempting to parse to this bug.

Original comment by adewale on 16 Apr 2010 at 1:02

GoogleCodeExporter commented 9 years ago

The file is attached to the first message...
...but I can attach it to this comment as well if that helps.

Original comment by nikolay....@gmail.com on 16 Apr 2010 at 1:17

Attachments:

advertka.xml

GoogleCodeExporter commented 9 years ago

Here's what I get with Python 2.5.2 on Linux:
>>> import feedparser
>>> f = feedparser.parse("advertka.xml")
>>> f.bozo_exception
SAXParseException("EntityRef: expecting ';'\n",)
>>> f.feed
{'lastbuilddate': u'Sat, 03 Apr 2010 16:45:01 +0400',
 'subtitle': u'',
 'language': u'ru',
 'links': [{'href': u'http://www.advertka.ru/', 'type': 'text/html',
            'rel': 'alternate'}],
 'title': u'Advertka.ru',
 'image': {'links': [{'href': u'http://www.advertka.ru/', 'type': 'text/html',
                      'rel': 'alternate'}],
           'title': u'advertka.ru',
           'height': 17,
           'width': 17,
           'title_detail': {'base': u'', 'type': 'text/plain',
                            'value': u'advertka.ru', 'language': None},
           'href': u'http://www.advertka.ru/img/ico2.gif',
           'link': u'http://www.advertka.ru/'},
 'generator': u'advertka / advertka.ru',
 'generator_detail': {'name': u'advertka / advertka.ru'},
 'subtitle_detail': {'base': u'', 'type': 'text/html', 'value': u'',
                     'language': None},
 'title_detail': {'base': u'', 'type': 'text/plain', 'value': u'Advertka.ru',
                  'language': None},
 'link': u'http://www.advertka.ru/'}
>>> len(f.entries)
21

As you can see, the feed contains illegal tokens such as a bare & that should be written as &amp;.
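
To illustrate the point about unescaped ampersands, here is a minimal sketch using only the standard library. The title string is a made-up example and is not taken from advertka.xml, the helper name `is_well_formed` is hypothetical, and the code targets a current Python rather than the 2.5 sessions shown in this thread.

```python
import xml.sax
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the strict SAX parser accepts xml_text."""
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


# Hypothetical item title, used only for illustration.
raw_title = "Sales & Marketing"

# A bare "&" makes the document not well-formed.
broken = "<rss><title>" + raw_title + "</title></rss>"

# escape() rewrites "&", "<" and ">" as entities, so the result parses.
fixed = "<rss><title>" + escape(raw_title) + "</title></rss>"

print(is_well_formed(broken), is_well_formed(fixed))
```

The exact error message for the broken document depends on the underlying XML parser, which would be consistent with the different messages reported in this thread.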

However, feedparser has still processed the content, so you can still work with it. Can you check if you have valid data for f.feed and len(f.entries)? If so, I'm going to mark this as Invalid, since feedparser is doing what it's supposed to do.
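
A rough sketch of that check follows. It assumes the parse result behaves like a dict with the `bozo`, `bozo_exception`, `feed` and `entries` keys; the helper name `summarize` is hypothetical and not part of feedparser.

```python
def summarize(parsed):
    """Report whether a parsed feed is usable despite a bozo error."""
    entries = parsed.get("entries", [])
    has_error = bool(parsed.get("bozo", 0))
    return {
        "bozo": has_error,
        # Only report the exception when the bozo flag is set.
        "error": repr(parsed.get("bozo_exception")) if has_error else None,
        # Usable means: some feed-level data and at least one entry.
        "usable": bool(parsed.get("feed")) and len(entries) > 0,
        "entry_count": len(entries),
    }


# Usage with feedparser (assumed to be installed):
#   import feedparser
#   print(summarize(feedparser.parse("advertka.xml")))
```

If `usable` is true while `bozo` is also true, the feed was not well-formed but its content was still recovered.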

Original comment by adewale on 28 Apr 2010 at 3:16

GoogleCodeExporter commented 9 years ago

> Can you check if you have valid data for f.feed and len(f.entries)?

Yes, I have. I was just a bit confused by the "no details available" message in my
case. But it seems that this is an issue with the SAX parser, not with feedparser.

Original comment by nikolay....@gmail.com on 28 Apr 2010 at 5:10

GoogleCodeExporter commented 9 years ago

Please close this bug as invalid.

The "no details available" exception isn't occurring in feedparser, and both 
the attached document and the URL provided parse fine using svn trunk.

@nikolay: I wish I knew where to direct you regarding the "no details 
available" exception, but happily it looks like advertka.ru has fixed the 
SAX-related exception on their end. You might consider downloading the very 
latest feedparser code:

https://feedparser.googlecode.com/svn/trunk/feedparser/feedparser.py

If you run into any feeds that aren't parsing properly, don't hesitate to see 
if it's a known issue and report it if it isn't!

Original comment by kurtmckee on 5 Dec 2010 at 12:01

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 12 Dec 2010 at 11:33

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

It'd be nice if the code actually *CAUGHT* SAXException, but it no longer 
appears to do so. This problem now doesn't even give you a feed; it raises an 
exception, so there are no entries. Which is extremely helpful. It's really annoying 
because I can see in the code that it's using the strict parser, but there's 
no way of telling feedparser not to do that.
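
One possible caller-side workaround, sketched under the assumption that the exception escaping from the parse call is an `xml.sax.SAXException` (or a subclass such as `SAXParseException`). The wrapper name `parse_or_none` is hypothetical, and the parse function is passed in so the sketch does not depend on any particular feedparser version.

```python
from xml.sax import SAXException


def parse_or_none(parse, source):
    """Call parse(source); return (result, None) or (None, exception)."""
    try:
        return parse(source), None
    except SAXException as exc:
        # Hand the error back to the caller instead of letting it propagate.
        return None, exc


# Usage with feedparser (assumed to be installed):
#   import feedparser
#   result, error = parse_or_none(feedparser.parse, "/tmp/advertka.xml")
#   if error is not None:
#       print("strict parsing failed:", error)
```

This does not recover any entries; it only keeps the exception from aborting the caller.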

Original comment by vwood....@gmail.com on 6 Dec 2013 at 6:44

libo26 / feedparser

SAXException('Read failed (no details available)',) #211