pombreda / feedparser

Automatically exported from code.google.com/p/feedparser

Parser crashes with SGMLParseError on valid feed #226

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1. Try parsing this feed (feed content attached in case the feed changes):
 http://allastrology.blogspot.com/feeds/posts/default

2. I expect feedparser to parse it correctly, but I get an exception instead.

3. FeedValidator says the feed is valid:
http://feedvalidator.org/check.cgi?url=http://allastrology.blogspot.com/feeds/posts/default

4. SimplePie parses it as well:
http://simplepie.org/demo/?feed=http://allastrology.blogspot.com/feeds/posts/default

5. Here is the stack trace I get from feedparser:

Traceback (most recent call last):

   data = feedparser.parse(feed_content)

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 3607, in parse
   feedparser.feed(data)

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 1729, in feed
   sgmllib.SGMLParser.feed(self, data)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 99, in feed
   self.goahead(0)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 138, in goahead
   k = self.parse_endtag(i)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 315, in parse_endtag
   self.finish_endtag(tag)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 355, in finish_endtag
   self.unknown_endtag(tag)

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 584, in unknown_endtag
   method()

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 1546, in _end_content
   value = self.popContent('content')

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 871, in popContent
   value = self.pop(tag)

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 780, in pop
   output = _resolveRelativeURIs(output, self.baseuri, self.encoding, self.contentparams.get('type', 'text/html'))

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 2338, in _resolveRelativeURIs
   p.feed(htmlSource)

 File "/base/data/home/apps/networkedblogs/5.344111130473216988/lib/feedparser.py", line 1729, in feed
   sgmllib.SGMLParser.feed(self, data)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 99, in feed
   self.goahead(0)

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 169, in goahead
   k = self.parse_declaration(i)

 File "/base/python_runtime/python_dist/lib/python2.5/markupbase.py", line 136, in parse_declaration
   "unexpected %r char in declaration" % rawdata[j])

 File "/base/python_runtime/python_dist/lib/python2.5/sgmllib.py", line 106, in error
   raise SGMLParseError(message)

SGMLParseError: unexpected '/' char in declaration

Original issue reported on code.google.com by ninuawal...@gmail.com on 16 Aug 2010 at 9:13

Attachments:

GoogleCodeExporter commented 9 years ago
A few more feeds break with the same exception, and feedvalidator considers 
all of them valid:

http://ikonkaar.blogspot.com/feeds/posts/default
http://allastrology.blogspot.com/feeds/posts/default
http://risage-zeekerz.blogspot.com/feeds/posts/default
http://nortron.blogspot.com/feeds/posts/default

Original comment by ninuawal...@gmail.com on 18 Aug 2010 at 11:00

GoogleCodeExporter commented 9 years ago
This error is being caused by a <!DOCTYPE> declaration in the content. I've 
attached a sample document that succinctly demonstrates the problem.
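
For anyone hitting this before a fix lands, one possible workaround is to strip DOCTYPE declarations from embedded HTML before it reaches the SGML-based parser. The sketch below is only an illustration of that idea, not the fix adopted in feedparser: the helper name `strip_doctype` and the regular expression are hypothetical, and it assumes the declaration has no internal subset containing a `>` character.

```python
import re

# Hypothetical helper, not part of feedparser.
# Matches "<!DOCTYPE ...>" up to the first '>' (case-insensitive).
# Assumption: the declaration has no internal subset ("[ ... ]") with '>' in it.
_DOCTYPE_RE = re.compile(r'<!DOCTYPE\b[^>]*>', re.IGNORECASE)


def strip_doctype(html_source):
    """Return html_source with any DOCTYPE declarations removed."""
    return _DOCTYPE_RE.sub('', html_source)


# Illustrative input only; the actual feed content is in the attachment.
sample = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><p>Hello</p>'
cleaned = strip_doctype(sample)
print(cleaned)  # prints: <p>Hello</p>
```

Cleaning the content string this way leaves the rest of the markup untouched, so relative URI resolution could still run on the remaining HTML.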

Original comment by kurtmckee on 9 Dec 2010 at 7:08

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by adewale on 22 Dec 2010 at 10:40