pombreda / feedparser

Automatically exported from code.google.com/p/feedparser

Blogger's invalid img tags are unparseable #290

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Blogger.com recently pushed an update that adds an invalid attribute to some 
<img> tags. The attribute is i$, which is not a valid attribute name. An 
example img tag looks like this:

<img border="0" i$="true" src="..." />

Feedparser doesn't complain, but it drops all attributes after i$. In other 
words, if the src attribute comes after i$, then it's missing from the parsed 
feed. 

What steps will reproduce the problem?
1. Try to parse this feed:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<entry>
<content type='html'>
&lt;img border="0" i$="true" 
src="http://2.bp.blogspot.com/-PFFvTs4LpT4/Tf4vWlXwJZI/AAAAAAAAG1s/Ik_ranNtwwA/s1600/2011IELgraduation-sm.JPG" /&gt;
</content>
</entry></feed>

2. Check the parsed outcome, specifically the <img> tag.
3. You'll notice that the img tag doesn't have a "src" attribute.
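
Steps 2 and 3 can also be checked in code rather than by eye. The following is a minimal sketch that uses only the standard library's html.parser module. The names ImgAttributeCollector and img_attributes, and both sample strings, are made up for illustration; in a real check the input would be the sanitized HTML that feedparser returns for the entry content.

```python
from html.parser import HTMLParser


class ImgAttributeCollector(HTMLParser):
    """Record the attribute dict of every <img> tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs))


def img_attributes(html):
    """Return one attribute dict per <img> tag found in html."""
    collector = ImgAttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.images


# Hypothetical parsed output showing the symptom: the src attribute is gone.
broken = '<img border="0" />'
# Hypothetical output of a correct parse: src survives.
expected = '<img border="0" src="http://example.invalid/image.jpg" />'

print("src" in img_attributes(broken)[0])    # False
print("src" in img_attributes(expected)[0])  # True
```

With feedparser installed, the string to pass in would presumably be something like feedparser.parse(feed_xml).entries[0].content[0].value, where feed_xml holds the sample feed from step 1.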

What is the expected output? What do you see instead?
Feedparser should ignore the invalid i$ attribute and parse the rest of the tag 
correctly.
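
The symptom (attributes before i$ survive, everything after it is lost) is what one would expect from an attribute scanner that stops at the first name it cannot match. The sketch below only illustrates that failure mode and the requested behaviour. Both patterns and all function names are assumptions made for this example; they are not taken from feedparser and do not describe the actual fix.

```python
import re

# Strict: attribute names limited to letters, digits, and "-", ":", ".", "_".
STRICT_ATTR = re.compile(r'\s*([a-zA-Z_][-:.a-zA-Z_0-9]*)\s*=\s*"([^"]*)"')
# Tolerant: a name is any run of characters that cannot terminate a name.
TOLERANT_ATTR = re.compile(r'\s*([^\s=/>"\']+)\s*=\s*"([^"]*)"')
# Used afterwards to decide which scanned names are worth keeping.
VALID_NAME = re.compile(r'[a-zA-Z_][-:.a-zA-Z_0-9]*')


def scan_attributes(attr_text, pattern):
    """Collect (name, value) pairs until the pattern stops matching."""
    attrs = []
    pos = 0
    while True:
        match = pattern.match(attr_text, pos)
        if match is None:
            break  # everything after this point is silently lost
        attrs.append((match.group(1), match.group(2)))
        pos = match.end()
    return attrs


def scan_and_drop_invalid(attr_text):
    """Scan tolerantly, then discard attributes whose names are invalid."""
    return [
        (name, value)
        for name, value in scan_attributes(attr_text, TOLERANT_ATTR)
        if VALID_NAME.fullmatch(name)
    ]


# Attribute portion of the example tag from this report.
ATTR_TEXT = ' border="0" i$="true" src="..."'

print(scan_attributes(ATTR_TEXT, STRICT_ATTR))  # [('border', '0')]
print(scan_and_drop_invalid(ATTR_TEXT))         # [('border', '0'), ('src', '...')]
```

The strict scanner stops at i$ and never reaches src, while the tolerant scanner reads past i$ and the filter then removes it, which matches the expected output described above.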

In addition to the sample feed above, here are a few that currently have the 
issue:
http://zeemonodee.blogspot.com/feeds/posts/default
http://livelovenstamp.blogspot.com/feeds/posts/default

Original issue reported on code.google.com by wal...@ninua.com on 21 Jun 2011 at 1:40

GoogleCodeExporter commented 9 years ago
I'm a little surprised this issue is not getting more stars. Blogger is the 
most popular blogging platform out there and this issue is making their feeds 
unparsable with feedparser. I thought more people would complain. Unless I'm 
missing something!

Original comment by wal...@ninua.com on 8 Jul 2011 at 7:14

GoogleCodeExporter commented 9 years ago
Another example of a feed that can't be parsed but is reported as valid by 
FeedValidator.org:

ouronesweetfamily.blogspot.com/feeds/posts/default

Original comment by wal...@ninua.com on 21 Jul 2011 at 3:45

GoogleCodeExporter commented 9 years ago
Fixed in r560.

Original comment by kurtmckee on 15 Aug 2011 at 5:59