I don't understand in what way feedparser is rejecting the feed. Do you mean
that feedparser sets the `bozo` bit because of the character encoding override?
Original comment by kurtmckee
on 30 Aug 2011 at 6:52
Yes, it sets the `bozo` flag, and `bozo_exception` is:
CharacterEncodingOverride(u'document declared as us-ascii, but parsed as utf-8',)
Original comment by wal...@ninua.com
on 30 Aug 2011 at 9:02
That's the behavior I'm seeing as well. I also see that feedparser still parses
the feed.
CharacterEncodingOverride is a feedparser "exception" that simply serves as a
way for developers to see what's going on with the feed they're parsing,
particularly if they're seeing problems with the output. It's a subclass of
`feedparser.ThingsNobodyCaresAboutButMe`, and your code can safely ignore this
"exception".
Because this is expected behavior that follows the specifications noted in the
comment in `getCharacterEncoding()`, I'm going to close this issue.
Original comment by kurtmckee
on 30 Aug 2011 at 3:06
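The advice above, that a `CharacterEncodingOverride` can be ignored, might be expressed in code roughly as follows. This is a minimal sketch, not part of feedparser: `has_serious_bozo` and `BENIGN_BOZO_NAMES` are hypothetical names, and matching exception classes by name (rather than with `isinstance(exc, feedparser.ThingsNobodyCaresAboutButMe)`) is an assumption made so the sketch does not depend on which feedparser release defines which classes. It expects a parse result carrying the `bozo` and `bozo_exception` keys discussed in this thread.

```python
from collections.abc import Mapping

# Exception class names treated as informational rather than fatal.
# Both names come from this thread; matching by name (instead of isinstance)
# is an assumption so the sketch runs without importing feedparser.
BENIGN_BOZO_NAMES = frozenset({
    "ThingsNobodyCaresAboutButMe",
    "CharacterEncodingOverride",
})


def has_serious_bozo(result: Mapping) -> bool:
    """Return True if a parse result is flagged bozo for a non-informational reason.

    Hypothetical helper, not part of feedparser.
    """
    if not result.get("bozo"):
        return False  # bozo bit not set: nothing to report
    exc = result.get("bozo_exception")
    if exc is None:
        return True  # flagged, but no exception recorded; assume the worst
    # Check every class in the exception's hierarchy, so subclasses of
    # ThingsNobodyCaresAboutButMe are also treated as benign.
    names = {cls.__name__ for cls in type(exc).__mro__}
    return not (names & BENIGN_BOZO_NAMES)
```

With a result whose `bozo_exception` is the `CharacterEncodingOverride` quoted earlier in this thread, `has_serious_bozo` returns `False`, while an exception outside that hierarchy, such as an XML syntax error, still returns `True`.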
Interesting. I always assumed that if the bozo flag is set, it meant that
feedparser couldn't parse the feed. I don't recall any mention in the
documentation of any other way to tell whether the feed was parsed. Can you
please clarify how to check that the feed was parsed?
Original comment by wal...@ninua.com
on 30 Aug 2011 at 7:55
I don't have a good recommendation. feedparser doesn't check whether it's
actually parsing a feed; it merely extracts data from the XML document it's
given. For example, a well-formed XHTML document will be parsed without
errors, but the `feed` and `entries` attributes will be empty (assuming that
there weren't any recognizable XML elements that feedparser was looking for).
If you're trying to figure out whether a URL a user entered actually points to
a feed, you might sniff the first 512 bytes of the response (or some other
arbitrary number), which is what Firefox did last time I checked.
Original comment by kurtmckee
on 31 Aug 2011 at 5:00
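Both checks described in the comment above can be sketched roughly as follows. `looks_like_feed`, `has_feed_content`, `FEED_MARKERS`, and `SNIFF_BYTES` are hypothetical names; the marker list (RSS 2.0 `<rss`, Atom `<feed`, RSS 1.0 `<rdf:RDF`) is an assumption and is not Firefox's actual sniffing logic, and the default 512-byte window simply mirrors the arbitrary number mentioned above. The byte search only works for ASCII-compatible encodings, so a UTF-16 feed would be missed.

```python
from collections.abc import Mapping

# Root-element markers a feed is likely to contain near its start.
# Assumed list covering RSS 2.0, Atom, and RSS 1.0 (RDF); compared in lowercase.
FEED_MARKERS = (b"<rss", b"<feed", b"<rdf:rdf")

# Arbitrary sniffing window, following the 512-byte figure in the thread.
SNIFF_BYTES = 512


def looks_like_feed(data: bytes, limit: int = SNIFF_BYTES) -> bool:
    """Heuristically guess whether raw response bytes begin a feed document."""
    head = data[:limit].lower()  # case-insensitive match on the first bytes only
    return any(marker in head for marker in FEED_MARKERS)


def has_feed_content(result: Mapping) -> bool:
    """Return True if a parse result carries any feed-level data or entries.

    A well-formed XHTML page can parse without errors yet leave both empty.
    """
    return bool(result.get("entries")) or bool(result.get("feed"))
```

One possible approach is to run `looks_like_feed` on the raw response before parsing and `has_feed_content` on the result afterward; neither check is definitive on its own.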
Original issue reported on code.google.com by
wal...@ninua.com
on 30 Aug 2011 at 5:09