Difficulty with character encodings

libo26 / feedparser

Automatically exported from code.google.com/p/feedparser

This is more of a feature request than a bug, but since Python is notoriously
fickle with character encodings, it would be nice if you could set an
optional output character encoding when parsing a feed:

e.g. d = feedparser.parse(url=foo, to_encoding='utf-8')

If this option is set, feedparser rigorously checks and re-encodes the
source file, so that anything that comes out of it is in the desired encoding.
In the event something bad happens, it degrades the text gracefully
with placeholder characters, which is better than an uncaught exception.

In most cases, the default should be utf-8 unless otherwise specified, since
ASCII and other character sets can all be represented in it.
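
A minimal sketch of the requested behaviour, using only Python's built-in codecs. The helper name `recode` and its parameters are hypothetical and are not part of feedparser; the sketch also assumes the source encoding is already known, which feedparser would have to detect itself.

```python
def recode(raw: bytes, from_encoding: str, to_encoding: str = "utf-8") -> bytes:
    """Hypothetical helper (not a feedparser API): convert bytes between encodings."""
    # Bytes that are invalid in the source encoding become U+FFFD
    # instead of raising UnicodeDecodeError.
    text = raw.decode(from_encoding, errors="replace")
    # Characters the target encoding cannot represent become "?"
    # instead of raising UnicodeEncodeError.
    return text.encode(to_encoding, errors="replace")


latin1_bytes = "café".encode("latin-1")

# Clean conversion: every Latin-1 character exists in UTF-8.
print(recode(latin1_bytes, "latin-1", "utf-8"))

# Lossy conversion: "é" has no ASCII equivalent, so a placeholder is used.
print(recode(latin1_bytes, "latin-1", "ascii"))
```

The second call is the "graceful degradation" case: the text loses information, but nothing raises.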

Brian McConnell

Original issue reported on code.google.com by bsmcconn...@gmail.com on 15 Sep 2009 at 2:19

Are you suggesting that feedparser should perform a lossy conversion of a feed from its native encoding to a user-specified encoding? That sounds like a can of worms that is best left to the end user. I'm marking this as WontFix, since I'm unlikely to ever implement this feature. However, I'd be happy to accept a patch that provides this behaviour without breaking any of the existing functionality.
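
For anyone who wants the lossy conversion on the end-user side, one possible approach is to post-process the parsed result. The sketch below assumes the result behaves like nested dicts and lists of strings; `degrade` is a hypothetical helper, and the `parsed` dict is a hand-written stand-in rather than real feedparser output.

```python
def degrade(value, encoding="ascii"):
    """Hypothetical helper: round-trip every string through `encoding`, lossily."""
    if isinstance(value, str):
        # Unrepresentable characters are replaced with "?" rather than raising.
        return value.encode(encoding, errors="replace").decode(encoding)
    if isinstance(value, dict):
        return {key: degrade(item, encoding) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [degrade(item, encoding) for item in value]
    # Numbers, None, and other non-text values pass through unchanged.
    return value


# Stand-in for a parsed feed (assumption: not actual feedparser output).
parsed = {
    "feed": {"title": "Café news"},
    "entries": [{"title": "Überblick"}, {"title": "plain"}],
}

clean = degrade(parsed)
print(clean["feed"]["title"])
print(clean["entries"][0]["title"])
```

This keeps the lossy step out of the parser itself, so the original text is still available whenever the caller needs it.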

Difficulty with character encodings #185