libo26 / feedparser

Automatically exported from code.google.com/p/feedparser

Difficulty with character encodings #185

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is more a feature request than a bug, but since Python is notoriously
fickle with character encodings, it would be nice if you could set an
optional character encoding when parsing a feed:

e.g. d = feedparser.parse(url=foo, to_encoding='utf-8')

If this option is set, feedparser rigorously checks and encodes the
source file, so anything that comes out of it is in the desired encoding.
If something bad happens, it gracefully degrades the text with placeholder
characters, which is better than an uncaught exception.
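
To illustrate the graceful degradation described above, here is a minimal sketch that uses only Python's built-in codec error handling. The helper name recode and its parameters are hypothetical; this is not part of feedparser's API, just one way the behaviour could look in user code.

```python
def recode(data, to_encoding="utf-8", source_encoding="utf-8"):
    """Return data as bytes in to_encoding without raising on bad characters.

    Hypothetical helper for illustration only; not a feedparser function.
    """
    if isinstance(data, bytes):
        # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
        text = data.decode(source_encoding, errors="replace")
    else:
        text = data
    # Characters the target encoding cannot represent become "?"
    # instead of raising UnicodeEncodeError.
    return text.encode(to_encoding, errors="replace")


print(recode("caf\u00e9", "ascii"))
print(recode(b"caf\xe9", "utf-8", source_encoding="latin-1"))
```

The conversion is lossy by design: any character replaced by a placeholder cannot be recovered afterwards.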

In most cases, the default should be utf-8 unless otherwise specified since
ASCII and other character sets map into it.

Brian McConnell

Original issue reported on code.google.com by bsmcconn...@gmail.com on 15 Sep 2009 at 2:19

GoogleCodeExporter commented 9 years ago
Are you suggesting that Feedparser should perform a lossy conversion of a feed
from its native encoding to a user specified encoding? That sounds like a can
of worms that is best left to the end-user.

I'm marking this as WontFix since I'm unlikely to ever implement this feature.
However, I'd be happy to accept a patch which provides this behaviour without
breaking any of the existing functionality.

Original comment by adewale on 29 Apr 2010 at 12:36