Open fawkesley opened 10 years ago
At the moment we're returning a file object from response.content, which loses any information we had about the file's character encoding, such as the charset declared in the Content-Type header:
Content-Type: text/html; charset=UTF-8
We can cunningly wrap the returned file handle using the codecs module:
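    >>> import codecs
    >>> from cStringIO import StringIO
    >>> a = u'Marat\xf3n'  # assumed value; `a` is not defined in the original snippet
    >>> f = StringIO(a.encode('utf-8'))
    >>> f.read()
    'Marat\xc3\xb3n'
    >>> f.seek(0)
    >>> g = codecs.getreader('utf-8')(f)
    >>> print g.read()
    Maratón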
The critical bit of code is:
    g = codecs.getreader('utf-8')(f)
    g.read()
f is a file handle containing UTF-8 bytes, but g.read() returns a correctly decoded unicode string.
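To make the idea a bit more concrete, here is a rough sketch of how the charset from the Content-Type header could be used to pick the codec, rather than hard-coding 'utf-8'. The helper names `charset_from_content_type` and `decoding_reader` are made up for illustration and are not part of the existing code, and falling back to UTF-8 when no charset is declared is an assumption. It uses `io.BytesIO` in place of `cStringIO` so that it also runs on Python 3.

```python
import codecs
import io
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Return the charset named in a Content-Type value (hypothetical helper).

    Falls back to `default` when the header declares no charset;
    that fallback is an assumption, not existing behaviour.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset() or default


def decoding_reader(fileobj, content_type, default='utf-8'):
    """Wrap a byte-oriented file object so that read() returns decoded text.

    Hypothetical helper. codecs.getreader raises LookupError if the
    declared charset is not a codec Python knows about.
    """
    encoding = charset_from_content_type(content_type, default)
    return codecs.getreader(encoding)(fileobj)


# Stand-in for the file object we currently return from response.content.
raw = io.BytesIO(u'Marat\xf3n'.encode('utf-8'))
reader = decoding_reader(raw, 'text/html; charset=UTF-8')
text = reader.read()  # decoded text rather than raw bytes
```

The wrapped reader still behaves like a file handle, so callers that only call read() or iterate over lines should not need to change, though that would want checking against real usage.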