sensiblecodeio / data-services-helpers

Python module containing classes and functions that The Sensible Code Company's Data Services often used
https://sensiblecode.io/
BSD 2-Clause "Simplified" License
4 stars, 4 forks

download_url should make use of the encoding in the Content-Type header #2

Open: fawkesley opened this issue 10 years ago

fawkesley commented 10 years ago

At the moment we're returning a file object built from response.content. That is raw bytes, so it loses any information we had about the file's character encoding, which the server declares in the Content-Type header:

Content-Type: text/html; charset=UTF-8
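To act on a header like this, download_url first has to pull out the charset parameter. Below is a minimal Python 3 sketch that uses only the standard library's email.message.Message, which parses MIME-style headers. The helper name charset_from_content_type is hypothetical and not part of this repository.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Hypothetical helper: return the charset declared in a Content-Type
    value, lowercased, or `default` if no charset parameter is present."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset handles quoting and lowercases the result.
    return msg.get_content_charset(failobj=default)


print(charset_from_content_type("text/html; charset=UTF-8"))   # utf-8
print(charset_from_content_type("application/octet-stream"))  # None
```

Charset names are case-insensitive, so the lowercased value can be passed straight to the codecs module.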

We can cunningly wrap the returned file handle using the codecs module:

>>> import codecs
>>> from cStringIO import StringIO
>>> a = u'Marat\xf3n'
>>> f = StringIO(a.encode('utf-8'))
>>> f.read()
'Marat\xc3\xb3n'
>>> f.seek(0)
>>> g = codecs.getreader('utf-8')(f)
>>> print g.read()
Maratón
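The session above is Python 2 (cStringIO, print statement). Here is a minimal Python 3 sketch of the same trick, assuming io.BytesIO stands in for cStringIO.StringIO. It also shows io.TextIOWrapper as an alternative wrapper, which the thread itself does not mention.

```python
import codecs
import io

a = "Marat\u00f3n"                    # the text 'Maratón'
f = io.BytesIO(a.encode("utf-8"))      # binary file object holding UTF-8 bytes
raw = f.read()                         # b'Marat\xc3\xb3n': bytes, not text
f.seek(0)

# codecs.getreader returns a StreamReader class; calling it wraps f so
# that reads decode the underlying bytes.
g = codecs.getreader("utf-8")(f)
decoded = g.read()                     # 'Maratón'

# io.TextIOWrapper is the io-module way to get the same result.
f.seek(0)
t = io.TextIOWrapper(f, encoding="utf-8")
also_decoded = t.read()                # 'Maratón'
```

Either wrapper leaves the bytes untouched in f and decodes only on read.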
scraperdragon commented 10 years ago

The critical bit of code is:

g = codecs.getreader('utf-8')(f)
g.read()

f is a file handle containing UTF-8 bytes, but g.read() returns correctly decoded unicode.
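Putting the two pieces together, download_url could wrap its result only when the server declares a charset. This is a minimal Python 3 sketch under stated assumptions: the helper wrap_response_body is hypothetical, it takes the body bytes and the Content-Type value directly (rather than a real response object), and it falls back to returning the binary file object when no charset is known.

```python
import codecs
import io
from email.message import Message


def wrap_response_body(content, content_type, default_charset=None):
    """Hypothetical helper: wrap raw body bytes (e.g. response.content)
    so that reads return text decoded with the declared charset.

    If the Content-Type value is missing or has no charset parameter and
    no default is given, the plain binary file object is returned.
    An unrecognised charset name makes codecs.getreader raise LookupError.
    """
    f = io.BytesIO(content)
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset(failobj=default_charset)
    if charset is None:
        return f  # no encoding information: leave it as bytes
    return codecs.getreader(charset)(f)


body = "Marat\u00f3n".encode("utf-8")
text = wrap_response_body(body, "text/html; charset=UTF-8").read()  # 'Maratón'
data = wrap_response_body(b"\x00\x01", "application/octet-stream").read()  # b'\x00\x01'
```

Returning bytes when no charset is declared avoids guessing an encoding for binary downloads; a caller that knows better can pass default_charset.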