jmsv / ety-python

A Python module to discover the etymology of words
http://ety-python.rtfd.io
MIT License

Fix UnicodeDecodeError #16

Closed: alxwrd closed this pull request 6 years ago

alxwrd commented 6 years ago

hello 👋

I'm currently getting the following error on my Windows machine:

>>> import ety
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alex\repos\ety-python\ety\__init__.py", line 10, in <module>
    data.load()
  File "C:\Users\Alex\repos\ety-python\ety\data.py", line 45, in load
    load_country_codes()
  File "C:\Users\Alex\repos\ety-python\ety\data.py", line 35, in load_country_codes
    countries_json = json.load(f)
  File "C:\Python36\Lib\json\__init__.py", line 296, in load
    return loads(fp.read(),
  File "C:\Users\Alex\.virtualenvs\ety-python-_lx5rZzb\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8732: character maps to <undefined>

This PR should fix the error by specifying the encoding explicitly when the file is opened.
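
A minimal sketch of the kind of change described, assuming the bundled JSON data is UTF-8 encoded (the thread doesn't say which encoding the PR chose). `load_json_utf8` and the sample file are hypothetical, not ety's actual code:

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Load JSON from *path*, always decoding the file as UTF-8."""
    # Passing encoding= explicitly stops open() from falling back to the
    # locale's preferred encoding (e.g. cp1252 on many Windows machines).
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage with a hypothetical data file containing non-ASCII text.
sample = {"name": "Åland Islands", "city": "Łódź"}
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as tmp:
    json.dump(sample, tmp, ensure_ascii=False)  # write raw UTF-8, not \u escapes
    path = tmp.name

try:
    loaded = load_json_utf8(path)
finally:
    os.remove(path)  # clean up the temporary file

print(loaded == sample)  # True
```

Because the sample is written with `ensure_ascii=False`, the file holds raw multi-byte UTF-8 sequences, which is exactly the case where the encoding used to read it back matters.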

alxwrd commented 6 years ago

Just for reference: everything was working fine on my Linux machine. I did some digging because I was curious as to why.

From "Python 3 Default Encoding cp1252":

Python 3.6 changes some more defaults:

PEP 529: Change Windows filesystem encoding to UTF-8
PEP 528: Change Windows console encoding to UTF-8

But the default encoding for open() is still whatever Python manages to infer from the environment.
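
To see this split on a particular machine, a small helper (hypothetical, not part of ety) can report the three encodings side by side. The comment about `open()` reflects the Python 3.6 documentation; Python 3.7 added UTF-8 mode (PEP 540), which can change that default:

```python
import locale
import sys


def default_encodings():
    """Collect the encodings Python inferred for this process."""
    return {
        # Encoding for file *names* (UTF-8 on Windows since PEP 529).
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the stdout stream (UTF-8 on a Windows console since
        # PEP 528); stdout may be replaced, so the attribute is optional.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding open() uses for file *contents* when no encoding= is
        # passed (locale.getpreferredencoding(False) per the 3.6 docs).
        "open() default": locale.getpreferredencoding(False),
    }


for name, enc in default_encodings().items():
    print(f"{name}: {enc}")
```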

And yup, on the Windows machine:

>>> import locale
>>> locale.getpreferredencoding()
'cp1252'
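
Assuming the Linux machine used a UTF-8 locale (typical, though the thread doesn't state it), the difference comes down to decoding the same bytes with two different codecs. A minimal sketch with a hypothetical sample string; `try_decode` is illustrative only:

```python
# Hypothetical sample: "Łódź" encoded as UTF-8. "Ł" (U+0141) becomes the
# bytes C5 81, and 0x81 has no mapping in cp1252.
raw = "Łódź".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error reason if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


linux_like = try_decode(raw, "utf-8")     # default under a UTF-8 locale
windows_like = try_decode(raw, "cp1252")  # default on the Windows machine
print(linux_like == "Łódź")  # True
print(windows_like)          # UnicodeDecodeError: character maps to <undefined>
```

The 0x81 byte here plays the same role as byte 0x81 in the traceback: it is valid inside a UTF-8 multi-byte sequence but undefined in cp1252.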

jmsv commented 6 years ago

@alxwrd That's interesting, thanks for spotting and fixing this! 🎉

Thanks for the info too; I don't know much about encodings, so I was just opening the file and hoping for the best 😬