rosettatype / hyperglot

Hyperglot: a database and tools for detecting language support in fonts
http://hyperglot.rosettatype.com
GNU General Public License v3.0

UnicodeDecodeError on Windows #56

Closed: Eigi closed this issue 3 years ago

Eigi commented 3 years ago

Using the hyperglot CLI on Windows, I get the following error:

Traceback (most recent call last):
  File "c:\python38\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python38\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\pythonEnvs\3.8dev\Scripts\hyperglot.exe\__main__.py", line 7, in <module>
  File "d:\pythonenvs\3.8dev\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\pythonenvs\3.8dev\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\pythonenvs\3.8dev\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\pythonenvs\3.8dev\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\pythonenvs\3.8dev\lib\site-packages\hyperglot\main.py", line 288, in cli
    langs = Languages(strict=strict_iso)
  File "d:\pythonenvs\3.8dev\lib\site-packages\hyperglot\languages.py", line 30, in __init__
    data = yaml.load(f, Loader=yaml.Loader)
  File "d:\pythonenvs\3.8dev\lib\site-packages\yaml\__init__.py", line 112, in load
    loader = Loader(stream)
  File "d:\pythonenvs\3.8dev\lib\site-packages\yaml\loader.py", line 44, in __init__
    Reader.__init__(self, stream)
  File "d:\pythonenvs\3.8dev\lib\site-packages\yaml\reader.py", line 85, in __init__
    self.determine_encoding()
  File "d:\pythonenvs\3.8dev\lib\site-packages\yaml\reader.py", line 124, in determine_encoding
    self.update_raw()
  File "d:\pythonenvs\3.8dev\lib\site-packages\yaml\reader.py", line 178, in update_raw
    data = self.stream.read(size)
  File "c:\python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1053: character maps to <undefined>
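The last frames show the database being read in text mode and decoded with cp1252, the locale default on many Windows installs, rather than UTF-8. A minimal stdlib sketch reproduces the same error. The sample line is hypothetical. It is chosen because "Á" (U+00C1) encodes to the bytes C3 81 in UTF-8, and 0x81 has no mapping in cp1252.

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or the UnicodeDecodeError raised while decoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err

# Hypothetical YAML-like line; "Á" is b"\xc3\x81" in UTF-8.
raw = "name: Á\n".encode("utf-8")

# 0xC3 decodes to "Ã" in cp1252, but 0x81 is undefined, so decoding fails.
print(try_decode(raw, "cp1252"))
# 'charmap' codec can't decode byte 0x81 in position 7: character maps to <undefined>

# The same bytes decode cleanly as UTF-8.
print(try_decode(raw, "utf-8"))  # name: Á
```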

I can fix the problem by opening the DB in binary mode in languages.py, line 29:

with open(DB, "rb") as f:
    data = yaml.load(f, Loader=yaml.Loader)
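Binary mode works because PyYAML then decodes the byte stream itself, as UTF-8 unless a UTF-16 byte-order mark is present, instead of inheriting the platform's text-mode default. A stdlib-only alternative is to pass `encoding="utf-8"` to `open()`. The sketch compares both reads; it is not the patch that shipped in 0.3.4, and the temporary file and its contents are hypothetical stand-ins for the language database.

```python
import os
import tempfile

def load_text(path: str) -> str:
    # An explicit encoding makes decoding independent of the locale default
    # (cp1252 on many Windows installs).
    with open(path, encoding="utf-8") as f:
        return f.read()

def load_bytes(path: str) -> bytes:
    # Binary mode, as in the proposed fix: nothing is decoded here, so the
    # consumer (PyYAML in hyperglot's case) decides how to decode.
    with open(path, "rb") as f:
        return f.read()

def demo() -> tuple:
    # Hypothetical stand-in for the language database file.
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write("name: Á\n".encode("utf-8"))
        return load_text(path), load_bytes(path)
    finally:
        os.remove(path)

text, raw = demo()
print(text)  # name: Á
print(raw)   # b'name: \xc3\x81\n'
```

Either approach removes the dependency on the locale. Binary mode leaves encoding detection to the YAML parser, while an explicit `encoding` keeps the file object in text mode.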

Do you want a PR?

All the best,
Eigi

kontur commented 3 years ago

Thanks @Eigi for the problem description and possible solution. I'll review this with the next update 👍

kontur commented 3 years ago

Thanks! Release 0.3.4 fixes this as per your suggestion :)