polm / cutlet

Japanese to romaji converter in Python
https://polm.github.io/cutlet/
MIT License

UnicodeDecodeError on 'exceptions.tsv' with Windows 10 Japanese Locale #10

Closed: 4890A closed this issue 4 years ago.

4890A commented 4 years ago

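On Windows 10 with the Japanese locale, Python decodes exceptions.tsv with code page 932 (cp932) instead of UTF-8. This happens because cutlet calls open() without an encoding argument, so Python falls back to the locale's preferred encoding, which is cp932 on Japanese Windows. Passing encoding='utf-8' to open() fixes it.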

Traceback (most recent call last):
  File "cutlet_test.py", line 2, in <module>
    katsu = cutlet.Cutlet()
  File "C:\ProgramData\Miniconda3\envs\jpocr\lib\site-packages\cutlet\cutlet.py", line 80, in __init__
    self.exceptions = load_exceptions()
  File "C:\ProgramData\Miniconda3\envs\jpocr\lib\site-packages\cutlet\cutlet.py", line 59, in load_exceptions
    for line in open(cdir / 'exceptions.tsv'):
UnicodeDecodeError: 'cp932' codec can't decode byte 0x83 in position 10: illegal multibyte sequence
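A minimal sketch of the fix described above, assuming exceptions.tsv holds one word<TAB>romaji pair per line. The function signature and the encoding parameter are hypothetical and are not cutlet's actual code. The point is that naming the encoding explicitly makes the result independent of the OS locale, whereas omitting it lets open() use the locale's preferred encoding (cp932 on Japanese-locale Windows).

```python
from __future__ import annotations

from pathlib import Path


def load_exceptions(path: Path, encoding: str = "utf-8") -> dict[str, str]:
    """Read a word<TAB>romaji file into a dict.

    Hypothetical sketch: the two-column layout is an assumption,
    not taken from cutlet's source.
    """
    exceptions = {}
    # Pass the encoding explicitly; without it, open() falls back to the
    # locale's preferred encoding (cp932 on Japanese-locale Windows).
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            word, romaji = line.split("\t", 1)
            exceptions[word] = romaji
    return exceptions
```

Calling load_exceptions(path, encoding="cp932") on a UTF-8 file reproduces the bug on any OS: it either raises an error (typically UnicodeDecodeError) or returns garbled keys. Users on an older release can also work around the problem by running Python in UTF-8 mode (python -X utf8 or PYTHONUTF8=1, Python 3.7+), which makes open() default to UTF-8.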

polm commented 4 years ago

Thanks for the report and the quick fix!

polm commented 4 years ago

Just released v0.1.10, which should fix this.