kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

UnicodeDecodeError: 'charmap' codec can't decode byte #88

Closed GaetanBaert closed 1 year ago

GaetanBaert commented 2 years ago

Hello,

When I try to load my KenLM model using the load_from_dir method on Windows, I get the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1703: character maps to <undefined>

It seems that adding an encoding="utf8" parameter on line 376 of language_model.py solves this problem.
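
For anyone who wants to reproduce this outside of pyctcdecode, here is a minimal sketch of what I believe is happening. It assumes the file is opened in text mode without an explicit encoding, so Python falls back to the Windows locale encoding (typically cp1252, which is where the 'charmap' codec message comes from). The file name and word list below are hypothetical stand-ins, not the actual pyctcdecode code or files.

```python
import os
import tempfile

# Hypothetical vocabulary; "\u00cf" encodes to the bytes C3 8F in UTF-8,
# and 0x8f has no mapping in cp1252.
words = ["na\u00cfve", "caf\u00e9"]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, standing in for a file saved next to the model.
    path = os.path.join(tmp_dir, "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words))

    # Simulate the Windows default by forcing cp1252 explicitly.
    failed = False
    error_message = ""
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
    except UnicodeDecodeError as err:
        failed = True
        error_message = str(err)

    # Passing the encoding explicitly makes the read platform-independent.
    with open(path, encoding="utf-8") as f:
        loaded = f.read().splitlines()

print(failed, error_message)
print(loaded)
```

With the explicit encoding, the read no longer depends on the platform default, which is why the one-parameter change appears to be enough.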