kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

How to get words from binary file #324

Closed IamSVP94 closed 2 years ago

IamSVP94 commented 3 years ago

I have an *.arpa file with a list of n-grams, and I can easily extract the words from it as str values. But after converting *.arpa -> *.bin I can no longer do that. I tried str.decode(), but I got an error: 'utf-8' codec can't decode byte 0x80 in position 11: invalid start byte. How can I get the words from a probing binary file?
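For reference, the ARPA-side extraction mentioned above can be sketched roughly as follows. This is a minimal sketch, not KenLM code: it assumes a standard ARPA layout in which the `\1-grams:` section lists one word per line as `log10_prob word [backoff]`, and the helper name `read_arpa_unigrams` and the sample data are made up for illustration. It reads text only. The decode error quoted above suggests the *.bin file is not UTF-8 text, so this approach applies only when the original *.arpa file is still available.

```python
import io


def read_arpa_unigrams(lines):
    """Collect the words listed in the \\1-grams: section of an ARPA file.

    `lines` is any iterable of text lines (an open file, a StringIO, a list).
    Hypothetical helper for illustration; it is not part of KenLM.
    """
    words = []
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # A blank line or the next backslash header ends the section.
            if not line or line.startswith("\\"):
                break
            # Fields: log10 probability, word, optional backoff weight.
            fields = line.split()
            if len(fields) >= 2:
                words.append(fields[1])
    return words


# Tiny made-up ARPA snippet, used only to exercise the parser.
SAMPLE = (
    "\\data\\\n"
    "ngram 1=4\n"
    "ngram 2=1\n"
    "\n"
    "\\1-grams:\n"
    "-1.0\t<unk>\n"
    "-0.5\t<s>\t-0.3\n"
    "-0.7\thello\t-0.2\n"
    "-0.9\t</s>\n"
    "\n"
    "\\2-grams:\n"
    "-0.1\t<s> hello\n"
    "\n"
    "\\end\\\n"
)

words = read_arpa_unigrams(io.StringIO(SAMPLE))
print(words)  # ['<unk>', '<s>', 'hello', '</s>']
```

With a real model, the same function can be given an open text file instead of the `StringIO` object, for example `open(path, encoding="utf-8")` where `path` points at the *.arpa file.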

IamSVP94 commented 2 years ago

Use https://github.com/parlance/ctcdecode