kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

KenLM for class-based LM #126

Closed vijay120 closed 6 years ago

vijay120 commented 6 years ago

I currently use SRILM for class-based LMs. The command line for such models looks like this: ngram -classes class.txt -unk -order 3 -debug 1 -lm model.ngram.count. Is there something similar in KenLM for ingesting class files?

The class file format in SRILM is as follows:

NAME 9.1121588858e-05 john smith
NAME 3.94062207105e-05 david smith
NAME 3.81350523005e-05 michael smith
.
.
.
kpu commented 6 years ago

There isn't built-in support for reading those files. You can use hard classes by mapping the text to class tokens yourself before training and querying. You probably want to pass --discount-fallback to lmplz when you do.
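
For anyone landing here later, a minimal sketch of what "mapping the text yourself" could look like. It is not part of KenLM. It assumes the class file always has the three-part layout shown in the question (class name, expansion probability, then the words of the expansion) and that each phrase belongs to exactly one class. The helper names load_classes and map_to_classes are made up for illustration.

```python
import math


def load_classes(lines):
    """Parse 'CLASS prob word1 word2 ...' lines into {phrase: (class, prob)}.

    Assumes the probability field is always present, as in the sample above.
    """
    expansions = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        cls, prob, words = fields[0], float(fields[1]), tuple(fields[2:])
        expansions[words] = (cls, prob)  # hard classes: one class per phrase
    return expansions


def map_to_classes(tokens, expansions):
    """Replace known phrases with their class token (greedy, longest match first).

    Returns the mapped tokens and the summed log10 expansion probability
    of every phrase that was replaced.
    """
    max_len = max((len(p) for p in expansions), default=0)
    mapped, log10_expansion = [], 0.0
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            hit = expansions.get(tuple(tokens[i:i + n]))
            if hit is not None:
                cls, prob = hit
                mapped.append(cls)
                log10_expansion += math.log10(prob)
                i += n
                break
        else:
            # No phrase starts here; keep the word unchanged.
            mapped.append(tokens[i])
            i += 1
    return mapped, log10_expansion


classes = load_classes([
    "NAME 9.1121588858e-05 john smith",
    "NAME 3.94062207105e-05 david smith",
])
mapped, log10_expansion = map_to_classes("call john smith now".split(), classes)
print(" ".join(mapped), log10_expansion)
```

The mapped training text can then go through lmplz as usual (for example lmplz -o 3 --discount-fallback <mapped.txt >model.arpa). At query time, map the sentence the same way, score the mapped string with KenLM (the Python module's Model.score returns a log10 probability), and add log10_expansion to approximate the class-based score. Greedy longest-match is only one way to resolve overlapping phrases; treat it as an assumption of this sketch.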