Closed: PCerles closed this issue 5 years ago
Hi, I have a very large corpus that I want to train an n-gram language model on. I want to prune for efficient STT decoding, but I don't want to do any pruning on n-grams that contain certain key words. Is there a way to do this directly with kenlm?
This requires a code modification. In https://github.com/kpu/kenlm/blob/master/lm/builder/adjust_counts.cc, change the code so that it does not mark the n-grams you want to keep (the ones containing your keywords). Have fun!
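For anyone landing here later, the kind of check this implies might look roughly like the sketch below. It is not kenlm code: `WordId`, `ContainsProtectedWord`, and `ShouldMarkForPruning` are hypothetical names made up for illustration, and the `count <= threshold` rule is an assumption about how the pruning threshold is applied. You would still need to find the place in adjust_counts.cc where n-grams get marked and map your keywords to the vocabulary ids used there.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a vocabulary id type; not taken from kenlm.
using WordId = std::uint32_t;

// Returns true if any word id in [begin, end) is in the protected set.
bool ContainsProtectedWord(const WordId *begin, const WordId *end,
                           const std::unordered_set<WordId> &protected_ids) {
  for (const WordId *w = begin; w != end; ++w) {
    if (protected_ids.count(*w) != 0) return true;
  }
  return false;
}

// Decides whether an n-gram should be marked for pruning.
// Assumption: an n-gram is a pruning candidate when count <= threshold.
// N-grams that contain a protected word are never marked.
bool ShouldMarkForPruning(std::uint64_t count, std::uint64_t threshold,
                          const WordId *begin, const WordId *end,
                          const std::unordered_set<WordId> &protected_ids) {
  if (ContainsProtectedWord(begin, end, protected_ids)) return false;
  return count <= threshold;
}
```

The idea is to guard the existing marking step with a predicate like this, so that low-count n-grams containing a keyword survive while everything else is pruned as usual.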
Thanks!