kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

OOV words with small LM #115

Open davidavdav opened 11 months ago

davidavdav commented 11 months ago

Hello,

We have an application with a very small vocabulary (~100 words). We use an almost trivial bigram model (kenlm does not seem to be able to build a unigram model), and we see that decoder.decode() produces words that are not in the language model.

Is there some kind of fallback to letter-level decoding? If so, is there a way to turn it off?

Thanks!
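
To make the symptom concrete, here is a minimal sketch of how such out-of-vocabulary output could be detected after decoding. The `find_oov_words` helper, the example vocabulary, and the sample transcript are all hypothetical. The helper only post-checks the decoded string; it does not change how the decoder itself searches.

```python
from typing import Iterable, List, Set


def find_oov_words(decoded_text: str, vocabulary: Iterable[str]) -> List[str]:
    """Return the words of decoded_text that are not in vocabulary, in order.

    Comparison is case-insensitive and splits on whitespace only.
    """
    vocab: Set[str] = {word.lower() for word in vocabulary}
    return [word for word in decoded_text.lower().split() if word not in vocab]


# Hypothetical small vocabulary and decoder output, for illustration only.
vocabulary = ["turn", "on", "off", "the", "light"]
decoded_text = "turn on the lihgt"  # stands in for the string from decoder.decode()

oov = find_oov_words(decoded_text, vocabulary)
print(oov)
```

A non-empty result means the decoder emitted at least one word that the language model's vocabulary does not contain, which is the behavior described above.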