PCerles closed this issue 5 years ago
Is there any way to force KenLM not to output `<unk>` as a unigram?
Not in the current code. If you want all the mass on words, edit lm/builder/interpolate.cc to muck with vocabulary size on line 167. Then edit the printer to skip the unknown line in output.
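If patching the printer is not convenient, the "skip the unknown line" step can also be approximated by post-processing the ARPA file that was written. The sketch below is only that: `drop_unk_unigram` is a hypothetical helper, not part of KenLM, and it assumes the usual ARPA text layout (`ngram 1=N` in the header, whitespace-separated fields in the `\1-grams:` section). It does not move any probability mass onto words; that still needs the vocabulary size change in lm/builder/interpolate.cc.

```python
def drop_unk_unigram(arpa_text, unk="<unk>"):
    """Remove the <unk> entry from the 1-grams section of ARPA text.

    Hypothetical helper, not part of KenLM. It does not renormalize
    the remaining probabilities, and it leaves higher-order n-grams alone.
    """
    kept = []
    section = None  # header of the section currently being read
    removed = 0
    for line in arpa_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("\\"):
            # Section headers such as "\data\", "\1-grams:", "\end\"
            section = stripped
        elif section == "\\1-grams:":
            fields = stripped.split()
            # Unigram lines look like: log10_prob  word  [backoff]
            if len(fields) >= 2 and fields[1] == unk:
                removed += 1
                continue
        kept.append(line)

    # Keep the header count consistent with the entries that remain.
    result = []
    for line in kept:
        if removed and line.startswith("ngram 1="):
            count = int(line.split("=", 1)[1])
            line = "ngram 1={}".format(count - removed)
        result.append(line)
    return "\n".join(result) + "\n"
```

Read the ARPA file as text, pass it through the function, and write the result back out. Whether a model without `<unk>` then loads cleanly depends on the loader's settings, so check that separately.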
Great, thank you!