jbeard4 / mitlm

Automatically exported from code.google.com/p/mitlm
BSD 3-Clause "New" or "Revised" License

models with <unk> #17

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The tutorial wiki page (http://code.google.com/p/mitlm/wiki/Tutorial) says
that a language model with an <unk> symbol for out-of-vocabulary words can
be estimated with this command:

estimate-ngram -v CS.vocab -unk -t Lectures.txt -wl Lectures.CS.unk.lm

However, that does not work. You have to add an explicit value (T, t, 1, TRUE, or true) after -unk:

estimate-ngram -v CS.vocab -unk true -t Lectures.txt -wl Lectures.CS.unk.lm

Original issue reported on code.google.com by michal.f...@gmail.com on 24 Feb 2010 at 3:30

GoogleCodeExporter commented 8 years ago
The same holds for the -wb / -write-binary option: it only works with an explicit value, e.g.

estimate-ngram -t data.txt -wb true -wl model.lm
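
The workaround reported for -unk and -wb is the same: follow the boolean option with an explicit value. As a minimal sketch, the shell helper below prints an estimate-ngram command line with "true" after each listed boolean option. The name explicit_bool_cmd is hypothetical and not part of mitlm. The helper only prints the command and does not run it, and it places the boolean options first, which assumes that estimate-ngram does not depend on option order.

```shell
# Hypothetical helper (not part of mitlm): print an estimate-ngram command
# line in which every listed boolean option is followed by an explicit "true".
# Usage: explicit_bool_cmd "<boolean options>" <remaining estimate-ngram args...>
explicit_bool_cmd() {
  bool_opts=$1
  shift
  cmd="estimate-ngram"
  # Emit each boolean option with an explicit value.
  for opt in $bool_opts; do
    cmd="$cmd $opt true"
  done
  # Append the remaining arguments unchanged.
  for arg in "$@"; do
    cmd="$cmd $arg"
  done
  printf '%s\n' "$cmd"
}

# Dry run: prints the commands instead of executing them.
explicit_bool_cmd "-unk" -v CS.vocab -t Lectures.txt -wl Lectures.CS.unk.lm
explicit_bool_cmd "-wb" -t data.txt -wl model.lm
```

To actually run a printed command, copy it into the shell once the output looks right; the sketch deliberately avoids executing anything itself.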

Original comment by smarqu...@gmail.com on 9 Mar 2011 at 12:57

GoogleCodeExporter commented 8 years ago

Original comment by giuliop...@gmail.com on 30 Jan 2013 at 2:33