simon-joseph / mitlm

Automatically exported from code.google.com/p/mitlm
BSD 3-Clause "New" or "Revised" License

Cannot create optimized unigrams #7

Status: Closed (GoogleCodeExporter closed this issue 9 years ago)

GoogleCodeExporter commented 9 years ago
When I create a unigram LM with weight features (-wf) and try to optimize it
on a dev set, I get a segmentation fault:

$ estimate-ngram -v etc/vocab -unk 1 -o 1 -t train.txt -wl tmp.arpa.gz -wf entropy:%s.txt -op tmp.txt
Replace unknown words with <unk>...
Loading vocab etc/vocab...
Loading corpus train.txt...
Loading weight features entropy:train.txt...
Smoothing[1] = ModKN
Set smoothing algorithms...
Loading development set tmp.txt...
Segmentation fault

The same line works fine with -o 2.
I'm using MITLM from SVN.

Original issue reported on code.google.com by alu...@gmail.com on 26 Feb 2009 at 2:13

GoogleCodeExporter commented 9 years ago
The code previously assumed that the first word of each sentence starts at
order 2 (<s> word). It now uses the minimum of 2 and the current order.

Original comment by bojune...@gmail.com on 26 Feb 2009 at 3:34
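
The fix described in the comment above can be illustrated with a small sketch. This is not MITLM's actual code (MITLM is written in C++); the function names and the per-order table layout below are hypothetical and chosen only to show why assuming order 2 for the first word breaks a unigram model, and how clamping to min(2, order) avoids it.

```python
def first_word_order_old(model_order):
    # Pre-fix assumption (hypothetical reconstruction): "<s> word" is always
    # treated as a bigram, regardless of the model order.
    return 2


def first_word_order_fixed(model_order):
    # Post-fix behaviour as described in the comment: min of 2 and the order.
    return min(2, model_order)


def per_word_orders(sentence, model_order, first_word_order):
    """Return the n-gram order used for each word of a sentence."""
    orders = []
    order = first_word_order(model_order)
    for _ in sentence:
        orders.append(order)
        # Later words can use at most one more word of history,
        # capped at the model order.
        order = min(order + 1, model_order)
    return orders


def lookup_tables(sentence, model_order, first_word_order):
    """Index one table per order; fails if an order exceeds model_order."""
    # Hypothetical layout: tables[k - 1] holds the data for order k.
    tables = [{} for _ in range(model_order)]
    return [tables[k - 1] for k in
            per_word_orders(sentence, model_order, first_word_order)]


if __name__ == "__main__":
    words = ["a", "b", "c"]
    print(per_word_orders(words, 1, first_word_order_fixed))
    try:
        lookup_tables(words, 1, first_word_order_old)
    except IndexError:
        # In C++ the equivalent out-of-range access would not raise; it could
        # read invalid memory instead, which is consistent with a segfault.
        print("order 2 requested from a unigram model")
```

With -o 2 or higher the two variants agree, which matches the report that the same command works with -o 2.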