Hi alumae,
Can you please include the scripts you used to estimate and interpolate the
models?
Paul
Original comment by bojune...@gmail.com
on 16 Dec 2008 at 3:29
To estimate models, I used:
estimate-ngram --read-text <train.i.txt> -v vocab.txt --use-unknown --smoothing ModKN -o 4 --write-count <model.i>.arpa.counts --write-lm <model.i>.arpa.gz
To interpolate, I used:
interpolate-ngram -l <model.1>.arpa.gz <model.2>.arpa.gz <model.3>.arpa.gz -o 4 --read-parameters interpolate.params -i CM -write-lm final.arpa.gz
The same thing happens when I use simple linear interpolation:
interpolate-ngram -l <model1>.arpa.gz <model2>.arpa.gz <model3>.arpa.gz -o 4 -write-lm final.arpa.gz
There are no such "nans" in the component LMs.
Original comment by alu...@gmail.com
on 16 Dec 2008 at 3:46
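For anyone wanting to repeat the "no nans in the component LMs" check, here is a minimal sketch in Python. It assumes the standard ARPA text layout (a "\N-grams:" header per order, then lines of log probability, N words, and an optional backoff weight). The function names find_nan_entries and scan_arpa are made up for this example and are not part of MITLM.

```python
import gzip
import re

# Spellings of NaN that C/C++ formatted output commonly produces.
NAN_TOKENS = {"nan", "-nan", "+nan"}


def find_nan_entries(lines):
    """Return (line_number, text) for ARPA entries whose numeric fields are NaN.

    Only the log probability (first field) and the optional backoff weight
    (last field) are inspected, so a vocabulary word spelled "nan" is ignored.
    """
    hits = []
    order = 0  # 0 means we are outside any \N-grams: section
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        header = re.fullmatch(r"\\(\d+)-grams:", text)
        if header:
            order = int(header.group(1))
            continue
        if text.startswith("\\"):  # \data\ or \end\
            order = 0
            continue
        fields = text.split()
        if order == 0 or len(fields) < order + 1:
            continue
        numeric = [fields[0]]
        if len(fields) == order + 2:  # a backoff weight is present
            numeric.append(fields[-1])
        if any(field.lower() in NAN_TOKENS for field in numeric):
            hits.append((number, text))
    return hits


def scan_arpa(path):
    """Scan a plain or gzip-compressed ARPA file for NaN entries."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return find_nan_entries(handle)
```

Calling scan_arpa("final.arpa.gz") on the interpolated model and on each component model should show whether the NaN values only appear after interpolation.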
I just realized that there are 2-grams such as:
-5.363780 </s> Ababacar -0.315952
-5.611012 </s> Abassi -0.225136
in the component LMs, which of course do not make sense. Maybe you are mixing
up the begin- and end-of-sentence markers somewhere?
Original comment by alu...@gmail.com
on 16 Dec 2008 at 4:05
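A rough way to list entries like the two 2-grams quoted above is to look for n-grams where </s> appears anywhere except the final word position. The sketch below is Python and assumes the standard ARPA layout; the function name misplaced_sentence_end is hypothetical. Whether such entries are actually an error depends on how the toolkit treats <s> and </s> internally, so treat the output as a list of things to inspect rather than as a verdict.

```python
def misplaced_sentence_end(lines, end_token="</s>"):
    """Return the word tuples of n-grams where end_token is not the last word."""
    hits = []
    order = 0  # 0 means we are outside any \N-grams: section
    for line in lines:
        text = line.strip()
        if text.startswith("\\") and text.endswith("-grams:"):
            # Header such as "\2-grams:" gives the n-gram order of the section.
            order = int(text[1:text.index("-")])
            continue
        if text.startswith("\\"):  # \data\ or \end\
            order = 0
            continue
        fields = text.split()
        if order < 2 or len(fields) < order + 1:
            continue
        # fields[0] is the log probability; the next `order` fields are words.
        words = fields[1:order + 1]
        if end_token in words[:-1]:
            hits.append(tuple(words))
    return hits
```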
Verified that the problem only exists if --read-vocab and --use-unknown are
specified.
Original comment by bojune...@gmail.com
on 16 Dec 2008 at 4:07
In the last change, I intentionally merged <s> and </s> together since it
simplifies the internal logic and removes a lot of special cases. As you have
noticed, I have not made the LM output completely compatible with SRILM yet. I
do not believe this is the issue, though. I will let you know once I figure out
what is going on, hopefully in an hour or so.
Original comment by bojune...@gmail.com
on 16 Dec 2008 at 4:16
Bug Fixes
=========
- Cleaned up usage of NaN such that it should no longer appear. Unobserved
  backoff weights are assumed to be 1, not NaN.
- Only output backoff weight if log value is not 0.
- Cleaned up collapse of <s> and </s> such that ARPA LM loading/saving is
  unaffected.
Original comment by bojune...@gmail.com
on 16 Dec 2008 at 7:30
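The first two fixes can be illustrated with a small Python sketch. This is not the toolkit's actual code, and format_arpa_entry is a hypothetical helper: it treats a missing or NaN backoff weight as 1 (whose base-10 log is 0) and writes the backoff field only when its log value is nonzero. The six-decimal, tab-separated layout is an assumption chosen to resemble the entries quoted earlier in the thread.

```python
import math


def format_arpa_entry(log_prob, words, backoff=None):
    """Format one ARPA n-gram line, omitting a zero log backoff weight.

    log_prob is already a base-10 log probability; backoff is a linear-domain
    weight, or None / NaN when no backoff weight was observed.
    """
    # Assumption taken from the fix description: unobserved backoff -> 1.
    if backoff is None or math.isnan(backoff):
        weight = 1.0
    else:
        weight = backoff
    log_backoff = math.log10(weight)  # requires weight > 0
    entry = f"{log_prob:.6f}\t{' '.join(words)}"
    if log_backoff != 0.0:
        entry += f"\t{log_backoff:.6f}"
    return entry
```

With this rule a NaN can never reach the output through the backoff field, and entries with a backoff weight of exactly 1 come out in the shorter two-field form.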
Thanks, seems to work perfectly.
Original comment by alu...@gmail.com
on 17 Dec 2008 at 11:19
Original issue reported on code.google.com by
alu...@gmail.com
on 16 Dec 2008 at 10:24