Hi alumae,
I'll admit that I didn't test the C++ implementation with binary LMs and non-order-3 models enough (I previously had a hybrid Python implementation that saw more use). While I try to figure out what could go wrong from the stack trace, can you try the same experiment with order-3 LMs and ARPA LM inputs? Thanks.
Paul
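For reference, a sketch of what the order-3 / ARPA variant of the experiment might look like; the flags mirror the interpolate-ngram command quoted later in this thread, and the file names are placeholders:

interpolate-ngram -l model1.arpa model2.arpa -o 3 --optimize-perplexity dev.txt --write-lm out.arpa.gz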
Original comment by bojune...@gmail.com
on 10 Dec 2008 at 2:59
For long argument names, you need two dashes. One dash indicates a sequence of one-letter arguments, since I used the boost::program_options package to parse the input arguments. (I probably made the same mistake in my documentation.)

interpolate-ngram -l build/lm/tmp/model1.mitlm model2.mitlm -o 4 --optimize-perplexity dev.txt --write-lm out.arpa.gz
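To illustrate the dash behavior, here is a minimal boost::program_options sketch. The option names mirror the command above, but this is not MITLM's actual option table, just an illustrative example of long names versus one-letter aliases:

    // Illustrative sketch only -- not MITLM's real option definitions.
    #include <boost/program_options.hpp>
    #include <iostream>
    #include <string>

    namespace po = boost::program_options;

    int main(int argc, char* argv[]) {
        po::options_description desc("Options");
        desc.add_options()
            // "order,o" registers a long name (--order) and a one-letter alias (-o).
            ("order,o", po::value<int>()->default_value(3), "n-gram order")
            // Long-only names must be spelled with two dashes on the command line:
            // "--optimize-perplexity dev.txt" is read as the long option,
            // "-optimize-perplexity" is not.
            ("optimize-perplexity", po::value<std::string>(), "dev text for perplexity optimization")
            ("write-lm", po::value<std::string>(), "output LM file");

        po::variables_map vm;
        po::store(po::parse_command_line(argc, argv, desc), vm);
        po::notify(vm);

        std::cout << "order = " << vm["order"].as<int>() << "\n";
        if (vm.count("optimize-perplexity"))
            std::cout << "dev set = " << vm["optimize-perplexity"].as<std::string>() << "\n";
        return 0;
    }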
Original comment by bojune...@gmail.com
on 10 Dec 2008 at 3:04
- Need to create PerplexityOptimizer/WordErrorRateOptimizer with the same n-gram order as the model being optimized (see the sketch after this list).
- Current code is not robust enough to optimize using mismatched orders. (Issue 5)
- SVN Revision 18.
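A minimal, purely hypothetical sketch of the constraint described in the first item: the class names echo the comment, but the types and signatures below are invented for illustration and are not MITLM's API.

    // Hypothetical illustration only -- not MITLM's actual classes or signatures.
    #include <cassert>
    #include <cstddef>
    #include <iostream>

    struct NgramLM {
        std::size_t order;  // n-gram order of the (interpolated) model
    };

    struct PerplexityOptimizer {
        // Tie the optimizer's order to the model's order at construction time,
        // so a 4-gram model cannot be evaluated with a default order-3 optimizer.
        explicit PerplexityOptimizer(const NgramLM& lm) : order(lm.order) {}
        std::size_t order;
    };

    int main() {
        NgramLM lm{4};                // e.g. the 4-gram model from this issue
        PerplexityOptimizer opt(lm);  // optimizer inherits the model's order
        assert(opt.order == lm.order);
        std::cout << "optimizing at order " << opt.order << "\n";
    }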
Original comment by bojune...@gmail.com
on 10 Dec 2008 at 3:23
With trigrams and the ARPA format it works (LI and CM; I didn't test GLI).
Original comment by alu...@gmail.com
on 10 Dec 2008 at 3:28
I should also warn you that the current recipe for count merging is not completely correct, since I made the mistake of assuming that c(h) = sum_w c(h w), which is not true for Kneser-Ney smoothing, as it modifies the lower-order counts. The results should not be significantly different, though, and we still get a valid n-gram model.
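To spell out the mismatch (standard Kneser-Ney definitions, nothing toolkit-specific): with ordinary counts, c(h) = sum_w c(h w) holds by marginalization. Kneser-Ney, however, replaces each lower-order count c(h w) by the continuation count N_1+(* h w) = |{ v : c(v h w) > 0 }|, the number of distinct words seen preceding h w. Summing those modified counts over w gives N_1+(* h *), which in general differs from c(h), so the assumption above breaks for the lower orders.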
I'll provide an updated recipe hopefully in a week or two.
Original comment by bojune...@gmail.com
on 10 Dec 2008 at 3:36
OK, thanks! I confirm that LI and CM work now with ARPA 4-grams; I didn't test the binary format.
Original comment by alu...@gmail.com
on 10 Dec 2008 at 3:59
Original issue reported on code.google.com by alu...@gmail.com
on 10 Dec 2008 at 1:41