Interpolation with CM and GLI fails with using -opt-perp

GoogleCodeExporter commented 8 years ago

When I use interpolate-ngram to interpolate two models with CM or GLI and 
perplexity optimization, I get the following failures:

1st:
interpolate-ngram -lm "model1.lm, model2.lm" -smoothing ModKN -interpolation CM 
-opt-perp dev-set.txt -write-lm CM-model.lm
Loading component LM model1.lm...
Loading component LM model2.lm...
Interpolating component LMs...
Tying parameters across n-gram order...
Interpolation Method = CM
Loading feature for model1.lm from log:sumhist:model1.effcounts...
terminate called after throwing an instance of 'std::runtime_error'
 what(): Cannot open file
Aborted

2nd:
interpolate-ngram -lm "model1.lm, model2.lm" -smoothing ModKN -interpolation 
GLI -opt-perp dev-set.txt -write-lm GLI-model.lm
Loading component LM model1.lm...
Loading component LM model2.lm...
Interpolating component LMs...
Tying parameters across n-gram order...
Interpolation Method = GLI
Segmentation fault

I'm using MITLM v0.4 from SVN under Linux, Intel i7.

Jan

Original issue reported on code.google.com by ing.jan....@gmail.com on 8 Sep 2010 at 7:11

GoogleCodeExporter commented 8 years ago

CM and GLI are advanced interpolation techniques that require additional 
feature files.  You can specify the feature files using the 
-interpolation-features argument.

In the first case, the error shows that the code fails to find 
model1.effcounts.  Without debugging, I suspect the second case fails for the 
same reason but does not report the problem properly.
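
A quick way to confirm this before rerunning is to check that each feature file exists. The sketch below is only an illustration: `missing_features` is a hypothetical helper, and it assumes the feature file name is the LM file name with ".lm" replaced by ".effcounts", which is what the "log:sumhist:model1.effcounts" line in the log suggests.

```shell
# Hypothetical helper (not part of MITLM): list the feature files that
# cannot be read. Assumes each LM "X.lm" is paired with "X.effcounts",
# as inferred from the "Loading feature for model1.lm" log line.
missing_features() {
    status=0
    for lm in "$@"; do
        feat="${lm%.lm}.effcounts"   # strip a trailing ".lm", append ".effcounts"
        if [ ! -r "$feat" ]; then
            echo "missing: $feat"
            status=1
        fi
    done
    return $status
}
```

Calling `missing_features model1.lm model2.lm` prints one line per unreadable file and returns non-zero if any are missing, so it can gate the interpolate-ngram call.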

Please see the tutorial page (http://code.google.com/p/mitlm/wiki/Tutorial) for 
example usage of these advanced interpolation techniques.

Original comment by bojune...@gmail.com on 9 Sep 2010 at 5:17

GoogleCodeExporter commented 8 years ago

I also used the -interpolation-features argument, but my LMs are too large 
(about 10GB together in ARPA format). The machine has an Intel Core i7 
processor, 12GB of RAM, and 24GB of swap, and runs Debian. The segmentation 
fault happens because the running process exhausts the available memory. 
I have no idea how to control the process of combining LMs with the MITLM 
tools. Could you help me?

Original comment by ing.jan....@gmail.com on 9 Sep 2010 at 12:07

GoogleCodeExporter commented 8 years ago

Can you try to increase the swap file size to 32 or 64GB and see if the process 
completes?  Optimization may still run efficiently even if a large portion of 
the model is swapped out.
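
For reference, one common way to add swap on Linux is a swap file. The sketch below only prints the commands so they can be reviewed first, since they need root. `swap_cmds` is a hypothetical helper, and the default path `/swapfile-mitlm` is a placeholder chosen for illustration.

```shell
# Hypothetical helper: print (do not run) the commands that create and
# enable a swap file. Arguments: size in GB, optional file path.
swap_cmds() {
    size_gb="$1"
    path="${2:-/swapfile-mitlm}"   # placeholder path, adjust as needed
    cat <<EOF
dd if=/dev/zero of=$path bs=1M count=$((size_gb * 1024))
chmod 600 $path
mkswap $path
swapon $path
EOF
}
```

For example, `swap_cmds 32` shows the four commands for a 32GB swap file. After reviewing them, they can be run as root, and `swapon -s` or `free -g` should then report the added swap.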

Original comment by bojune...@gmail.com on 9 Sep 2010 at 4:43

jbeard4 / mitlm

Interpolation with CM and GLI fails with using -opt-perp #20