grammatical / baselines-emnlp2016

Baseline models, training scripts, and instructions on how to reproduce our results for our state-of-the-art grammar correction system from M. Junczys-Dowmunt, R. Grundkiewicz: Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction, EMNLP 2016.
MIT License

Retraining the smt-2016 model #10

Open ghozn opened 5 years ago

ghozn commented 5 years ago

Hi, thank you for answering my questions before. I have been training the smt-2016 model recently. Everything goes fine while using Moses to train the model, but I encounter an error while tuning. The error message is:

```
Name:moses	VmPeak:30234320 kB	VmRSS:702832 kB	RSSMax:29396400 kB	user:474.828	sys:7.216	CPU:482.044	real:55.818
The decoder returns the scores in this order: OpSequenceModel0 LM0 LM1 LM2 EditOps0 EditOps0 EditOps0 WordPenalty0 PhrasePenalty0 TranslationModel0 TranslationModel0 TranslationModel0 TranslationModel0
Executing: gzip -f run1.best100.out
Scoring the nbestlist.
exec: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1/extractor.sh
Executing: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1/extractor.sh > extract.out 2> extract.err
Executing: \cp -f init.opt run1.init.opt
Executing: echo 'not used' > weights.txt
exec: /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out
Executing: /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out > run1.mira.out 2> mert.log
sh: line 1: 34173 abandoned  /data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out > run1.mira.out 2> mert.log
Exit code: 134
ERROR: Failed to run '/data/home/ghoznfan/mosesdecoder-master/bin/kbmira --sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001 --dense-init run1.dense --ffile run1.features.dat --scfile run1.scores.dat -o mert.out'.
 at /data/home/ghoznfan/mosesdecoder-master/scripts/training/mert-moses.pl line 1775.
06/12/2019 19:54:13 Command: perl /data/home/ghoznfan/mosesdecoder-master/scripts/training/mert-moses.pl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/test.lc.0.mer.err.fact /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/test.lc.0.mer.m2 /data/home/ghoznfan/mosesdecoder-master/bin/moses /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.ini --working-dir=/data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/tuning.0.1 --mertdir=/data/home/ghoznfan/mosesdecoder-master/bin --mertargs "--sctype M2SCORER" --no-filter-phrase-table --nbest=100 --threads 16 --decoder-flags "-threads 16 -fd '|'" --maximum-iterations 15 --batch-mira --return-best-dev --batch-mira-args "--sctype M2SCORER --scconfig beta:0.5,max_unchanged_words:2,case:false --model-bg -D 0.001" finished with non-zero status 512
Died at train/run_cross.perl line 695.
Died at train/run_cross.perl line 11.
```

After that, the pipeline fails again while trying to reuse the weights:

```
[ghoznfan@train-shuaidong-20190308-1708-gpu-pod-0 baselines-emnlp2016-master]$ paste: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.*/work.err-cor/binmodel.err-cor/moses.mert.?.1.ini: no such file
06/12/2019 19:54:15 Running command: perl /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini < /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.ini > /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.mert.1.ini
ERROR: could not open weight file: /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini at /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl line 9.
06/12/2019 19:54:15 Command: perl /data/home/ghoznfan/baselines-emnlp2016-master/train/scripts/reuse-weights.perl /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/cross.00/work.err-cor/binmodel.err-cor/moses.mert.1.ini < /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.ini > /data/home/ghoznfan/baselines-emnlp2016-master/trainworkdir/release/work.err-cor/binmodel.err-cor/moses.mert.1.ini finished with non-zero status 512
Died at train/run_cross.perl line 695.
```

Where is the problem here?

ghozn commented 5 years ago

I am training it on Linux.

snukky commented 5 years ago

Please check `mert.log` for detailed logs from kbmira.

ghozn commented 5 years ago

The contents of `mert.log` are as follows:

```
bmira with c=0.01 decay=0.001 no_shuffle=0
Initialising random seed from system clock
name: beta value: 0.5
name: max_unchanged_words value: 2
name: case value: false
terminate called after throwing an instance of 'MosesTuning::FileFormatException'
  what():  Error in line "-13.1108 -41.2622 -36.7584 -37.2793 1 0 0 1 -4 4 -3.77276 -4.28588 -6.39024 -6.82255 CorrectionPattern0_left(«44»)_sub(«a»,«some»)=1 CorrectionPattern0_sub(«a»,«some»)_right(«87»)=1 CorrectionPattern0_left(«44»)_sub(«a»,«some»)_right(«87»)=1 " of run1.features.dat
```

ghozn commented 5 years ago

It seems that there are too many values for bmira to read. I have tried training it several times and got different results: bmira wants 13 weights but sometimes gets 15, 16, etc. Sometimes it finishes 2-3 tuning steps but fails in the next one. Have you encountered this problem?
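For anyone hitting the same error: a quick way to confirm this kind of mismatch is to count, per line of `run1.features.dat`, how many fields are plain dense scores (sparse features look like `name=value`, so they contain `=`). A minimal sketch, assuming the file layout seen in the exception above; the sample data is made up for illustration:

```python
from collections import Counter

def dense_counts(lines):
    """Count dense (non name=value) fields on each feature line.

    kbmira expects this count to match the number of dense weights
    on every line; more than one distinct count signals the problem.
    """
    counts = Counter()
    for line in lines:
        n = sum(1 for field in line.split() if "=" not in field)
        counts[n] += 1
    return counts

# Hypothetical sample: first line has 4 dense scores, second has 5.
sample = [
    "-13.1 -41.2 -36.7 1 Sparse0_a=1 Sparse0_b=1",
    "-13.1 -41.2 -36.7 -37.2 1 Sparse0_a=1",
]
print(dense_counts(sample))  # two distinct counts -> inconsistent file
```

In practice you would read the lines from `run1.features.dat` instead of `sample`; if the resulting counter has more than one key, the feature file is inconsistent and kbmira will abort as shown above.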

snukky commented 5 years ago

I'm not sure. I've been using my fork for most of the experiments (https://github.com/snukky/mosesdecoder/tree/gleu), and I had some issues with parsing weights in the past (see https://github.com/snukky/mosesdecoder/commit/20c3b6714e14442391d6dc239f44df884c950af6), but I doubt they are related. You may try my branch for tuning.

wulouzhu commented 5 years ago

@ghozn I met the same problem. Have you solved it?

ghozn commented 5 years ago

@wulouzhu I solved it by switching the mosesdecoder branch to https://github.com/snukky/mosesdecoder/tree/gleu

wulouzhu commented 5 years ago

@ghozn Thank you! I will try it later.

wulouzhu commented 5 years ago

I retrained the model successfully after I switched the mosesdecoder branch to https://github.com/snukky/mosesdecoder/tree/gleu. But when I test the model with:

```
../mosesdecoder-gleu/bin/moses -f train/model.sparse/release/work.err-cor/binmodel.err-cor/moses.mert.avg.ini < input.txt
```

an error occurs:

```
Created input-output object : [64.071] seconds
Exception: moses/Word.cpp:159 in void Moses::Word::CreateFromString(Moses::FactorDirection, const std::vector&, const StringPiece&, bool, bool) threw util::Exception because `!isNonTerminal && i < factorOrder.size()'.
Too few factors in string 'four'.
```

But when I use `python ./models/run_gecsmt.py...`, it works. Did you meet the same problem, or do you know what the problem is? Thank you!

snukky commented 5 years ago

Did you train a model with sparse features using word classes? Then the WC factors need to be added to the input manually (it's done here: https://github.com/grammatical/baselines-emnlp2016/blob/master/models/run_gecsmt.py#L48).
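For readers unfamiliar with factored Moses input: "Too few factors in string 'four'" means the model expects each token to carry extra factors joined by the factor delimiter (`|` here, matching the `-fd '|'` decoder flag used during tuning), e.g. `four|C10` instead of bare `four`. A minimal sketch of what such annotation looks like; the `word2class` mapping, the `UNK` fallback, and the function names are illustrative assumptions, not the actual code from `run_gecsmt.py`:

```python
def add_wc_factor(sentence, word2class, delim="|", unk="UNK"):
    """Append a word-class (WC) factor to every token so the input
    matches a factored model. Unknown words get a fallback class."""
    return " ".join(
        f"{tok}{delim}{word2class.get(tok, unk)}"
        for tok in sentence.split()
    )

# Hypothetical class map for illustration only.
word2class = {"four": "C10", "cats": "C7"}
print(add_wc_factor("four cats sleep", word2class))
# -> four|C10 cats|C7 sleep|UNK
```

Running the decoder on input annotated this way (or simply going through `run_gecsmt.py`, which performs the annotation for you) should avoid the "Too few factors" exception.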