marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

About Parameters --after-epochs #370

Open LittleRooki opened 3 years ago

LittleRooki commented 3 years ago

When I ran the Transformer example, I used the following training command:

    $MARIAN_TRAIN \
        --model model/model.npz --type transformer \
        --train-sets data/corpus.bpe.en data/corpus.bpe.de \
        --max-length 100 \
        --vocabs model/vocab.ende.yml model/vocab.ende.yml \
        --mini-batch-fit -w 6000 --maxi-batch 1000 \
        --early-stopping 10 --cost-type=ce-mean-words \
        --after-epochs 2 \
        --valid-freq 5000 --save-freq 5000 --disp-freq 500 \
        --valid-metrics ce-mean-words perplexity translation \
        --valid-sets data/valid.bpe.en data/valid.bpe.de \
        --valid-script-path "bash ./scripts/validate.sh" \
        --valid-translation-output data/valid.bpe.en.output --quiet-translation \
        --valid-mini-batch 64 \
        --beam-size 6 --normalize 0.6 \
        --log model/train.log --valid-log model/valid.log \
        --enc-depth 6 --dec-depth 6 \
        --transformer-heads 8 \
        --transformer-postprocess-emb d \
        --transformer-postprocess dan \
        --transformer-dropout 0.1 --label-smoothing 0.1 \
        --learn-rate 0.0003 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report \
        --optimizer-params 0.9 0.98 1e-09 --clip-norm 5 \
        --tied-embeddings-all \
        --devices $GPUS --sync-sgd --seed 1111 \
        --exponential-smoothing

I added the option --after-epochs 2, but why did it start training epoch 3? At the same time, I got an error like this:

    [2021-04-27 13:06:02] Starting data epoch 3 in logical epoch 3
    [2021-04-27 13:06:02] Training finished
    [2021-04-27 13:06:03] [valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
    [2021-04-27 13:06:04] [valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
    [2021-04-27 13:06:11] [valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best
    [2021-04-27 13:06:12] Saving model weights and runtime parameters to model/model.npz.orig.npz
    [2021-04-27 13:06:14] Saving model weights and runtime parameters to model/model.npz
    [2021-04-27 13:06:16] Saving Adam parameters to model/model.npz.optimizer.npz

    Error: Model file does not exist: model/model.iter23421.npz
    Error: Aborted from void marian::ConfigValidator::validateOptionsTranslation() const in /home/caohang/marian/src/common/config_validator.cpp:57

What should I do?

emjotde commented 3 years ago

Hm, check if the model folder exists?

emjotde commented 3 years ago

No, ignore that. The other files were saved. That seems to be a problem with the translation validator. @snukky can you take a look?
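One observation that may help when looking into this: the file named in the error, model/model.iter23421.npz, does not fit the pattern of the periodic checkpoints reported in this thread (model.iter5000.npz through model.iter20000.npz), which line up with --save-freq 5000. The Python sketch below only illustrates that arithmetic. It assumes that numbered checkpoints are written solely at multiples of the save frequency, which is an inference from the file names in the thread and not a statement about Marian's internals. The function name `periodic_checkpoints` is hypothetical.

```python
def periodic_checkpoints(final_update, save_freq, prefix="model/model"):
    """List the numbered checkpoint names one would expect if a file is
    written only at multiples of save_freq (an assumption, see above)."""
    return [
        f"{prefix}.iter{update}.npz"
        for update in range(save_freq, final_update + 1, save_freq)
    ]


# Values taken from the command and log in this thread.
saved = periodic_checkpoints(final_update=23421, save_freq=5000)
requested = "model/model.iter23421.npz"  # file named in the error message

for name in saved:
    print("expected on disk:", name)
print("requested file in periodic list:", requested in saved)
```

This does not establish the cause of the error. It only shows that the final update count, 23421, is not a multiple of 5000, so a checkpoint with that number would not be produced by the periodic saves alone.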

LittleRooki commented 3 years ago

> No, ignore that. The other files were saved. That seems to be a problem with the translation validator. @snukky can you take a look?

Yes, model.iter5000.npz, model.iter10000.npz, model.iter15000.npz and model.iter20000.npz were saved. I set --after-epochs 2, so I don't know why it showed:

    [valid] Ep. 3 : Up. 23421 : ce-mean-words : 2.08992 : new best
    [valid] Ep. 3 : Up. 23421 : perplexity : 8.08427 : new best
    [valid] Ep. 3 : Up. 23421 : translation : 21.72 : new best

This might be causing the problem.

emjotde commented 3 years ago

This is the final validation after training stopped. That's actually expected, but the error is weird.
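To make the "Ep. 3" label less surprising, here is a toy Python sketch of a training loop that stops after a given number of epochs and then runs one last validation. It is not Marian's implementation. The function `train` and the per-epoch update counts `[11711, 11710]` are hypothetical, chosen only so that the total matches the 23421 updates in the log. The assumption is that the epoch counter is advanced before the stopping condition is checked, which is consistent with the log line "Starting data epoch 3" being followed directly by "Training finished".

```python
def train(after_epochs, updates_per_epoch, valid_freq):
    """Toy loop: stop once the epoch counter exceeds after_epochs,
    then run a final validation labelled with the current counter."""
    log = []
    updates = 0
    epoch = 1
    while True:
        log.append(f"Starting data epoch {epoch}")
        # The stop check happens after the counter has already advanced.
        if after_epochs > 0 and epoch > after_epochs:
            log.append("Training finished")
            break
        for _ in range(updates_per_epoch[epoch - 1]):
            updates += 1
            if updates % valid_freq == 0:
                log.append(f"[valid] Ep. {epoch} : Up. {updates}")
        epoch += 1
    # Final validation after training stopped, no further updates were made.
    log.append(f"[valid] Ep. {epoch} : Up. {updates}")
    return log


# Hypothetical per-epoch update counts summing to 23421.
for line in train(after_epochs=2, updates_per_epoch=[11711, 11710], valid_freq=5000):
    print(line)
```

In this sketch no updates are performed in epoch 3. The counter merely reads 3 when the final validation is logged, so the update count stays at the value reached at the end of epoch 2.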