marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Training Errors for Nematus #184

Closed by cong1 6 years ago

cong1 commented 6 years ago

Hi, everything worked when I configured the parameters to train the nematus model on GPU. But when I changed enc-depth from 1 to 2 and dec-depth from 1 to 2, the following error occurred.

I'm currently using: cmake-3.6.2, gcc-5.4.0, boost-1.64.0, CUDA-8.0.61

The command is:

# train model
if [ ! -e "model/model.npz.best-translation.npz" ]
then
    $MARIAN/build/marian \
        --devices $GPUS \
        --type nematus \
        --model model/model.npz \
        --train-sets data/corpus.bpe.ro data/corpus.bpe.en \
        --vocabs model/vocab.ro.yml model/vocab.en.yml \
        --dim-vocabs 60000 70000 \
        --dim-emb 500 \
        --mini-batch 50 \
        --enc-depth 2 \
        --enc-cell-depth 2 \
        --enc-type bidirectional \
        --dec-depth 2 \
        --dec-cell-base-depth 4 \
        --dec-cell-high-depth 2 \
        --dec-cell gru-nematus --enc-cell gru-nematus \
        --tied-embeddings true \
        --layer-normalization true \
        --dropout-rnn 0.2 --dropout-src 0.1 --dropout-trg 0.1 \
        --early-stopping 5 \
        --valid-freq 10000 --save-freq 10000 --disp-freq 1000 \
        --valid-metrics cross-entropy translation \
        --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en \
        --valid-script-path ./scripts/validate.sh \
        --log model/train.log --valid-log model/valid.log \
        --overwrite False --keep-best
fi

The error is:

nohup: ignoring input
Using GPUs: 2 3
train.sh: line 43: $MARIAN/build/marian \
    --devices $GPUS \
    --type nematus \
    --model model/model.npz \
    --train-sets data/corpus.bpe.ro data/corpus.bpe.en \
    --vocabs model/vocab.ro.yml model/vocab.en.yml \
    --dim-vocabs 60000 70000 \
    --enc-depth 4 \
    --enc-cell-depth 4 \
    --enc-type bidirectional \
    --dec-depth 4 \
    --dec-cell-base-depth 2 \
    --dec-cell-high-depth 2 \
    --dec-cell gru-nematus --enc-cell gru-nematus \
    --tied-embeddings true \
    --layer-normalization true\
    --dropout-src 0.1 --dropout-trg 0.1 \
    --early-stopping 10 \
    --valid-freq 10000 --save-freq 15000 --disp-freq 100 \
    --valid-metrics cross-entropy translation \
    --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en \
    --valid-script-path ./scripts/validate.sh \
    --log model/train.log --valid-log model/valid.log \
    --overwrite False --keep-best \
    --seed 1111 --exponential-smoothing \
    --normalize=1 --beam-size=12 --quiet-translation

: No such file or directory
[2018-06-01 16:43:57] [config] after-batches: 0
[2018-06-01 16:43:57] [config] after-epochs: 0
[2018-06-01 16:43:57] [config] allow-unk: false
[2018-06-01 16:43:57] [config] batch-flexible-lr: false
[2018-06-01 16:43:57] [config] batch-normal-words: 1920
[2018-06-01 16:43:57] [config] beam-size: 12
[2018-06-01 16:43:57] [config] best-deep: false
[2018-06-01 16:43:57] [config] clip-norm: 1
[2018-06-01 16:43:57] [config] cost-type: ce-mean
[2018-06-01 16:43:57] [config] cpu-threads: 0
[2018-06-01 16:43:57] [config] data-weighting-type: sentence
[2018-06-01 16:43:57] [config] dec-cell: gru-nematus
[2018-06-01 16:43:57] [config] dec-cell-base-depth: 4
[2018-06-01 16:43:57] [config] dec-cell-high-depth: 2
[2018-06-01 16:43:57] [config] dec-depth: 2
[2018-06-01 16:43:57] [config] devices:
[2018-06-01 16:43:57] [config] - 2
[2018-06-01 16:43:57] [config] - 3
[2018-06-01 16:43:57] [config] dim-emb: 500
[2018-06-01 16:43:57] [config] dim-rnn: 1024
[2018-06-01 16:43:57] [config] dim-vocabs:
[2018-06-01 16:43:57] [config] - 60000
[2018-06-01 16:43:57] [config] - 70000
[2018-06-01 16:43:57] [config] disp-freq: 1000
[2018-06-01 16:43:57] [config] dropout-rnn: 0.2
[2018-06-01 16:43:57] [config] dropout-src: 0.1
[2018-06-01 16:43:57] [config] dropout-trg: 0.1
[2018-06-01 16:43:57] [config] early-stopping: 5
[2018-06-01 16:43:57] [config] embedding-fix-src: false
[2018-06-01 16:43:57] [config] embedding-fix-trg: false
[2018-06-01 16:43:57] [config] embedding-normalization: false
[2018-06-01 16:43:57] [config] enc-cell: gru-nematus
[2018-06-01 16:43:57] [config] enc-cell-depth: 2
[2018-06-01 16:43:57] [config] enc-depth: 2
[2018-06-01 16:43:57] [config] enc-type: bidirectional
[2018-06-01 16:43:57] [config] exponential-smoothing: 0
[2018-06-01 16:43:57] [config] grad-dropping-momentum: 0
[2018-06-01 16:43:57] [config] grad-dropping-rate: 0
[2018-06-01 16:43:57] [config] grad-dropping-warmup: 100
[2018-06-01 16:43:57] [config] guided-alignment-cost: ce
[2018-06-01 16:43:57] [config] guided-alignment-weight: 1
[2018-06-01 16:43:57] [config] ignore-model-config: false
[2018-06-01 16:43:57] [config] keep-best: true
[2018-06-01 16:43:57] [config] label-smoothing: 0
[2018-06-01 16:43:57] [config] layer-normalization: true
[2018-06-01 16:43:57] [config] learn-rate: 0.0001
[2018-06-01 16:43:57] [config] log: model/train.log
[2018-06-01 16:43:57] [config] log-level: info
[2018-06-01 16:43:57] [config] lr-decay: 0
[2018-06-01 16:43:57] [config] lr-decay-freq: 50000
[2018-06-01 16:43:57] [config] lr-decay-inv-sqrt: 0
[2018-06-01 16:43:57] [config] lr-decay-repeat-warmup: false
[2018-06-01 16:43:57] [config] lr-decay-reset-optimizer: false
[2018-06-01 16:43:57] [config] lr-decay-start:
[2018-06-01 16:43:57] [config] - 10
[2018-06-01 16:43:57] [config] - 1
[2018-06-01 16:43:57] [config] lr-decay-strategy: epoch+stalled
[2018-06-01 16:43:57] [config] lr-report: false
[2018-06-01 16:43:57] [config] lr-warmup: 0
[2018-06-01 16:43:57] [config] lr-warmup-at-reload: false
[2018-06-01 16:43:57] [config] lr-warmup-cycle: false
[2018-06-01 16:43:57] [config] lr-warmup-start-rate: 0
[2018-06-01 16:43:57] [config] max-length: 50
[2018-06-01 16:43:57] [config] max-length-crop: false
[2018-06-01 16:43:57] [config] maxi-batch: 100
[2018-06-01 16:43:57] [config] maxi-batch-sort: trg
[2018-06-01 16:43:57] [config] mini-batch: 50
[2018-06-01 16:43:57] [config] mini-batch-fit: false
[2018-06-01 16:43:57] [config] mini-batch-fit-step: 10
[2018-06-01 16:43:57] [config] mini-batch-words: 0
[2018-06-01 16:43:57] [config] model: model/model.npz
[2018-06-01 16:43:57] [config] multi-node: false
[2018-06-01 16:43:57] [config] multi-node-overlap: true
[2018-06-01 16:43:57] [config] n-best: false
[2018-06-01 16:43:57] [config] no-reload: false
[2018-06-01 16:43:57] [config] no-restore-corpus: false
[2018-06-01 16:43:57] [config] no-shuffle: false
[2018-06-01 16:43:57] [config] normalize: 0
[2018-06-01 16:43:57] [config] optimizer: adam
[2018-06-01 16:43:57] [config] optimizer-delay: 1
[2018-06-01 16:43:57] [config] overwrite: true
[2018-06-01 16:43:57] [config] quiet: false
[2018-06-01 16:43:57] [config] quiet-translation: false
[2018-06-01 16:43:57] [config] relative-paths: false
[2018-06-01 16:43:57] [config] right-left: false
[2018-06-01 16:43:57] [config] save-freq: 10000
[2018-06-01 16:43:57] [config] seed: 0
[2018-06-01 16:43:57] [config] skip: true
[2018-06-01 16:43:57] [config] sqlite: ""
[2018-06-01 16:43:57] [config] sqlite-drop: false
[2018-06-01 16:43:57] [config] sync-sgd: false
[2018-06-01 16:43:57] [config] tempdir: /tmp
[2018-06-01 16:43:57] [config] tied-embeddings: true
[2018-06-01 16:43:57] [config] tied-embeddings-all: false
[2018-06-01 16:43:57] [config] tied-embeddings-src: false
[2018-06-01 16:43:57] [config] train-sets:
[2018-06-01 16:43:57] [config] - data/corpus.bpe.ro
[2018-06-01 16:43:57] [config] - data/corpus.bpe.en
[2018-06-01 16:43:57] [config] transformer-dim-ffn: 2048
[2018-06-01 16:43:57] [config] transformer-dropout: 0
[2018-06-01 16:43:57] [config] transformer-dropout-attention: 0
[2018-06-01 16:43:57] [config] transformer-dropout-ffn: 0
[2018-06-01 16:43:57] [config] transformer-ffn-activation: swish
[2018-06-01 16:43:57] [config] transformer-heads: 8
[2018-06-01 16:43:57] [config] transformer-postprocess: dan
[2018-06-01 16:43:57] [config] transformer-postprocess-emb: d
[2018-06-01 16:43:57] [config] transformer-preprocess: ""
[2018-06-01 16:43:57] [config] type: nematus
[2018-06-01 16:43:57] [config] valid-freq: 10000
[2018-06-01 16:43:57] [config] valid-log: model/valid.log
[2018-06-01 16:43:57] [config] valid-max-length: 1000
[2018-06-01 16:43:57] [config] valid-metrics:
[2018-06-01 16:43:57] [config] - cross-entropy
[2018-06-01 16:43:57] [config] - translation
[2018-06-01 16:43:57] [config] valid-mini-batch: 32
[2018-06-01 16:43:57] [config] valid-script-path: ./scripts/validate.sh
[2018-06-01 16:43:57] [config] valid-sets:
[2018-06-01 16:43:57] [config] - data/newsdev2016.bpe.ro
[2018-06-01 16:43:57] [config] - data/newsdev2016.bpe.en
[2018-06-01 16:43:57] [config] vocabs:
[2018-06-01 16:43:57] [config] - model/vocab.ro.yml
[2018-06-01 16:43:57] [config] - model/vocab.en.yml
[2018-06-01 16:43:57] [config] workspace: 2048
[2018-06-01 16:43:57] [data] Loading vocabulary from model/vocab.ro.yml
[2018-06-01 16:43:58] [data] Setting vocabulary size for input 0 to 60000
[2018-06-01 16:43:58] [data] Loading vocabulary from model/vocab.en.yml
[2018-06-01 16:43:58] [data] Setting vocabulary size for input 1 to 70000
[2018-06-01 16:44:01] [memory] Extending reserved space to 2048 MB (device gpu2)
[2018-06-01 16:44:01] [memory] Extending reserved space to 2048 MB (device gpu3)
[2018-06-01 16:44:01] Training started
[2018-06-01 16:44:01] [data] Shuffling files
[2018-06-01 16:45:26] [data] Done
marian: /media/ntfs-1/lcc/marian/src/marian/src/rnn/cells.h:424: virtual marian::rnn::State marian::rnn::GRUNematus::applyState(std::vector<std::shared_ptr<marian::Chainable<std::sharedptr > > >, marian::rnn::State, marian::Expr): Assertion `transition == xWs.empty()' failed.
train.sh: line 74: 30799 Aborted (core dumped) $MARIAN/build/marian --devices $GPUS --type nematus --model model/model.npz --train-sets data/corpus.bpe.ro data/corpus.bpe.en --vocabs model/vocab.ro.yml model/vocab.en.yml --dim-vocabs 60000 70000 --dim-emb 500 --mini-batch 50 --enc-depth 2 --enc-cell-depth 2 --enc-type bidirectional --dec-depth 2 --skip true --dec-cell-base-depth 4 --dec-cell-high-depth 2 --dec-cell gru-nematus --enc-cell gru-nematus --tied-embeddings true --layer-normalization true --dropout-rnn 0.2 --dropout-src 0.1 --dropout-trg 0.1 --early-stopping 5 --valid-freq 10000 --save-freq 10000 --disp-freq 1000 --valid-metrics cross-entropy translation --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en --valid-script-path ./scripts/validate.sh --log model/train.log --valid-log model/valid.log --overwrite False --keep-best
Error: File 'model/model.npz.best-translation.npz.decoder.yml' does not exist - aborting
Aborted from InputFileStream::InputFileStream(const string&) in /media/ntfs-1/lucongcong/marian/src/marian/src/common/file_stream.h: 83
Detokenizer Version $Revision: 4134 $
Language: en
Error: File 'model/model.npz.best-translation.npz.decoder.yml' does not exist - aborting
Aborted from InputFileStream::InputFileStream(const string&) in /media/ntfs-1/lucongcong/marian/src/marian/src/common/file_stream.h: 83
Detokenizer Version $Revision: 4134 $
Language: en
Use of uninitialized value $length_reference in numeric eq (==) at ../tools/moses-scripts/scripts/generic/multi-bleu-detok.perl line 157.
BLEU = 0, 0/0/0/0 (BP=0, ratio=0, hyp_len=0, ref_len=0)
Use of uninitialized value $length_reference in numeric eq (==) at ../tools/moses-scripts/scripts/generic/multi-bleu-detok.perl line 157.
BLEU = 0, 0/0/0/0 (BP=0, ratio=0, hyp_len=0, ref_len=0)

snukky commented 6 years ago

I think that layer normalization together with enc-depth > 1 or dec-depth > 2 for training a Nematus model has never worked. I've never implemented or tested it, sorry. Marian supports only enc-cell-depth and dec-cell-base-depth with LN, as they are needed for the architecture used in Edinburgh's WMT 2017 systems. We used to have warning/abort messages for that, which probably disappeared accidentally. I'll restore them.

Is there a particular reason why you need a Nematus-compatible model instead of "s2s"?

cong1 commented 6 years ago

Yes, I just want to repeat the same experiment with nematus on Marian. Unfortunately, Marian cannot change the number of layers.

snukky commented 6 years ago

The nematus model type was added because we wanted to decode specific models trained with Nematus using the Marian decoder.

You may use --type s2s and have a model architecture which is equivalent to Nematus. The model won't be compatible with the Nematus toolkit, but you probably don't need that.
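As a rough sketch of that suggestion, the command from the original post can be switched to --type s2s by changing only the model type. All flags below are copied from that post. Whether each of them (in particular the gru-nematus cells) behaves the same under s2s is an assumption that has not been verified here. The script only assembles and prints the command as a dry run. MARIAN and GPUS are the environment variables used in the post, and the default values given here are placeholders.

```shell
#!/usr/bin/env bash
# Dry run: assemble the training command from the original post with
# --type s2s instead of --type nematus, and print it without running it.
# Assumption: only the model type changes; every other flag is kept as posted.
MARIAN="${MARIAN:-/path/to/marian}"   # placeholder default, adjust locally
GPUS="${GPUS:-0}"                     # placeholder default device list

build_s2s_command() {
    local args=(
        "$MARIAN/build/marian"
        --devices $GPUS               # unquoted on purpose: "2 3" -> two words
        --type s2s
        --model model/model.npz
        --train-sets data/corpus.bpe.ro data/corpus.bpe.en
        --vocabs model/vocab.ro.yml model/vocab.en.yml
        --dim-vocabs 60000 70000
        --dim-emb 500
        --mini-batch 50
        --enc-depth 2
        --enc-cell-depth 2
        --enc-type bidirectional
        --dec-depth 2
        --dec-cell-base-depth 4
        --dec-cell-high-depth 2
        --dec-cell gru-nematus --enc-cell gru-nematus
        --tied-embeddings true
        --layer-normalization true
        --dropout-rnn 0.2 --dropout-src 0.1 --dropout-trg 0.1
    )
    # Print the assembled command on a single line.
    printf '%s ' "${args[@]}"
    printf '\n'
}

build_s2s_command
```

Once the printed command looks right, the same array can be executed directly with "${args[@]}" inside the function instead of being printed.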

snukky commented 6 years ago

In the end, only --dec-cell-high-depth > 1 is not supported with --type nematus. I added a clear abort message and regression tests.
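The same restriction can also be checked in a training script before Marian is launched. The following is a minimal sketch of such a guard. check_nematus_depths is a hypothetical helper written for this illustration, not Marian's actual abort logic, and the message text is made up.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a training script; not part of Marian.
# Returns non-zero when --type nematus is combined with
# --dec-cell-high-depth > 1, the one combination reported as unsupported.
check_nematus_depths() {
    local type="$1" dec_cell_high_depth="$2"
    if [ "$type" = "nematus" ] && [ "$dec_cell_high_depth" -gt 1 ]; then
        echo "--dec-cell-high-depth > 1 is not supported with --type nematus" >&2
        return 1
    fi
    return 0
}

# Example: the settings from the original post fail the check.
if ! check_nematus_depths nematus 2; then
    echo "use --type s2s or set --dec-cell-high-depth 1"
fi
```

Calling the function with the values that are about to be passed to marian makes the script stop early instead of failing on an assertion after the data has been loaded and shuffled.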

emjotde commented 6 years ago

Does it work with --type s2s?

snukky commented 6 years ago

Yes, it does.

emjotde commented 6 years ago

So, the reason it does not work with --type nematus is our laziness?

snukky commented 6 years ago

I would say it's a time management optimization.

cong1 commented 6 years ago

We only used --type nematus in the example, where it works. Now we are using --type transformer.