cong1 closed this issue 6 years ago
I think that layer normalization together with `enc-depth > 1` or `dec-depth > 2` has never worked for training a Nematus-style model. I've never implemented or tested it, sorry. Marian supports only `enc-cell-depth` and `dec-cell-base-depth` with layer normalization, as those are needed for the architecture used in Edinburgh's WMT 2017 systems. We used to have warning/aborting messages for that, which probably disappeared accidentally. I'll restore them.
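For intuition, here is a minimal NumPy sketch of what "cell depth" means in a deep-transition GRU: several GRU transitions are applied per timestep, and only the first one receives the token input, which is what the `transition == xWs.empty()` assertion in `rnn/cells.h` (quoted later in this thread) enforces. The function names and random weights below are illustrative assumptions, not Marian's API.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy hidden size, for illustration only

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def make_weights():
    # Hypothetical random parameters, for shape illustration only.
    return {k: 0.1 * rng.standard_normal((DIM, DIM))
            for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

def gru_transition(h, p, x=None):
    """One GRU transition. Only the first transition in a deep-transition
    cell receives the token input x; the later ones update h alone (cf. the
    `transition == xWs.empty()` assertion in rnn/cells.h)."""
    xWz = x @ p["Wz"] if x is not None else 0.0
    xWr = x @ p["Wr"] if x is not None else 0.0
    xWh = x @ p["Wh"] if x is not None else 0.0
    z = sigmoid(xWz + h @ p["Uz"])            # update gate
    r = sigmoid(xWr + h @ p["Ur"])            # reset gate
    h_bar = np.tanh(xWh + (r * h) @ p["Uh"])  # candidate state
    return (1.0 - z) * h + z * h_bar

def deep_transition_step(x, h, cell_depth):
    """One timestep of a cell with `cell_depth` transitions
    (the role of --enc-cell-depth / --dec-cell-base-depth)."""
    h = gru_transition(h, make_weights(), x=x)  # first transition: with input
    for _ in range(cell_depth - 1):
        h = gru_transition(h, make_weights())   # further transitions: state only
    return h

h = deep_transition_step(np.zeros(DIM), np.zeros(DIM), cell_depth=4)
print(h.shape)  # -> (4,)
```

This is orthogonal to `enc-depth`/`dec-depth`, which stack whole recurrent layers rather than transitions within one cell.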
Is there a particular reason why you need a Nematus-compatible model instead of "s2s"?
Yes, I just want to repeat the same experiment from Nematus in Marian. Unfortunately, I cannot change the number of layers in Marian.
The `nematus` model type was added because we wanted to decode specific models trained with Nematus using the Marian decoder. You may use `--type s2s` and get a model architecture that is equivalent to Nematus. The model won't be compatible with the Nematus toolkit, but you probably don't need that.
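As a sketch, a deep `--type s2s` setup along these lines might look like the following config fragment (option names are taken from the config dump later in this thread; the values are illustrative assumptions, not a verified recipe):

```yaml
# Illustrative sketch only -- values are assumptions, not a tested recipe.
type: s2s
enc-depth: 2
enc-cell: gru-nematus
enc-cell-depth: 2
enc-type: bidirectional
dec-depth: 2
dec-cell: gru-nematus
dec-cell-base-depth: 4
dec-cell-high-depth: 2
layer-normalization: true
tied-embeddings: true
```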
Finally, only `--dec-cell-high-depth > 1` is not supported with `--type nematus`. I added a clear abort message and regression tests.
Does it work with `--type s2s`?
Yes, it does.
So, the reason it does not work with `--type nematus` is our laziness?
I would say it's a time management optimization.
We only use `--type nematus` in the example, and that works. Now we are using `--type transformer` anyway.
Hi, there was nothing wrong when I configured the parameters to train the nematus model on GPU. But when I changed `enc-depth` from 1 to 2 and `dec-depth` from 1 to 2, the following error occurred.
I'm currently using: cmake 3.6.2, gcc 5.4.0, boost 1.64.0, CUDA 8.0.61.
The command is:

```shell
# train model
if [ ! -e "model/model.npz.best-translation.npz" ]; then
  $MARIAN/build/marian \
    --devices $GPUS \
    --type nematus \
    --model model/model.npz \
    --train-sets data/corpus.bpe.ro data/corpus.bpe.en \
    --vocabs model/vocab.ro.yml model/vocab.en.yml \
    --dim-vocabs 60000 70000 \
    --dim-emb 500 \
    --mini-batch 50 \
    --enc-depth 2 \
    --enc-cell-depth 2 \
    --enc-type bidirectional \
    --dec-depth 2 \
    --dec-cell-base-depth 4 \
    --dec-cell-high-depth 2 \
    --dec-cell gru-nematus --enc-cell gru-nematus \
    --tied-embeddings true \
    --layer-normalization true \
    --dropout-rnn 0.2 --dropout-src 0.1 --dropout-trg 0.1 \
    --early-stopping 5 \
    --valid-freq 10000 --save-freq 10000 --disp-freq 1000 \
    --valid-metrics cross-entropy translation \
    --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en \
    --valid-script-path ./scripts/validate.sh \
    --log model/train.log --valid-log model/valid.log \
    --overwrite False --keep-best
fi
```
The error is:

```
nohup: ignoring input
Using GPUs: 2 3
train.sh: line 43: $MARIAN/build/marian \ --devices $GPUS \ --type nematus \ --model model/model.npz \ --train-sets data/corpus.bpe.ro data/corpus.bpe.en \ --vocabs model/vocab.ro.yml model/vocab.en.yml \ --dim-vocabs 60000 70000 \ --enc-depth 4 \ --enc-cell-depth 4 \ --enc-type bidirectional \ --dec-depth 4 \ --dec-cell-base-depth 2 \ --dec-cell-high-depth 2 \ --dec-cell gru-nematus --enc-cell gru-nematus \ --tied-embeddings true \ --layer-normalization true\ --dropout-src 0.1 --dropout-trg 0.1 \ --early-stopping 10 \ --valid-freq 10000 --save-freq 15000 --disp-freq 100 \ --valid-metrics cross-entropy translation \ --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en \ --valid-script-path ./scripts/validate.sh \ --log model/train.log --valid-log model/valid.log \ --overwrite False --keep-best \ --seed 1111 --exponential-smoothing \ : No such file or directory
[2018-06-01 16:43:57] [config] after-batches: 0
[2018-06-01 16:43:57] [config] after-epochs: 0
[2018-06-01 16:43:57] [config] allow-unk: false
[2018-06-01 16:43:57] [config] batch-flexible-lr: false
[2018-06-01 16:43:57] [config] batch-normal-words: 1920
[2018-06-01 16:43:57] [config] beam-size: 12
[2018-06-01 16:43:57] [config] best-deep: false
[2018-06-01 16:43:57] [config] clip-norm: 1
[2018-06-01 16:43:57] [config] cost-type: ce-mean
[2018-06-01 16:43:57] [config] cpu-threads: 0
[2018-06-01 16:43:57] [config] data-weighting-type: sentence
[2018-06-01 16:43:57] [config] dec-cell: gru-nematus
[2018-06-01 16:43:57] [config] dec-cell-base-depth: 4
[2018-06-01 16:43:57] [config] dec-cell-high-depth: 2
[2018-06-01 16:43:57] [config] dec-depth: 2
[2018-06-01 16:43:57] [config] devices:
[2018-06-01 16:43:57] [config]   - 2
[2018-06-01 16:43:57] [config]   - 3
[2018-06-01 16:43:57] [config] dim-emb: 500
[2018-06-01 16:43:57] [config] dim-rnn: 1024
[2018-06-01 16:43:57] [config] dim-vocabs:
[2018-06-01 16:43:57] [config]   - 60000
[2018-06-01 16:43:57] [config]   - 70000
[2018-06-01 16:43:57] [config] disp-freq: 1000
[2018-06-01 16:43:57] [config] dropout-rnn: 0.2
[2018-06-01 16:43:57] [config] dropout-src: 0.1
[2018-06-01 16:43:57] [config] dropout-trg: 0.1
[2018-06-01 16:43:57] [config] early-stopping: 5
[2018-06-01 16:43:57] [config] embedding-fix-src: false
[2018-06-01 16:43:57] [config] embedding-fix-trg: false
[2018-06-01 16:43:57] [config] embedding-normalization: false
[2018-06-01 16:43:57] [config] enc-cell: gru-nematus
[2018-06-01 16:43:57] [config] enc-cell-depth: 2
[2018-06-01 16:43:57] [config] enc-depth: 2
[2018-06-01 16:43:57] [config] enc-type: bidirectional
[2018-06-01 16:43:57] [config] exponential-smoothing: 0
[2018-06-01 16:43:57] [config] grad-dropping-momentum: 0
[2018-06-01 16:43:57] [config] grad-dropping-rate: 0
[2018-06-01 16:43:57] [config] grad-dropping-warmup: 100
[2018-06-01 16:43:57] [config] guided-alignment-cost: ce
[2018-06-01 16:43:57] [config] guided-alignment-weight: 1
[2018-06-01 16:43:57] [config] ignore-model-config: false
[2018-06-01 16:43:57] [config] keep-best: true
[2018-06-01 16:43:57] [config] label-smoothing: 0
[2018-06-01 16:43:57] [config] layer-normalization: true
[2018-06-01 16:43:57] [config] learn-rate: 0.0001
[2018-06-01 16:43:57] [config] log: model/train.log
[2018-06-01 16:43:57] [config] log-level: info
[2018-06-01 16:43:57] [config] lr-decay: 0
[2018-06-01 16:43:57] [config] lr-decay-freq: 50000
[2018-06-01 16:43:57] [config] lr-decay-inv-sqrt: 0
[2018-06-01 16:43:57] [config] lr-decay-repeat-warmup: false
[2018-06-01 16:43:57] [config] lr-decay-reset-optimizer: false
[2018-06-01 16:43:57] [config] lr-decay-start:
[2018-06-01 16:43:57] [config]   - 10
[2018-06-01 16:43:57] [config]   - 1
[2018-06-01 16:43:57] [config] lr-decay-strategy: epoch+stalled
[2018-06-01 16:43:57] [config] lr-report: false
[2018-06-01 16:43:57] [config] lr-warmup: 0
[2018-06-01 16:43:57] [config] lr-warmup-at-reload: false
[2018-06-01 16:43:57] [config] lr-warmup-cycle: false
[2018-06-01 16:43:57] [config] lr-warmup-start-rate: 0
[2018-06-01 16:43:57] [config] max-length: 50
[2018-06-01 16:43:57] [config] max-length-crop: false
[2018-06-01 16:43:57] [config] maxi-batch: 100
[2018-06-01 16:43:57] [config] maxi-batch-sort: trg
[2018-06-01 16:43:57] [config] mini-batch: 50
[2018-06-01 16:43:57] [config] mini-batch-fit: false
[2018-06-01 16:43:57] [config] mini-batch-fit-step: 10
[2018-06-01 16:43:57] [config] mini-batch-words: 0
[2018-06-01 16:43:57] [config] model: model/model.npz
[2018-06-01 16:43:57] [config] multi-node: false
[2018-06-01 16:43:57] [config] multi-node-overlap: true
[2018-06-01 16:43:57] [config] n-best: false
[2018-06-01 16:43:57] [config] no-reload: false
[2018-06-01 16:43:57] [config] no-restore-corpus: false
[2018-06-01 16:43:57] [config] no-shuffle: false
[2018-06-01 16:43:57] [config] normalize: 0
[2018-06-01 16:43:57] [config] optimizer: adam
[2018-06-01 16:43:57] [config] optimizer-delay: 1
[2018-06-01 16:43:57] [config] overwrite: true
[2018-06-01 16:43:57] [config] quiet: false
[2018-06-01 16:43:57] [config] quiet-translation: false
[2018-06-01 16:43:57] [config] relative-paths: false
[2018-06-01 16:43:57] [config] right-left: false
[2018-06-01 16:43:57] [config] save-freq: 10000
[2018-06-01 16:43:57] [config] seed: 0
[2018-06-01 16:43:57] [config] skip: true
[2018-06-01 16:43:57] [config] sqlite: ""
[2018-06-01 16:43:57] [config] sqlite-drop: false
[2018-06-01 16:43:57] [config] sync-sgd: false
[2018-06-01 16:43:57] [config] tempdir: /tmp
[2018-06-01 16:43:57] [config] tied-embeddings: true
[2018-06-01 16:43:57] [config] tied-embeddings-all: false
[2018-06-01 16:43:57] [config] tied-embeddings-src: false
[2018-06-01 16:43:57] [config] train-sets:
[2018-06-01 16:43:57] [config]   - data/corpus.bpe.ro
[2018-06-01 16:43:57] [config]   - data/corpus.bpe.en
[2018-06-01 16:43:57] [config] transformer-dim-ffn: 2048
[2018-06-01 16:43:57] [config] transformer-dropout: 0
[2018-06-01 16:43:57] [config] transformer-dropout-attention: 0
[2018-06-01 16:43:57] [config] transformer-dropout-ffn: 0
[2018-06-01 16:43:57] [config] transformer-ffn-activation: swish
[2018-06-01 16:43:57] [config] transformer-heads: 8
[2018-06-01 16:43:57] [config] transformer-postprocess: dan
[2018-06-01 16:43:57] [config] transformer-postprocess-emb: d
[2018-06-01 16:43:57] [config] transformer-preprocess: ""
[2018-06-01 16:43:57] [config] type: nematus
[2018-06-01 16:43:57] [config] valid-freq: 10000
[2018-06-01 16:43:57] [config] valid-log: model/valid.log
[2018-06-01 16:43:57] [config] valid-max-length: 1000
[2018-06-01 16:43:57] [config] valid-metrics:
[2018-06-01 16:43:57] [config]   - cross-entropy
[2018-06-01 16:43:57] [config]   - translation
[2018-06-01 16:43:57] [config] valid-mini-batch: 32
[2018-06-01 16:43:57] [config] valid-script-path: ./scripts/validate.sh
[2018-06-01 16:43:57] [config] valid-sets:
[2018-06-01 16:43:57] [config]   - data/newsdev2016.bpe.ro
[2018-06-01 16:43:57] [config]   - data/newsdev2016.bpe.en
[2018-06-01 16:43:57] [config] vocabs:
[2018-06-01 16:43:57] [config]   - model/vocab.ro.yml
[2018-06-01 16:43:57] [config]   - model/vocab.en.yml
[2018-06-01 16:43:57] [config] workspace: 2048
[2018-06-01 16:43:57] [data] Loading vocabulary from model/vocab.ro.yml
[2018-06-01 16:43:58] [data] Setting vocabulary size for input 0 to 60000
[2018-06-01 16:43:58] [data] Loading vocabulary from model/vocab.en.yml
[2018-06-01 16:43:58] [data] Setting vocabulary size for input 1 to 70000
[2018-06-01 16:44:01] [memory] Extending reserved space to 2048 MB (device gpu2)
[2018-06-01 16:44:01] [memory] Extending reserved space to 2048 MB (device gpu3)
[2018-06-01 16:44:01] Training started
[2018-06-01 16:44:01] [data] Shuffling files
[2018-06-01 16:45:26] [data] Done
marian: /media/ntfs-1/lcc/marian/src/marian/src/rnn/cells.h:424: virtual marian::rnn::State marian::rnn::GRUNematus::applyState(std::vector<std::shared_ptr<marian::Chainable<std::shared_ptr> > >, marian::rnn::State, marian::Expr): Assertion `transition == xWs.empty()' failed.
train.sh: line 74: 30799 Aborted (core dumped) $MARIAN/build/marian --devices $GPUS --type nematus --model model/model.npz --train-sets data/corpus.bpe.ro data/corpus.bpe.en --vocabs model/vocab.ro.yml model/vocab.en.yml --dim-vocabs 60000 70000 --dim-emb 500 --mini-batch 50 --enc-depth 2 --enc-cell-depth 2 --enc-type bidirectional --dec-depth 2 --skip true --dec-cell-base-depth 4 --dec-cell-high-depth 2 --dec-cell gru-nematus --enc-cell gru-nematus --tied-embeddings true --layer-normalization true --dropout-rnn 0.2 --dropout-src 0.1 --dropout-trg 0.1 --early-stopping 5 --valid-freq 10000 --save-freq 10000 --disp-freq 1000 --valid-metrics cross-entropy translation --valid-sets data/newsdev2016.bpe.ro data/newsdev2016.bpe.en --valid-script-path ./scripts/validate.sh --log model/train.log --valid-log model/valid.log --overwrite False --keep-best
```
```
Error: File 'model/model.npz.best-translation.npz.decoder.yml' does not exist - aborting
Aborted from InputFileStream::InputFileStream(const string&) in /media/ntfs-1/lucongcong/marian/src/marian/src/common/file_stream.h: 83
Detokenizer Version $Revision: 4134 $
Language: en
Use of uninitialized value $length_reference in numeric eq (==) at ../tools/moses-scripts/scripts/generic/multi-bleu-detok.perl line 157.
BLEU = 0, 0/0/0/0 (BP=0, ratio=0, hyp_len=0, ref_len=0)
```