Getting "Model does not seem to support alignments" when using --guided-alignment with Transformer models

tomsbergmanis commented 6 years ago

Hi!

We tried to train a Transformer model with guided alignment. When we ran the training script, it aborted with the error "Model does not seem to support alignments". Do you have any idea what is going wrong here?

Here is the execution command:

marian \
--model $MODEL_ROOT/model.npz --type transformer \
--train-sets data/train.1-to-1.merged.bpe.en data/train.1-to-1.merged.bpe.lv \
--max-length 128 \
--vocabs data/train.bpe.en-lv.yml data/train.bpe.en-lv.yml \
--mini-batch-fit -w 10000 --maxi-batch 1000 \
--early-stopping 10 \
--valid-freq 10000 --save-freq 10000 --disp-freq 1000 \
--valid-metrics cross-entropy perplexity translation \
--valid-sets data/tune.tc.id.1-to-1.merged.bpe.en data/tune.tc.id.1-to-1.merged.bpe.lv \
--valid-script-path $SCRIPT_ROOT/validate.sh \
--valid-translation-output data/tune.tc.id.1-to-1.merged.bpe.lv.output --quiet-translation \
--valid-mini-batch 64 \
--beam-size 6 --normalize 0.6 \
--log $MODEL_ROOT/train.log --valid-log $MODEL_ROOT/valid.log \
--enc-depth 4 --dec-depth 4 \
--transformer-heads 6 \
--transformer-postprocess-emb d \
--transformer-postprocess dan \
--transformer-dim-aan 1024 \
--transformer-dim-ffn 1024 \
--learn-rate 0.0003 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report \
--optimizer-params 0.9 0.98 1e-09 --clip-norm 5 \
--tied-embeddings-all \
--devices $GPUS --sync-sgd --seed 1111 \
--guided-alignment data/train.1-to-1.en-lv.align.grow-diag-final-and \
--exponential-smoothing

Here is the output:

[2018-08-31 16:02:14] [config] after-batches: 0
[2018-08-31 16:02:14] [config] after-epochs: 0
[2018-08-31 16:02:14] [config] allow-unk: false
[2018-08-31 16:02:14] [config] batch-flexible-lr: false
[2018-08-31 16:02:14] [config] batch-normal-words: 1920
[2018-08-31 16:02:14] [config] beam-size: 6
[2018-08-31 16:02:14] [config] best-deep: false
[2018-08-31 16:02:14] [config] clip-gemm: 0
[2018-08-31 16:02:14] [config] clip-norm: 5
[2018-08-31 16:02:14] [config] cost-type: ce-mean
[2018-08-31 16:02:14] [config] cpu-threads: 0
[2018-08-31 16:02:14] [config] data-weighting-type: sentence
[2018-08-31 16:02:14] [config] dec-cell: gru
[2018-08-31 16:02:14] [config] dec-cell-base-depth: 2
[2018-08-31 16:02:14] [config] dec-cell-high-depth: 1
[2018-08-31 16:02:14] [config] dec-depth: 4
[2018-08-31 16:02:14] [config] devices:
[2018-08-31 16:02:14] [config]   - 0
[2018-08-31 16:02:14] [config] dim-emb: 512
[2018-08-31 16:02:14] [config] dim-rnn: 1024
[2018-08-31 16:02:14] [config] dim-vocabs:
[2018-08-31 16:02:14] [config]   - 0
[2018-08-31 16:02:14] [config]   - 0
[2018-08-31 16:02:14] [config] disp-freq: 1000
[2018-08-31 16:02:14] [config] disp-label-counts: false
[2018-08-31 16:02:14] [config] dropout-rnn: 0
[2018-08-31 16:02:14] [config] dropout-src: 0
[2018-08-31 16:02:14] [config] dropout-trg: 0
[2018-08-31 16:02:14] [config] early-stopping: 10
[2018-08-31 16:02:14] [config] embedding-fix-src: false
[2018-08-31 16:02:14] [config] embedding-fix-trg: false
[2018-08-31 16:02:14] [config] embedding-normalization: false
[2018-08-31 16:02:14] [config] enc-cell: gru
[2018-08-31 16:02:14] [config] enc-cell-depth: 1
[2018-08-31 16:02:14] [config] enc-depth: 4
[2018-08-31 16:02:14] [config] enc-type: bidirectional
[2018-08-31 16:02:14] [config] exponential-smoothing: 0.0001
[2018-08-31 16:02:14] [config] grad-dropping-momentum: 0
[2018-08-31 16:02:14] [config] grad-dropping-rate: 0
[2018-08-31 16:02:14] [config] grad-dropping-warmup: 100
[2018-08-31 16:02:14] [config] guided-alignment: data/train.1-to-1.en-lv.align.grow-diag-final-and
[2018-08-31 16:02:14] [config] guided-alignment-cost: ce
[2018-08-31 16:02:14] [config] guided-alignment-weight: 1
[2018-08-31 16:02:14] [config] ignore-model-config: false
[2018-08-31 16:02:14] [config] interpolate-env-vars: false
[2018-08-31 16:02:14] [config] keep-best: false
[2018-08-31 16:02:14] [config] label-smoothing: 0
[2018-08-31 16:02:14] [config] layer-normalization: false
[2018-08-31 16:02:14] [config] learn-rate: 0.0003
[2018-08-31 16:02:14] [config] log: models_guided_alignment/train.log
[2018-08-31 16:02:14] [config] log-level: info
[2018-08-31 16:02:14] [config] lr-decay: 0
[2018-08-31 16:02:14] [config] lr-decay-freq: 50000
[2018-08-31 16:02:14] [config] lr-decay-inv-sqrt: 16000
[2018-08-31 16:02:14] [config] lr-decay-repeat-warmup: false
[2018-08-31 16:02:14] [config] lr-decay-reset-optimizer: false
[2018-08-31 16:02:14] [config] lr-decay-start:
[2018-08-31 16:02:14] [config]   - 10
[2018-08-31 16:02:14] [config]   - 1
[2018-08-31 16:02:14] [config] lr-decay-strategy: epoch+stalled
[2018-08-31 16:02:14] [config] lr-report: true
[2018-08-31 16:02:14] [config] lr-warmup: 16000
[2018-08-31 16:02:14] [config] lr-warmup-at-reload: false
[2018-08-31 16:02:14] [config] lr-warmup-cycle: false
[2018-08-31 16:02:14] [config] lr-warmup-start-rate: 0
[2018-08-31 16:02:14] [config] max-length: 128
[2018-08-31 16:02:14] [config] max-length-crop: false
[2018-08-31 16:02:14] [config] max-length-factor: 3
[2018-08-31 16:02:14] [config] maxi-batch: 1000
[2018-08-31 16:02:14] [config] maxi-batch-sort: trg
[2018-08-31 16:02:14] [config] mini-batch: 64
[2018-08-31 16:02:14] [config] mini-batch-fit: true
[2018-08-31 16:02:14] [config] mini-batch-fit-step: 10
[2018-08-31 16:02:14] [config] mini-batch-words: 0
[2018-08-31 16:02:14] [config] model: models_guided_alignment/model.npz
[2018-08-31 16:02:14] [config] multi-node: false
[2018-08-31 16:02:14] [config] multi-node-overlap: true
[2018-08-31 16:02:14] [config] n-best: false
[2018-08-31 16:02:14] [config] no-reload: false
[2018-08-31 16:02:14] [config] no-restore-corpus: false
[2018-08-31 16:02:14] [config] no-shuffle: false
[2018-08-31 16:02:14] [config] normalize: 0.6
[2018-08-31 16:02:14] [config] optimizer: adam
[2018-08-31 16:02:14] [config] optimizer-delay: 1
[2018-08-31 16:02:14] [config] optimizer-params:
[2018-08-31 16:02:14] [config]   - 0.9
[2018-08-31 16:02:14] [config]   - 0.98
[2018-08-31 16:02:14] [config]   - 1e-09
[2018-08-31 16:02:14] [config] overwrite: false
[2018-08-31 16:02:14] [config] quiet: false
[2018-08-31 16:02:14] [config] quiet-translation: true
[2018-08-31 16:02:14] [config] relative-paths: false
[2018-08-31 16:02:14] [config] right-left: false
[2018-08-31 16:02:14] [config] save-freq: 10000
[2018-08-31 16:02:14] [config] seed: 1111
[2018-08-31 16:02:14] [config] skip: false
[2018-08-31 16:02:14] [config] sqlite: ""
[2018-08-31 16:02:14] [config] sqlite-drop: false
[2018-08-31 16:02:14] [config] sync-sgd: true
[2018-08-31 16:02:14] [config] tempdir: /tmp
[2018-08-31 16:02:14] [config] tied-embeddings: false
[2018-08-31 16:02:14] [config] tied-embeddings-all: true
[2018-08-31 16:02:14] [config] tied-embeddings-src: false
[2018-08-31 16:02:14] [config] train-sets:
[2018-08-31 16:02:14] [config]   - data/train.1-to-1.merged.bpe.en
[2018-08-31 16:02:14] [config]   - data/train.1-to-1.merged.bpe.lv
[2018-08-31 16:02:14] [config] transformer-aan-activation: swish
[2018-08-31 16:02:14] [config] transformer-aan-depth: 2
[2018-08-31 16:02:14] [config] transformer-aan-nogate: false
[2018-08-31 16:02:14] [config] transformer-decoder-autoreg: self-attention
[2018-08-31 16:02:14] [config] transformer-dim-aan: 1024
[2018-08-31 16:02:14] [config] transformer-dim-ffn: 1024
[2018-08-31 16:02:14] [config] transformer-dropout: 0
[2018-08-31 16:02:14] [config] transformer-dropout-attention: 0
[2018-08-31 16:02:14] [config] transformer-dropout-ffn: 0
[2018-08-31 16:02:14] [config] transformer-ffn-activation: swish
[2018-08-31 16:02:14] [config] transformer-ffn-depth: 2
[2018-08-31 16:02:14] [config] transformer-heads: 6
[2018-08-31 16:02:14] [config] transformer-no-projection: false
[2018-08-31 16:02:14] [config] transformer-postprocess: dan
[2018-08-31 16:02:14] [config] transformer-postprocess-emb: d
[2018-08-31 16:02:14] [config] transformer-preprocess: ""
[2018-08-31 16:02:14] [config] transformer-tied-layers:
[2018-08-31 16:02:14] [config]   []
[2018-08-31 16:02:14] [config] type: transformer
[2018-08-31 16:02:14] [config] valid-freq: 10000
[2018-08-31 16:02:14] [config] valid-log: models_guided_alignment/valid.log
[2018-08-31 16:02:14] [config] valid-max-length: 1000
[2018-08-31 16:02:14] [config] valid-metrics:
[2018-08-31 16:02:14] [config]   - cross-entropy
[2018-08-31 16:02:14] [config]   - perplexity
[2018-08-31 16:02:14] [config]   - translation
[2018-08-31 16:02:14] [config] valid-mini-batch: 64
[2018-08-31 16:02:14] [config] valid-script-path: scripts/validate.sh
[2018-08-31 16:02:14] [config] valid-sets:
[2018-08-31 16:02:14] [config]   - data/tune.tc.id.1-to-1.merged.bpe.en
[2018-08-31 16:02:14] [config]   - data/tune.tc.id.1-to-1.merged.bpe.lv
[2018-08-31 16:02:14] [config] valid-translation-output: data/tune.tc.id.1-to-1.merged.bpe.lv.output
[2018-08-31 16:02:14] [config] vocabs:
[2018-08-31 16:02:14] [config]   - data/train.bpe.en-lv.yml
[2018-08-31 16:02:14] [config]   - data/train.bpe.en-lv.yml
[2018-08-31 16:02:14] [config] word-penalty: 0
[2018-08-31 16:02:14] [config] workspace: 10000
[2018-08-31 16:02:14] [data] Loading vocabulary from JSON/Yaml file data/train.bpe.en-lv.yml
[2018-08-31 16:02:14] [data] Setting vocabulary size for input 0 to 33865
[2018-08-31 16:02:14] [data] Loading vocabulary from JSON/Yaml file data/train.bpe.en-lv.yml
[2018-08-31 16:02:14] [data] Setting vocabulary size for input 1 to 33865
[2018-08-31 16:02:14] [data] Using word alignments from file data/train.1-to-1.en-lv.align.grow-diag-final-and
[2018-08-31 16:02:14] [batching] Collecting statistics for batch fitting with step size 10
[2018-08-31 16:02:15] [memory] Extending reserved space to 10112 MB (device gpu0)
[2018-08-31 16:02:15] Model does not seem to support alignments
Aborted from virtual marian::Expr marian::models::EncoderDecoderCE::apply(marian::Ptr<marian::models::ModelBase>, marian::Ptr<marian::ExpressionGraph>, marian::Ptr<marian::data::Batch>, bool) in /marian/src/marian/src/models/costs.h: 64
./train_guided_alignment.sh: line 38: 10381 Aborted                 (core dumped)

emjotde commented 6 years ago

Did you use master from marian-dev? The main repo does not have this yet.

emjotde commented 6 years ago

@snukky Looking at the output, shouldn't the config displayed during training contain the version and compilation hash in the log?

emjotde commented 6 years ago

@tomsbergmanis if you are not seeing this, you have the wrong version:

./marian --version
v1.6.0+87c98cc
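
The comparison above can be scripted. The following is a minimal sketch, assuming the version string always has the form v<version>+<hash> as in the two strings quoted in this thread. The function names build_hash and has_build_hash are hypothetical helpers, not part of Marian.

```shell
#!/usr/bin/env bash
# Sketch: compare the build hash in a Marian version string with an expected one.
# Assumption: the string looks like v<version>+<hash>, e.g. v1.6.0+87c98cc.

# Print everything after the first '+' in the version string.
# If the string contains no '+', it is printed unchanged.
build_hash() {
  local version="$1"
  echo "${version#*+}"
}

# Succeed (exit status 0) only if the version string carries the expected hash.
has_build_hash() {
  local version="$1" expected="$2"
  [ "$(build_hash "$version")" = "$expected" ]
}
```

A possible use, assuming the binary prints its version to standard output: has_build_hash "$(./marian --version)" 87c98cc || echo "different build". An exact hash match is stricter than necessary, since later marian-dev commits should also include the feature, so treat a mismatch as a prompt to check which repository the binary was built from.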

snukky commented 6 years ago

@emjotde Ha, the version is stored in the model file, so it shows up only if the training has been restarted. It's not a bug, it's a feature... ?

But logging the version may simplify our lives, so I'll add it.

tomsbergmanis commented 6 years ago

Sorry, we have the wrong version (v1.6.0+bda9b18) because we cloned the marian repository instead of marian-dev.
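
For anyone hitting the same error, here is a rough sketch of rebuilding from marian-dev. It assumes a default CMake configuration is enough and that CMake 3.13 or newer is installed (for the -S and -B options). The DRY_RUN variable, the run helper, and the checkout directory name are illustrative, not part of Marian.

```shell
#!/usr/bin/env bash
# Sketch: clone marian-dev and build it with default CMake settings.
# DRY_RUN, run, and build_marian_dev are hypothetical names for illustration.

# Print a command; execute it only when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Clone marian-dev into the given directory (default: marian-dev) and build it.
build_marian_dev() {
  local dir="${1:-marian-dev}"
  run git clone https://github.com/marian-nmt/marian-dev "$dir" &&
  run cmake -S "$dir" -B "$dir/build" &&
  run cmake --build "$dir/build"
}
```

Calling DRY_RUN=1 build_marian_dev first prints the commands without running them. After a real build, ./marian --version in the build directory should report a hash other than bda9b18.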
