hplt-project / OpusTrainer

Curriculum training
https://pypi.org/project/opustrainer/
MIT License

Investigating weird training behavior #46

Closed: eu9ene closed this issue 10 months ago

eu9ene commented 10 months ago

I've been investigating an issue with training for a while and would appreciate it if someone from the OpusTrainer or Marian developer teams could take a look at what might be wrong. Basically, my training/validation charts for teacher models look like this:

[Screenshot: training and validation charts, 2024-01-04 3:41 PM]

What's interesting is that the 56k and 114k update marks, where proper training starts, coincide with OpusTrainer starting a new epoch.

It almost looks like OpusTrainer feeds data of worse quality for an epoch or two and then starts feeding proper data. I tried to save the produced data separately to look at it and didn't notice anything bad.

Another hypothesis is that it might somehow be related to Marian settings like learning rate warmup, because I don't see this behavior for the backward s2s and student models, which have a slightly different configuration.
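As a side note on the warmup hypothesis, here is a minimal sketch of a generic warmup plus inverse-square-root schedule, using the learn-rate 0.0003, lr-warmup 8000 and lr-decay-inv-sqrt 8000 values from the Marian config below. This is the standard textbook formula, not necessarily Marian's exact implementation:

# Rough sketch of a warmup + inverse-sqrt LR schedule (not Marian's actual code).
# base_lr, warmup and inv_sqrt follow the config dump below; the formula is assumed.
import math

def lr_at(update, base_lr=0.0003, warmup=8000, inv_sqrt=8000):
    warm = min(1.0, update / warmup)                     # linear warmup
    decay = math.sqrt(inv_sqrt / max(update, inv_sqrt))  # inverse-sqrt decay afterwards
    return base_lr * warm * decay

for u in (1000, 8000, 56000, 114000):
    print(u, round(lr_at(u), 6))

Under these assumptions the warmup itself is over well before the 56k mark, and only the slow inverse-sqrt decay remains.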

I initially thought it was related to noise in the back-translations, but after reducing training to the original parallel corpus only, it still looks the same. My OpusTrainer config for this run:

datasets:
  original: <dataset0> # Original parallel corpus
  backtranslated: <dataset1> # Back-translated data

stages:
  - finetune

# Fine-tuning only on the original clean corpus until early stopping
finetune:
  - original 1.0
  - until original inf

modifiers:
- UpperCase: 0.07 # Apply randomly to 7% of sentences
- TitleCase: 0.05

seed: 1111
num_fields: 2
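
For reference, here's roughly what those two modifiers do to the stream, assuming each is applied independently per line at the given probability (a simplified sketch, not OpusTrainer's actual implementation):

# Simplified sketch of per-line casing modifiers (not OpusTrainer's real code).
import random

random.seed(1111)  # same seed as in the config above

def apply_modifiers(line, p_upper=0.07, p_title=0.05):
    if random.random() < p_upper:   # UpperCase: ~7% of lines
        return line.upper()
    if random.random() < p_title:   # TitleCase: ~5% of lines
        return line.title()
    return line

print(apply_modifiers("labas rytas\tgood morning"))  # example line, tab-separated src/trg

With these probabilities roughly 88% of lines pass through unchanged.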

Full training log for teacher 1
Full training log for teacher 2

Parts of the update log for teacher 1:

[task 2023-12-21T22:38:03.145Z] + opustrainer-train --config /home/ubuntu/tasks/task_170319775225490/artifacts/config.opustrainer.yml --log-file /home/ubuntu/tasks/task_170319775225490/artifacts/opustrainer.log --log-level INFO /home/ubuntu/tasks/task_170319775225490/fetches/marian --model /home/ubuntu/tasks/task_170319775225490/artifacts/model.npz -c configs/model/teacher.yml configs/training/teacher.train.yml -T /home/ubuntu/tasks/task_170319775225490/artifacts/tmp --vocabs /home/ubuntu/tasks/task_170319775225490/fetches/vocab.spm /home/ubuntu/tasks/task_170319775225490/fetches/vocab.spm -w 12000 --devices 0 1 2 3 --valid-metrics chrf ce-mean-words bleu-detok --valid-sets /home/ubuntu/tasks/task_170319775225490/fetches/devset.lten.tsv --valid-translation-output /home/ubuntu/tasks/task_170319775225490/artifacts/devset.out --valid-log /home/ubuntu/tasks/task_170319775225490/artifacts/valid.log --log /home/ubuntu/tasks/task_170319775225490/artifacts/train.log --shuffle batches --sentencepiece-alphas 0.1 --no-restore-corpus --valid-reset-stalled --sharding local --sync-sgd --quiet-translation --overwrite --keep-best --tsv --early-stopping 30
[task 2023-12-21T22:38:04.295Z] [2023-12-21 22:38:04] [Trainer] [INFO] Starting stage finetune
[task 2023-12-21T22:38:04.351Z] [2023-12-21 22:38:04] [Trainer] [INFO] Reading original for epoch 0
[task 2023-12-21T22:38:14.626Z] [2023-12-21 22:38:14] [marian] Marian v1.10.25; e8a1a25 2021-12-07 17:47:33 +0000
...
[task 2023-12-22T03:34:12.290Z] [2023-12-22 03:34:12] Ep. 1 : Up. 56000 : Sen. 27,016,256 : Cost 2.87605119 : Time 280.60s : 34035.93 words/s : gNorm 0.7730
[task 2023-12-22T03:35:22.661Z] [2023-12-22 03:35:22] [Trainer] [INFO] Reading original for epoch 1
[task 2023-12-22T03:37:51.993Z] [2023-12-22 03:37:51] Ep. 1 : Up. 57000 : Sen. 27,504,111 : Cost 2.85416627 : Time 219.70s : 43811.03 words/s : gNorm 0.7667
...
[task 2023-12-22T08:07:43.212Z] [2023-12-22 08:07:43] [valid] Ep. 1 : Up. 114000 : bleu-detok : 25.4407 : new best
[task 2023-12-22T08:10:13.998Z] [2023-12-22 08:10:13] [Trainer] [INFO] Reading original for epoch 2
[task 2023-12-22T08:11:22.304Z] [2023-12-22 08:11:22] Ep. 1 : Up. 115000 : Sen. 55,474,171 : Cost 2.51639462 : Time 330.01s : 28892.58 words/s : gNorm 0.5984
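
One quick way to see how the epoch boundaries line up with the update counts is to scan the combined task log for both line types; a minimal sketch, with a placeholder path and regexes based on the excerpts above:

# Hypothetical helper: report the Marian update count at each OpusTrainer epoch start.
# "train.log" is a placeholder path; the regexes follow the log lines shown above.
import re

up_re = re.compile(r"Ep\. \d+ : Up\. (\d+) :")
ep_re = re.compile(r"\[Trainer\] \[INFO\] Reading original for epoch (\d+)")

last_update = 0
with open("train.log") as log:
    for line in log:
        m = up_re.search(line)
        if m:
            last_update = int(m.group(1))
            continue
        m = ep_re.search(line)
        if m:
            print(f"epoch {m.group(1)} starts around update {last_update}")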

Marian config:

[task 2023-12-22T05:15:48.649Z] [2023-12-22 05:15:48] [Trainer] [INFO] Starting stage finetune
[task 2023-12-22T05:15:48.708Z] [2023-12-22 05:15:48] [Trainer] [INFO] Reading original for epoch 0
[task 2023-12-22T05:15:59.625Z] [2023-12-22 05:15:59] [marian] Marian v1.10.25; e8a1a25 2021-12-07 17:47:33 +0000
[task 2023-12-22T05:15:59.625Z] [2023-12-22 05:15:59] [marian] Running on translations-1-b-linux-v100-gpu-4-1tb-k6k7xzrwrgyn-tegzuy7ag as process 2866 with command line:
[task 2023-12-22T05:15:59.625Z] [2023-12-22 05:15:59] [marian] /home/ubuntu/tasks/task_170322157205451/fetches/marian --model /home/ubuntu/tasks/task_170322157205451/artifacts/model.npz -c configs/model/teacher.yml configs/training/teacher.train.yml -T /home/ubuntu/tasks/task_170322157205451/artifacts/tmp --vocabs /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm -w 12000 --devices 0 1 2 3 --valid-metrics chrf ce-mean-words bleu-detok --valid-sets /home/ubuntu/tasks/task_170322157205451/fetches/devset.lten.tsv --valid-translation-output /home/ubuntu/tasks/task_170322157205451/artifacts/devset.out --valid-log /home/ubuntu/tasks/task_170322157205451/artifacts/valid.log --log /home/ubuntu/tasks/task_170322157205451/artifacts/train.log --shuffle batches --sentencepiece-alphas 0.1 --no-restore-corpus --valid-reset-stalled --sharding local --sync-sgd --quiet-translation --overwrite --keep-best --tsv --early-stopping 30
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] after: 0e
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] after-batches: 0
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] after-epochs: 0
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] all-caps-every: 0
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] allow-unk: false
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] authors: false
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] beam-size: 8
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-class-symbol: "[CLS]"
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-mask-symbol: "[MASK]"
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-masking-fraction: 0.15
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-sep-symbol: "[SEP]"
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-train-type-embeddings: true
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] bert-type-vocab-size: 2
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] build-info: ""
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] check-gradient-nan: false
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] check-nan: false
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] cite: false
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] clip-norm: 0
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] cost-scaling:
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] cost-type: ce-mean-words
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] cpu-threads: 0
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] data-weighting: ""
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] data-weighting-type: sentence
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] dec-cell: gru
[task 2023-12-22T05:15:59.627Z] [2023-12-22 05:15:59] [config] dec-cell-base-depth: 2
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dec-cell-high-depth: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dec-depth: 6
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] devices:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 2
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 3
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dim-emb: 1024
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dim-rnn: 1024
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dim-vocabs:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 32000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 32000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] disp-first: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] disp-freq: 1000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] disp-label-counts: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dropout-rnn: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dropout-src: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dropout-trg: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dump-config: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] dynamic-gradient-scaling:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] early-stopping: 30
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] early-stopping-on: first
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] embedding-fix-src: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] embedding-fix-trg: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] embedding-normalization: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] embedding-vectors:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] enc-cell: gru
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] enc-cell-depth: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] enc-depth: 6
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] enc-type: bidirectional
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] english-title-case-every: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] exponential-smoothing: 0.0001
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] factor-weight: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] factors-combine: sum
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] factors-dim-emb: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] gradient-checkpointing: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] gradient-norm-average-window: 100
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] guided-alignment: none
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] guided-alignment-cost: mse
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] guided-alignment-weight: 0.1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ignore-model-config: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] input-types:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] interpolate-env-vars: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] keep-best: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] label-smoothing: 0.1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] layer-normalization: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] learn-rate: 0.0003
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lemma-dependency: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lemma-dim-emb: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] log: /home/ubuntu/tasks/task_170322157205451/artifacts/train.log
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] log-level: info
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] log-time-zone: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] logical-epoch:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 1e
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-freq: 50000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-inv-sqrt:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 8000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-repeat-warmup: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-reset-optimizer: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-start:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 10
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-decay-strategy: epoch+stalled
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-report: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-warmup: 8000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-warmup-at-reload: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-warmup-cycle: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] lr-warmup-start-rate: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] max-length: 100
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] max-length-crop: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] max-length-factor: 3
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] maxi-batch: 1000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] maxi-batch-sort: trg
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch: 1000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-fit: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-fit-step: 10
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-round-up: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-track-lr: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-warmup: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-words: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] mini-batch-words-ref: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] model: /home/ubuntu/tasks/task_170322157205451/artifacts/model.npz
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] multi-loss-type: sum
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] n-best: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] no-nccl: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] no-reload: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] no-restore-corpus: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] normalize: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] normalize-gradient: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] num-devices: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] optimizer: adam
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] optimizer-delay: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] optimizer-params:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 0.9
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 0.998
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 1e-09
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] output-omit-bias: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] overwrite: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] precision:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - float32
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - float32
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] pretrained-model: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quantize-biases: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quantize-bits: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quantize-log-based: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quantize-optimization-steps: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quiet: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] quiet-translation: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] relative-paths: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] right-left: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] save-freq: 5000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] seed: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sentencepiece-alphas:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - 0.1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sentencepiece-max-lines: 2000000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sentencepiece-options: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sharding: local
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] shuffle: batches
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] shuffle-in-ram: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sigterm: save-and-exit
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] skip: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sqlite: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sqlite-drop: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sync-freq: 200u
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] sync-sgd: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tempdir: /home/ubuntu/tasks/task_170322157205451/artifacts/tmp
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tied-embeddings: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tied-embeddings-all: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tied-embeddings-src: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] train-embedder-rank:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] train-sets:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - stdin
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-aan-activation: swish
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-aan-depth: 2
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-aan-nogate: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-decoder-autoreg: self-attention
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-depth-scaling: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-dim-aan: 2048
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-dim-ffn: 4096
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-dropout: 0.1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-dropout-attention: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-dropout-ffn: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-ffn-activation: relu
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-ffn-depth: 2
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-guided-alignment-layer: last
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-heads: 16
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-no-projection: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-pool: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-postprocess: dan
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-postprocess-emb: d
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-postprocess-top: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-preprocess: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-tied-layers:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] transformer-train-position-embeddings: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tsv: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] tsv-fields: 2
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] type: transformer
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-dim-emb: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-dropout: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-keys-vectors: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-query-vectors: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-softmax-temperature: 1
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] ulr-trainable-transformation: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] unlikelihood-loss: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-freq: 3000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-log: /home/ubuntu/tasks/task_170322157205451/artifacts/valid.log
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-max-length: 300
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-metrics:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - chrf
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - ce-mean-words
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - bleu-detok
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-mini-batch: 8
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-reset-stalled: true
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-script-args:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   []
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-script-path: ""
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-sets:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - /home/ubuntu/tasks/task_170322157205451/fetches/devset.lten.tsv
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] valid-translation-output: /home/ubuntu/tasks/task_170322157205451/artifacts/devset.out
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] vocabs:
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config]   - /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] word-penalty: 0
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] word-scores: false
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] workspace: 12000
[task 2023-12-22T05:15:59.628Z] [2023-12-22 05:15:59] [config] Model is being created with Marian v1.10.25; e8a1a25 2021-12-07 17:47:33 +0000
[task 2023-12-22T05:15:59.631Z] [2023-12-22 05:15:59] Using synchronous SGD
[task 2023-12-22T05:15:59.631Z] [2023-12-22 05:15:59] [comm] Compiled without MPI support. Running as a single process on translations-1-b-linux-v100-gpu-4-1tb-k6k7xzrwrgyn-tegzuy7ag
[task 2023-12-22T05:15:59.631Z] [2023-12-22 05:15:59] Synced seed 1703222159
[task 2023-12-22T05:15:59.631Z] [2023-12-22 05:15:59] [data] Loading SentencePiece vocabulary from file /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm
[task 2023-12-22T05:15:59.686Z] [2023-12-22 05:15:59] [data] Setting vocabulary size for input 0 to 32,000
[task 2023-12-22T05:15:59.686Z] [2023-12-22 05:15:59] [data] Loading SentencePiece vocabulary from file /home/ubuntu/tasks/task_170322157205451/fetches/vocab.spm
[task 2023-12-22T05:15:59.735Z] [2023-12-22 05:15:59] [data] Setting vocabulary size for input 1 to 32,000
[task 2023-12-22T05:15:59.736Z] [2023-12-22 05:15:59] [batching] Collecting statistics for batch fitting with step size 10

Related to https://github.com/mozilla/firefox-translations-training/issues/314

eu9ene commented 10 months ago

The issue appears to be caused mostly by an incorrect optimizer-delay setting. With it fixed, training looks stable.
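
For context, in Marian optimizer-delay is the number of batches whose gradients are accumulated before a single optimizer update, so it effectively multiplies the batch size per update. A tiny conceptual sketch (not Marian code; the numbers are made up):

# Conceptual sketch of gradient accumulation, i.e. what optimizer-delay controls.
def count_updates(batches, delay):
    accumulated = 0.0
    updates = 0
    for i, grad in enumerate(batches, start=1):
        accumulated += grad          # sum gradients over `delay` batches
        if i % delay == 0:
            updates += 1             # one optimizer step on the accumulated gradient
            accumulated = 0.0
    return updates

grads = [0.1] * 12                   # gradients from 12 batches (illustrative)
print(count_updates(grads, delay=1)) # 12 updates: every batch steps the optimizer
print(count_updates(grads, delay=4)) # 3 updates: effective batch size is 4x larger

With sync-sgd across four GPUs the effective batch per update already scales with the number of devices, so an unintended delay on top of that can shift the batch-size/learning-rate balance quite a bit.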