I've been investigating a training issue for a while and would appreciate it if someone from the OpusTrainer or Marian developers could take a look at what might be wrong. My training/validation charts for teacher models look like this:
What's interesting is that the 56k and 114k update marks, where proper training finally starts, coincide with OpusTrainer starting a new epoch.
It almost looks as if OpusTrainer feeds lower-quality data for an epoch or two and only then starts feeding proper data. I saved the produced data separately to inspect it and didn't notice anything bad (roughly the kind of check sketched after the config below).
Another hypothesis is that it is somehow related to Marian settings such as learning rate warmup, because I don't see this behavior for the backward s2s and student models, which use a slightly different configuration.
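One way to check that hypothesis is to diff the warmup-related options between the teacher config and the backward/student configs. A minimal sketch (the file names are placeholders and the key list only covers the Marian options I have in mind):

```python
import yaml  # pip install pyyaml

# Placeholder paths: the teacher config vs. the backward/student config.
TEACHER_CFG = "teacher.train.yml"
STUDENT_CFG = "student.train.yml"

# Marian options that drive the warmup/decay schedule (the ones I suspect).
LR_KEYS = ["learn-rate", "lr-warmup", "lr-decay-inv-sqrt", "optimizer-delay"]

def load(path):
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

teacher, student = load(TEACHER_CFG), load(STUDENT_CFG)

for key in LR_KEYS:
    t, s = teacher.get(key), student.get(key)
    marker = "  <-- differs" if t != s else ""
    print(f"{key:22} teacher={t!r:<12} student={s!r:<12}{marker}")
```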
I initially thought it was related to noise in the back-translations, but after reducing training to the original parallel corpus only, the curves still look the same. My OpusTrainer config for this run:
datasets:
  original: <dataset0>        # Original parallel corpus
  backtranslated: <dataset1>  # Back-translated data

stages:
  - finetune

# Fine-tuning only on the original clean corpus until early stopping
finetune:
  - original 1.0
  - until original inf

modifiers:
  - UpperCase: 0.07 # Apply randomly to 7% of sentences
  - TitleCase: 0.05

seed: 1111
num_fields: 2
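For anyone who wants to reproduce the inspection of the saved output, a check along these lines is what I mean (a sketch only: the dump path is a placeholder, and isupper()/istitle() are just rough proxies for the UpperCase/TitleCase modifiers above):

```python
from collections import Counter

# Rough sanity check of a dump of what OpusTrainer produced.
# Assumptions: DUMP_PATH is a placeholder, and each line is a tab-separated
# source/target pair (num_fields: 2 in the config above).
DUMP_PATH = "opustrainer_output.tsv"

stats = Counter()
with open(DUMP_PATH, encoding="utf-8") as f:
    for line in f:
        stats["lines"] += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            stats["bad_field_count"] += 1
            continue
        src, trg = fields
        if not src.strip() or not trg.strip():
            stats["empty_side"] += 1
        if src.isupper():      # rough proxy for the UpperCase modifier (~7%)
            stats["upper_src"] += 1
        elif src.istitle():    # rough proxy for the TitleCase modifier (~5%)
            stats["title_src"] += 1

total = stats["lines"] or 1
print(f"lines:             {stats['lines']}")
print(f"wrong field count: {stats['bad_field_count']}")
print(f"empty src/trg:     {stats['empty_side']}")
print(f"uppercased src:    {stats['upper_src'] / total:.2%} (expected ~7%)")
print(f"titlecased src:    {stats['title_src'] / total:.2%} (expected ~5%)")
```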
Full training log for teacher 1
Full training log for teacher 2
Parts of the update log for teacher 1:
Marian config:
Related to https://github.com/mozilla/firefox-translations-training/issues/314