Related to #472
Hypothesis
There is an issue with pre-training on a mix of back-translated data and the original parallel corpus and then fine-tuning on the original corpus only: the model does not continue training and stops too early. We can experiment with training only on the mix, without the fine-tuning step.
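For illustration, a minimal sketch of preparing a single mixed training set for the one-stage setup (file names and the tab-separated format are assumptions, not the pipeline's actual paths or merging logic); the model would then be trained once on this mix, with no separate fine-tuning run on the original corpus:

```python
import random

# Hypothetical file names; the real pipeline's paths and format may differ.
ORIGINAL = "corpus.original.tsv"               # original parallel corpus (src \t trg)
BACK_TRANSLATED = "corpus.backtranslated.tsv"  # synthetic pairs from back-translation
MIXED = "corpus.mixed.tsv"                     # single training set, no fine-tuning stage

def read_pairs(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]

# Merge both corpora and shuffle so each batch mixes original and synthetic pairs.
pairs = read_pairs(ORIGINAL) + read_pairs(BACK_TRANSLATED)
random.shuffle(pairs)

with open(MIXED, "w", encoding="utf-8") as f:
    f.write("\n".join(pairs) + "\n")
```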
Results
This approach appears to work and the model trains longer, so it was implemented in the pipeline.