But I found another use case where we don't want ["one-stage" teacher training when using a pre-trained backtranslations model]: if the amount of mono-trg data is too small (for example for en-lt), we still want two-stage training, because we don't want to loop over 5M back-translated sentences.
From: https://github.com/mozilla/firefox-translations-training/pull/620#discussion_r1612332379