mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

[meta] Train low resource languages #803

Open gregtatum opened 2 months ago

gregtatum commented 2 months ago

Once #524 and #425 are done, we can start focusing on lower-resource languages. When training data is scarce, we'll probably need different techniques. For instance, data synthesis will be important: both back-translation and having enough monolingual data to distill from.
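To make the back-translation idea concrete, here is a minimal sketch, not the pipeline's actual implementation. It assumes a hypothetical `reverse_model` callable that translates a batch of target-language sentences into the source language. Each synthetic source sentence is paired with the original human-written target sentence. Distillation works the same way in the other direction: a teacher model translates source-side monolingual text, and the student trains on those outputs.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: any function that translates a batch of sentences.
Translator = Callable[[List[str]], List[str]]


def back_translate(
    target_monolingual: Iterable[str],
    reverse_model: Translator,
    batch_size: int = 2,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    The reverse model translates target -> source; its output becomes the
    synthetic source, paired with the original target sentence.
    """
    pairs: List[Tuple[str, str]] = []
    batch: List[str] = []
    for sentence in target_monolingual:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines
        batch.append(sentence)
        if len(batch) == batch_size:
            pairs.extend(zip(reverse_model(batch), batch))
            batch = []
    if batch:  # flush the final partial batch
        pairs.extend(zip(reverse_model(batch), batch))
    return pairs


# Usage with a stand-in "model" that just tags its input.
fake_reverse = lambda batch: [f"<src:{s}>" for s in batch]
print(back_translate(["Bonjour", "", "monde", "merci"], fake_reverse))
```

Only the target side is real text here, which is why back-translation is a good fit when monolingual data is plentiful but parallel data is not.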

We'll probably need to experiment with multilingual models (see #684). We will probably also have to re-evaluate our release criteria: other translation systems will most likely score lower on these languages too, so we can compare our results against theirs.
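One way to express a comparative release criterion is a sketch like the one below. It is a hypothetical illustration, not an agreed project rule. It assumes every system is scored with the same metric on the same evaluation set, that higher is better, and that `max_gap` is a made-up tolerance. The check passes if our model is within `max_gap` of the best reference system, rather than clearing a fixed absolute threshold.

```python
from typing import Dict


def passes_relative_bar(
    our_score: float,
    reference_scores: Dict[str, float],
    max_gap: float = 2.0,
) -> bool:
    """Return True if our score is within max_gap of the best reference system.

    Assumes all scores use the same metric and eval set, higher is better.
    max_gap is a hypothetical tolerance, not a project threshold.
    """
    if not reference_scores:
        raise ValueError("need at least one reference system to compare against")
    best = max(reference_scores.values())
    return our_score >= best - max_gap


print(passes_relative_bar(30.0, {"system_a": 31.5, "system_b": 28.0}))
```

A relative bar adapts automatically when every system struggles on a low-resource pair, which a fixed absolute threshold would not.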