Training pipelines for Firefox Translations machine translation models.
The trained models are hosted in the firefox-translations-models repository, are compatible with bergamot-translator, and power Firefox web page translation starting with version 118.
The pipeline was originally developed as part of the Bergamot project, which focuses on improving client-side machine translation in a web browser.
The pipeline is capable of training a translation model for a language pair end to end. Translation quality depends on the chosen datasets, data cleaning procedures and hyperparameters. Some settings, especially for low-resource languages, might require extra tuning.
We use the fast translation engine Marian.
You can find more details about the pipeline steps in the documentation.
An orchestrator is responsible for workflow management and parallelization.
Public training dashboard in Weights & Biases
Marian training metrics are parsed from logs and published using a custom module within the tracking directory.
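
As a rough illustration of what parsing training metrics from logs can look like, here is a minimal sketch. It is not the project's tracking module: the function name `parse_training_metrics`, the regular expression, and the log line shape shown in the comment are all assumptions made for this example. Publishing the parsed values to a dashboard is left out.

```python
import re
from typing import Dict, Iterable, Iterator, Union

# Assumed shape of a Marian training log line (illustrative only):
# [2024-01-01 12:00:00] Ep. 1 : Up. 1000 : Sen. 123,456 : Cost 3.45678 : Time 123.45s : 12345.67 words/s
TRAINING_LINE = re.compile(
    r"Ep\.\s*(?P<epoch>\d+)\s*:\s*"
    r"Up\.\s*(?P<update>\d+)\s*:\s*"
    r"Sen\.\s*(?P<sentences>[\d,]+)\s*:\s*"
    r"Cost\s*(?P<cost>[\d.]+)"
)

Metric = Dict[str, Union[int, float]]


def parse_training_metrics(lines: Iterable[str]) -> Iterator[Metric]:
    """Yield one metrics dict per log line that matches the assumed format.

    Lines that do not match (for example, model saving messages) are skipped.
    """
    for line in lines:
        match = TRAINING_LINE.search(line)
        if match is None:
            continue
        yield {
            "epoch": int(match["epoch"]),
            "update": int(match["update"]),
            # Sentence counts are assumed to use thousands separators.
            "sentences": int(match["sentences"].replace(",", "")),
            "cost": float(match["cost"]),
        }
```

Each yielded dict could then be handed to whichever experiment tracker is in use, keyed by the update number so that curves from different runs line up.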
More information is available here.
High-level overview post on Mozilla Hacks
Model training guide - practical advice on how to use the pipeline
This project uses materials developed by: