mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Use our localization data for training #882

Open marco-c opened 2 weeks ago

marco-c commented 2 weeks ago

https://github.com/mozilla-l10n/mt-training-data

Maybe we could add it to OPUS.

marco-c commented 2 weeks ago

It looks like it is already in OPUS: https://github.com/Helsinki-NLP/OPUS-ingest/tree/master/corpus/Mozilla-I10n. Though it seems to be a very old version, from 2021.

ZJaume commented 1 week ago

I think localization data from software can really help with the translation of short sentences, especially once #888 is fixed :sweat_smile:

EDIT: although some language pairs in these corpora may need a bit of cleaning, the Ubuntu and OpenOffice corpora could help the Firefox Translations models with webpage menus.