mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Create a Python package to use translation models #802

Open eu9ene opened 2 months ago

eu9ene commented 2 months ago

The package should be able to either download translation models or use already downloaded ones, and it should run translation efficiently, which means it should be able to utilize multiple CPUs.
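To make the requirement concrete, here is a minimal sketch of what the two pieces could look like: a model cache that downloads only when needed, and a batch dispatcher that spreads work over several workers. Everything here is an assumption for discussion. `resolve_model`, `split_batches`, `translate_parallel`, and the `fetch` and `translate_batch` callables are hypothetical names, not an existing API in this repo, Marian, or bergamot-translator.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical callables, not an existing API:
#   Fetch downloads the files of model `name` into the given directory.
#   TranslateBatch wraps whatever native engine ends up being used.
Fetch = Callable[[str, Path], None]
TranslateBatch = Callable[[List[str]], List[str]]


def resolve_model(name: str, cache_dir: Path, fetch: Fetch) -> Path:
    """Return the local directory for `name`, downloading only if it is missing."""
    model_dir = cache_dir / name
    if not model_dir.is_dir():
        cache_dir.mkdir(parents=True, exist_ok=True)
        partial = cache_dir / (name + ".partial")
        partial.mkdir(exist_ok=True)
        fetch(name, partial)
        # Publish the directory only after the download has finished,
        # so an interrupted download is never mistaken for a cached model.
        partial.rename(model_dir)
    return model_dir


def split_batches(sentences: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the input into consecutive batches of at most `batch_size` items."""
    return [
        list(sentences[i:i + batch_size])
        for i in range(0, len(sentences), batch_size)
    ]


def translate_parallel(
    sentences: Sequence[str],
    translate_batch: TranslateBatch,
    workers: int = 4,
    batch_size: int = 32,
) -> List[str]:
    """Translate batches concurrently and return results in input order."""
    batches = split_batches(sentences, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the submitted batches.
        results = pool.map(translate_batch, batches)
        return [line for batch in results for line in batch]
```

Threads only use multiple CPUs if the native binding releases the GIL while translating, which is an assumption here. If it does not, the same structure would work with `ProcessPoolExecutor`, at the cost of loading the model once per process.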

The main challenge is compiling Marian. It should be possible to ship prebuilt wheels, as many scientific Python libraries do, so that users don't need to compile it locally.

Another question to explore is whether we should include bergamot-translator at all or just run regular Marian or the Bergamot fork of Marian.

Some prior art:
https://github.com/mozilla/translation-service
https://github.com/browsermt/bergamot-translator/tree/main/bindings/python

marco-c commented 2 months ago

I wonder if we can somehow upload the models to Hugging Face too. I guess we'd need to write a converter for our models, like the one that exists for opus-mt models.

eu9ene commented 2 months ago

Yeah, I also thought about this. Filed https://github.com/mozilla/firefox-translations-training/issues/804

eu9ene commented 2 months ago

If we convert the models to PyTorch and the HF format, they will be runnable with HF transformers, but inference will be slower than with Marian.