mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Upgrade to Bicleaner 3 #604

Closed eu9ene closed 5 months ago

eu9ene commented 5 months ago

bicleaner2 full multilingual model: Throughput: 271 rows/s
bicleaner3 full-large multilingual model: Throughput: 80 rows/s
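To put the two throughput figures in perspective, here is a rough back-of-the-envelope sketch. The rates are the ones reported above; the corpus size of 10 million rows is purely hypothetical and only there to illustrate the scaling, and the helper names are made up for this example.

```python
# Throughput figures reported in this thread (rows per second).
BICLEANER2_ROWS_PER_S = 271
BICLEANER3_LARGE_ROWS_PER_S = 80


def slowdown(old_rate: float, new_rate: float) -> float:
    """Return how many times slower the new model is than the old one."""
    return old_rate / new_rate


def hours_to_score(rows: int, rows_per_s: float) -> float:
    """Return the wall-clock hours needed to score `rows` at a fixed rate."""
    return rows / rows_per_s / 3600


if __name__ == "__main__":
    # Hypothetical corpus size, not a number from the actual pipeline.
    corpus_rows = 10_000_000

    factor = slowdown(BICLEANER2_ROWS_PER_S, BICLEANER3_LARGE_ROWS_PER_S)
    print(f"bicleaner3 full-large is {factor:.2f}x slower")

    for name, rate in [
        ("bicleaner2 full", BICLEANER2_ROWS_PER_S),
        ("bicleaner3 full-large", BICLEANER3_LARGE_ROWS_PER_S),
    ]:
        print(f"{name}: {hours_to_score(corpus_rows, rate):.1f} h")
```

At these rates the large model is roughly 3.4 times slower on a single worker, so the actual cost depends on how many rows each filtering task has to score and how the work is split across workers.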

I would still try the large model as we use those multilingual models for non-European languages where we have less data.

https://firefox-ci-tc.services.mozilla.com/tasks/Kjhc2ehkTi-06k-nr8lwrg/runs/0/logs/public/logs/live.log

fixes #528

eu9ene commented 5 months ago

Bicleaner now requires cyhunspell (https://github.com/MSeal/cython_hunspell/tree/2.0.3), which builds hunspell, and that in turn needs some system libraries installed. Since this is not a Docker worker, I can't add them in this PR, and I don't think there's a way to use our prebuilt packages. It also seems that cyhunspell wants version 1.7, while we fetch 0.5.5.

I was able to install and use it successfully on a Snakepit machine in a conda environment.

eu9ene commented 5 months ago

Bicleaner folks also recommend switching to the multilingual model even for the languages that have specialized models. We can run an experiment side by side but I'm hesitant to do it before the next big training.

marco-c commented 5 months ago

> Bicleaner folks also recommend switching to the multilingual model even for the languages that have specialized models. We can run an experiment side by side but I'm hesitant to do it before the next big training.

Where did you see that? Did they share numbers?

eu9ene commented 5 months ago

> Where did you see that? Did they share numbers?

https://github.com/bitextor/bicleaner-ai/issues/31#issuecomment-2118343598