eu9ene closed this 5 months ago
Bicleaner now requires https://github.com/MSeal/cython_hunspell/tree/2.0.3, which builds hunspell, which in turn needs some system libs installed. Since this is not a docker worker, I can't add them in this PR. I don't think there's a way to use our prebuilt packages. Also, it seems cyhunspell wants version 1.7 and we fetch 0.5.5.
I was able to install and use it successfully on a Snakepit machine in a conda env.
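For the version mismatch mentioned above (1.7 wanted, 0.5.5 fetched), a quick sanity check could look like the sketch below. It assumes "wants version 1.7" means "at least 1.7" and that versions are plain dotted numbers. The helper names `parse_version` and `meets_minimum` are made up for illustration and are not part of Bicleaner or cyhunspell.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '0.5.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, required):
    """Return True if `installed` is at least `required`.

    Assumption: the requirement is a lower bound. Shorter versions are
    padded with zeros, so '1.7' and '1.7.0' compare as equal.
    """
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # Versions taken from the description above.
    print(meets_minimum("0.5.5", "1.7"))
```

This only handles purely numeric versions; anything with suffixes like `rc1` would need a real version parser.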
Bicleaner folks also recommend switching to the multilingual model even for the languages that have specialized models. We can run an experiment side by side but I'm hesitant to do it before the next big training.
Where did you see that? Did they share numbers?
https://github.com/bitextor/bicleaner-ai/issues/31#issuecomment-2118343598
bicleaner2 full multilingual model: Troughput: 271 rows/s
bicleaner3 full-large multilingual model: Troughput: 80 rows/s
I would still try the large model as we use those multilingual models for non-European languages where we have less data.
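To put the two throughput numbers above in perspective, here is a rough back-of-the-envelope sketch. The rows/s values are the ones quoted in this thread; the corpus size is a made-up example, and the estimate assumes throughput stays constant over the whole run, which may not hold in practice.

```python
# Throughput figures quoted in this thread (rows per second).
THROUGHPUT_ROWS_PER_S = {
    "bicleaner2 full multilingual": 271,
    "bicleaner3 full-large multilingual": 80,
}


def slowdown(baseline_rows_per_s, candidate_rows_per_s):
    """How many times slower the candidate is than the baseline."""
    return baseline_rows_per_s / candidate_rows_per_s


def hours_to_score(rows, rows_per_s):
    """Estimated wall-clock hours to score `rows` at a constant rate."""
    return rows / rows_per_s / 3600


if __name__ == "__main__":
    corpus_rows = 10_000_000  # hypothetical corpus size, not from the thread
    for name, rate in THROUGHPUT_ROWS_PER_S.items():
        print(f"{name}: {hours_to_score(corpus_rows, rate):.1f} h")
    factor = slowdown(271, 80)
    print(f"large model is about {factor:.1f}x slower")
```

With these numbers the large model comes out roughly 3.4x slower, so the extra cost scales directly with how much data goes through the filter.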
https://firefox-ci-tc.services.mozilla.com/tasks/Kjhc2ehkTi-06k-nr8lwrg/runs/0/logs/public/logs/live.log
fixes #528