mozilla / firefox-translations-models

CPU-optimized Neural Machine Translation models for Firefox Translations
Mozilla Public License 2.0

Recalculate all COMET scores that were generated with Comet 1.1.3 #104

Closed — eu9ene closed this issue 1 year ago

eu9ene commented 1 year ago

For some translators, BLEU and COMET give opposite results when compared to Bergamot. This mostly affects open-source translators.

For example, for cs-en we can see:

| metric | argos | nllb |
| ------ | ----- | ---- |
| BLEU   | -33%  | -27% |
| COMET  | +47%  | +51% |
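The deltas above are percent changes of each candidate system's score relative to the Bergamot baseline. A minimal sketch of that calculation (the helper name and the example numbers are illustrative, not taken from the actual evaluation):

```python
def relative_delta(candidate: float, baseline: float) -> float:
    """Percent change of a candidate system's score vs a baseline score."""
    return (candidate - baseline) / baseline * 100.0

# Illustrative only: a candidate BLEU of 20.1 against a baseline of 30.0
# comes out at -33%, matching the shape of the numbers above.
print(round(relative_delta(20.1, 30.0)))  # → -33
```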

There might be a bug somewhere.

BLEU prod:

[screenshot: BLEU prod scores, 2023-11-07]

COMET prod:

[screenshot: COMET prod scores, 2023-11-07]
eu9ene commented 1 year ago

We recently updated COMET from 1.1.3 to 2.1.1, which uses different underlying models. We should recalculate COMET scores for all languages for Bergamot, Microsoft, and Google. I suggest implementing #91 first so we don't retranslate everything every time we change metrics.
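For reference, a recalculation with the 2.x release boils down to rescoring existing translations, which is much cheaper than retranslating. A hedged sketch, assuming the unbabel-comet 2.x Python API (`download_model`, `load_from_checkpoint`, `predict`) and its current default model; the helper names and file layout are ours, not the repo's:

```python
# Sketch of rescoring one system's existing translations with COMET 2.x.
# Helper names are hypothetical; the comet calls are the unbabel-comet API.

def to_comet_records(sources, hypotheses, references):
    """Pair segments into the {src, mt, ref} dicts COMET expects."""
    return [
        {"src": s, "mt": m, "ref": r}
        for s, m, r in zip(sources, hypotheses, references)
    ]

def comet_score(sources, hypotheses, references,
                model_name="Unbabel/wmt22-comet-da"):
    """Download the model (cached after the first call) and score a system."""
    from comet import download_model, load_from_checkpoint  # unbabel-comet
    checkpoint = download_model(model_name)
    model = load_from_checkpoint(checkpoint)
    result = model.predict(
        to_comet_records(sources, hypotheses, references),
        batch_size=8,
        gpus=0,  # CPU scoring; set gpus=1 if a GPU is available
    )
    return result.system_score  # corpus-level score in the 2.x API
```

Because 1.1.3 and 2.x default to different models, scores from the two versions are not comparable; every system (Bergamot, Microsoft, Google, and the open-source translators) has to be rescored with the same version before deltas mean anything.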

marco-c commented 1 year ago

I opened #107 to test the hypothesis that the score change is due to the COMET update.

marco-c commented 1 year ago

Yes, we can see that the recalculated scores are indeed much higher than before.