mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0
143 stars, 31 forks

English to Lithuanian did not meet our quality bar #756

Open gregtatum opened 1 month ago

gregtatum commented 1 month ago

The teacher ensemble scored 0.8971 COMET (-1.07%), but the quantized student scored only 0.8598 (-5.45%). I asked online for qualitative feedback on how our models are performing, and the feedback was that the translations weren't very good.
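To put the two scores side by side, here is a minimal Python sketch that computes the student's relative change against the teacher, using only the two COMET scores quoted above. The thread does not say which reference the -1.07% and -5.45% figures are measured against, so this sketch does not try to reproduce them. `relative_change` is a hypothetical helper, not part of the training pipeline.

```python
# COMET scores quoted in this issue.
teacher_comet = 0.8971  # teacher ensemble
student_comet = 0.8598  # quantized student


def relative_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, in percent (hypothetical helper)."""
    return (new - old) / old * 100


drop = relative_change(student_comet, teacher_comet)
print(f"student vs. teacher: {drop:.2f}%")  # student vs. teacher: -4.16%
```

By this measure, the student loses roughly 4% of the teacher's COMET score during distillation and quantization, which is in line with the point below that the student model is where quality falls off.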

Invented words:

"someone moves abroad" -> "kažkas persiutų į užsienį" ("persiutų" is made up)
"prose" -> "prosą" (instead of "prozą")
"drizzly rain" -> "dreifuoja liūsnios" (there is no such word as "liūsnios", and "dreifuoja" means "drifting")

Then an issue with gender:

Also, an interesting case of a gender swap: "Icelandic First Lady Eliza Reid" -> "Islandijos pirmoji ponia Eliza Reid".

gregtatum commented 1 month ago

Nonsense translations:

US Election Unspun -> JAV rinkimai „Uns“
US Election Unspun -> JAV rinkimai „Unsup“

This may be the result of passthrough translations, or simply of the input being so short.
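One way to triage outputs like these is to flag target tokens that look copied or truncated from the source. The sketch below is a hypothetical heuristic in Python, not something the training pipeline provides: it flags any target token that shares a prefix of at least three characters (case-insensitive) with a source token. The set of quote characters stripped from tokens is an assumption that covers the examples in this issue. Legitimate passthrough such as names will be flagged too, so this only narrows down candidates for manual review.

```python
import string

# ASCII punctuation plus the Lithuanian-style quotes seen above (assumed sufficient here).
STRIP_CHARS = string.punctuation + "„“”"


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def flag_passthrough(source: str, target: str, min_prefix: int = 3) -> list[str]:
    """Hypothetical heuristic: return target tokens sharing a prefix of at least
    `min_prefix` characters (case-insensitive) with any source token."""
    src_tokens = [t.strip(STRIP_CHARS).lower() for t in source.split()]
    flagged = []
    for raw in target.split():
        tok = raw.strip(STRIP_CHARS)
        low = tok.lower()
        # Tokens shorter than the prefix threshold can never match; skip them.
        if len(low) < min_prefix:
            continue
        if any(common_prefix_len(low, s) >= min_prefix for s in src_tokens):
            flagged.append(tok)
    return flagged


print(flag_passthrough("US Election Unspun", "JAV rinkimai „Uns“"))    # ['Uns']
print(flag_passthrough("US Election Unspun", "JAV rinkimai „Unsup“"))  # ['Unsup']
```

Running it on the "Eliza Reid" example above flags `Eliza` and `Reid`, which are correct passthroughs, so a real triage step would need a way to exclude named entities.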

gregtatum commented 1 month ago

There's a whole bunch more of them here:

https://elk.zone/fosstodon.org/@KasTasMykolas@river.group.lt/112812259518762235

eu9ene commented 1 month ago

@gregtatum I believe this is essentially a duplicate of #231. The root issue is that the distilled model has low scores, and everything else follows from that. That doesn't mean fixing it will resolve all of these issues, though.

gregtatum commented 1 month ago

This issue can serve as a place for specific Lithuanian feedback, and outside contributors could add examples here as well if they wish.