Open ZJaume opened 2 days ago
Great catch! We remove it in the find-corpus:
but it seems it got lost with all the refactorings and migration to the config generator... cc @gregtatum
The outcome is quite sad, as all our WMT-based evaluation benchmarks for this cohort of languages are not correct. Flores should be fine.
@marco-c FYI too
OK, maybe not all the results are incorrect, only the ones before 2019.
The WMTNews corpus at OPUS is just a compilation of the WMT test sets, so it must not be included in the training data.
https://github.com/mozilla/firefox-translations-training/blob/1f7ab70cd4dbb64e16bb6b38840490c2f2259cb0/configs/autogenerated/en-tr-spring-2024.yml#L79-L79
https://github.com/mozilla/firefox-translations-training/blob/1f7ab70cd4dbb64e16bb6b38840490c2f2259cb0/configs/autogenerated/en-ro-spring-2024.yml#L123
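As a rough illustration of the kind of filter that could be re-added to the config generator, here is a minimal sketch. It assumes the training datasets are available as a plain list of strings; the function names and the example dataset keys are hypothetical and not taken from the repository.

```python
import re


def is_wmt_news(dataset: str) -> bool:
    """Return True if a dataset key looks like the OPUS WMTNews corpus.

    Assumption: the key contains "WMT-News" or "WMTNews" in some casing.
    Separators are stripped so both spellings match.
    """
    normalized = re.sub(r"[^a-z0-9]", "", dataset.lower())
    return "wmtnews" in normalized


def drop_wmt_news(train_datasets: list[str]) -> list[str]:
    """Remove WMTNews entries so WMT test sets do not leak into training."""
    return [d for d in train_datasets if not is_wmt_news(d)]


# Illustrative dataset keys only; real configs may name them differently.
example = [
    "opus_WMT-News/v2019",
    "opus_ParaCrawl/v9",
    "opus_WMTNews/v2014",
    "opus_NLLB/v1",
]
filtered = drop_wmt_news(example)
print(filtered)
```

Matching on a normalized name rather than an exact string is a deliberate choice here, so that a renamed or re-versioned copy of the corpus is still caught.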