mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Report empty alignments separately #571

Closed eu9ene closed 5 months ago

eu9ene commented 5 months ago

Switch the OpusTrainer logging level to ERROR and report the number of empty alignments separately.

Some empty alignments are expected, because eflomal can't align incorrect translations. There are also a few warnings about out-of-bound alignments, likely caused by slight differences in tokenization in OpusTrainer. See #507.
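For illustration, the two cases above could be told apart roughly as follows. This is only a sketch, not the code in this PR: it assumes a three-column TSV (source, target, alignment), Pharaoh-style `i-j` index pairs, and plain whitespace tokenization. The names `classify_alignment` and `count_alignment_issues` are hypothetical, and OpusTrainer's own tokenization may differ, which is the suspected cause of the out-of-bound warnings.

```python
from collections import Counter


def classify_alignment(source: str, target: str, alignment: str) -> str:
    """Return 'empty', 'out_of_bounds', or 'ok' for one aligned sentence pair.

    Assumes whitespace tokenization and Pharaoh-style 'i-j' pairs (hypothetical format).
    """
    if not alignment.strip():
        return "empty"
    src_len = len(source.split())
    trg_len = len(target.split())
    for pair in alignment.split():
        src_idx, trg_idx = (int(i) for i in pair.split("-"))
        # An index pointing past the last token is an out-of-bound alignment.
        if src_idx >= src_len or trg_idx >= trg_len:
            return "out_of_bounds"
    return "ok"


def count_alignment_issues(tsv_lines):
    """Count each category over an iterable of 'source<TAB>target<TAB>alignment' lines."""
    counts = Counter()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            counts["malformed"] += 1
            continue
        counts[classify_alignment(*fields)] += 1
    return counts
```

Reporting `counts["empty"]` on its own gives a single summary number instead of one warning per line.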

Eventually, we'll rewrite this script in Python, filter empty alignments out of the TSV, and re-enable logging. That would be a heavy refactoring at this point, though, and we don't want to do it before #524.
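A minimal sketch of what the filtering step might look like once the script is in Python. It assumes the alignment is the third tab-separated column; `filter_empty_alignments` is a hypothetical name and nothing here exists in the repository yet.

```python
def filter_empty_alignments(tsv_lines):
    """Drop lines whose alignment column is missing or empty.

    Assumes 'source<TAB>target<TAB>alignment' lines (hypothetical layout).
    Returns the kept lines and the number of dropped lines.
    """
    kept = []
    dropped = 0
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2].strip():
            kept.append(line)
        else:
            dropped += 1
    return kept, dropped
```

With empty alignments removed before training, the dropped count could be logged once and the OpusTrainer logging level restored.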

fixes #570

[skip ci]