Open eu9ene opened 3 months ago
Could also be a data cleaning problem, like num_mismatch.
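To make the data cleaning idea concrete, here is a rough sketch of what a URL check by analogy with num_mismatch might look like. This is not taken from any existing cleaning tool: the name `url_mismatch`, the helper `extract_urls`, and the regex are all hypothetical, and the pattern is deliberately loose.

```python
import re

# Hypothetical, deliberately loose URL pattern for illustration only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the sorted list of URLs found in text."""
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return sorted(m.rstrip(".,;:!?)") for m in URL_RE.findall(text))


def url_mismatch(source, target):
    """Return True when source and target do not contain the same URLs."""
    return extract_urls(source) != extract_urls(target)


# A pair where the URL was altered on the target side would be flagged.
pairs = [
    ("Se https://example.com/a för mer.", "See https://example.com/a for more."),
    ("Se https://example.com/a för mer.", "See https://example.com/b for more."),
]
kept = [pair for pair in pairs if not url_mismatch(*pair)]
```

A filter like this would only drop training pairs where the URL was not copied verbatim. It would not, on its own, teach the model to copy URLs, so it probably complements augmentation rather than replacing it. Stripping a trailing `)` can truncate URLs that legitimately end in a parenthesis, but since the same rule is applied to both sides the comparison stays consistent.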
I would push back against implementing option 1, which would happen on the Gecko side for every translation. That regex seems risky and error-prone to write. I would at least start with data augmentation. There is https://github.com/hplt-project/OpusTrainer/issues/43 already on file.
I'm seeing this happen often with Swedish-to-English translations.
Sometimes URLs are written out in the text rather than hidden behind an HTML element. In this case the URL should be copied as is.
There are two ways to fix this: