mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Figure out sacrebleu `/` dataset strategy #634

Open gregtatum opened 5 months ago

gregtatum commented 5 months ago
  - sacrebleu_aug-mix_wmt18
  - sacrebleu_aug-mix_wmt17
  - sacrebleu_aug-mix_wmt15
  - sacrebleu_aug-mix_wmt14/full

@eu9ene wrote:

I would not include those `/` ones unless there's only flores available. I did it for en-lt, for example:

  devtest:
  - flores_aug-mix_dev
  - sacrebleu_aug-mix_wmt19/dev
  - mtdata_aug-mix_Neulab-tedtalks_dev-1-eng-lit

I guess we can either comment out all such datasets or just leave it up to the user to remove them.
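
To make the first option concrete, here is a minimal sketch of what "commenting out" could look like if it were done automatically when a config is generated. The function name `render_dataset_list` is hypothetical and not part of the repository. The rule it applies (a name that starts with `sacrebleu_` and contains `/`) is an assumption based on the dataset names listed above.

```python
def render_dataset_list(key, datasets, indent="  "):
    """Render a YAML list, commenting out sacrebleu datasets with '/' in the name.

    Hypothetical helper for illustration only. The matching rule is an
    assumption, not the project's actual behavior.
    """
    lines = [f"{indent}{key}:"]
    for name in datasets:
        # Assumed rule: only sacrebleu entries containing '/' are affected.
        is_slash_dataset = name.startswith("sacrebleu_") and "/" in name
        prefix = "# " if is_slash_dataset else ""
        lines.append(f"{indent}{prefix}- {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_dataset_list(
            "devtest",
            [
                "flores_aug-mix_dev",
                "sacrebleu_aug-mix_wmt19/dev",
                "mtdata_aug-mix_Neulab-tedtalks_dev-1-eng-lit",
            ],
        )
    )
```

With the en-lt list from above, this keeps the flores and mtdata entries active and leaves the `wmt19/dev` entry in place as a comment, so the user can still re-enable it by hand.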