Open j0ma opened 2 years ago
Multilingual corpora tend to get large, and higher-resourced languages can dominate lower-resourced ones during training.
To get around this, we need functionality to subsample corpora. The subsampling should be easy to specify in a YAML config.
This may not be needed if using fairseq's `translation_multi_simple_epoch` task.
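Below is a minimal sketch of what config-driven subsampling could look like. The issue doesn't name a method, so this assumes temperature-based sampling: language `l` with `n_l` lines gets probability proportional to `(n_l / N) ** (1 / T)`. `T = 1` keeps the natural proportions, and larger `T` flattens them toward uniform. It only downsamples, so a language never gets more lines than it has. The config keys (`temperature`, `total_lines`, `seed`) and the function names are hypothetical. To keep the sketch dependency-free, the YAML is shown as a comment and mirrored in a plain dict.

```python
import random
from typing import Dict, List

# Hypothetical config, mirroring a YAML file such as:
#   subsample:
#     temperature: 5.0
#     total_lines: 1000
#     seed: 1917
CONFIG = {"subsample": {"temperature": 5.0, "total_lines": 1000, "seed": 1917}}


def temperature_probs(sizes: Dict[str, int], temperature: float) -> Dict[str, float]:
    """Sampling probability per language: p_l proportional to (n_l / N) ** (1 / T)."""
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def subsample(
    corpora: Dict[str, List[str]], temperature: float, total_lines: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Downsample each language's lines toward its temperature-based share of total_lines."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    sizes = {lang: len(lines) for lang, lines in corpora.items()}
    probs = temperature_probs(sizes, temperature)
    out = {}
    for lang, lines in corpora.items():
        # Cap at the corpus size: this sketch never upsamples small languages.
        k = min(len(lines), round(probs[lang] * total_lines))
        out[lang] = rng.sample(lines, k)  # sample without replacement
    return out


# Usage: one high-resource and one low-resource toy corpus.
corpora = {
    "de": [f"de sentence {i}" for i in range(10_000)],
    "sw": [f"sw sentence {i}" for i in range(100)],
}
cfg = CONFIG["subsample"]
sampled = subsample(corpora, cfg["temperature"], cfg["total_lines"], cfg["seed"])
```

With these toy sizes and `T = 5`, `de` is cut to roughly 715 lines while all 100 `sw` lines are kept, because the low-resource share is capped at its corpus size. The total can therefore fall below `total_lines`. Upsampling small languages would need sampling with replacement instead.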