mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Support CJK in find_corpus and config generator #740

Open eu9ene opened 3 months ago

eu9ene commented 3 months ago

Chinese corpora are not tagged with zh, but with regional variants such as zh_{tw,zh,hk...}. It would be helpful if find_corpus.py also checked for those variants when searching for zh.
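A rough sketch of the kind of matching this could use, assuming corpus tags are plain strings and that variants are separated from the base code by an underscore or hyphen. The names base_language, matches_language and filter_corpora are hypothetical and are not taken from find_corpus.py:

```python
import re


def base_language(tag: str) -> str:
    """Return the lowercase primary language part of a tag, e.g. 'zh_TW' -> 'zh'."""
    # Assumption: the variant is separated by '_' or '-' (zh_tw, zh-CN, ...).
    return re.split(r"[_-]", tag.strip(), maxsplit=1)[0].lower()


def matches_language(tag: str, lang: str) -> bool:
    """True if the tag is the requested language or one of its regional variants."""
    return base_language(tag) == lang.lower()


def filter_corpora(corpus_tags, lang):
    """Keep only the tags that belong to the requested language."""
    return [tag for tag in corpus_tags if matches_language(tag, lang)]


if __name__ == "__main__":
    # Illustrative tags only, not real corpus metadata.
    tags = ["zh_tw", "zh_hk", "zh-CN", "zh", "en", "zhx"]
    print(filter_corpora(tags, "zh"))
```

Comparing on the base code rather than using a plain prefix check avoids accidentally matching unrelated codes that merely start with the same letters (zhx in the example above).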