mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Check training for CJK #747

Open eu9ene opened 3 months ago

eu9ene commented 3 months ago

Does it require any adjustment? Should we change any hyperparameters etc.?

eu9ene commented 1 month ago

Example training log from HPLT: https://github.com/hplt-project/HPLT-MT-Models/blob/main/v1.0/training/en-zh_hant/hplt/model.train.log

eu9ene commented 1 month ago

I changed the vocabulary size to 64000 based on the HPLT config. I don't yet understand the implications.

ZJaume commented 1 week ago

Chinese should be trained as two separate models, one for Simplified and one for Traditional, since the combined vocabulary of both scripts may be too large to fit in 32k pieces.
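
A rough way to check this concern on real data is to count the distinct characters in each script and compare their union against the piece budget. The sketch below is only an illustration and is not part of the training pipeline: the function names, the one-sentence samples, and the 32000 budget are assumptions chosen for the example.

```python
def unique_chars(lines):
    """Return the set of non-whitespace characters seen in an iterable of strings."""
    return {ch for line in lines for ch in line if not ch.isspace()}


def char_budget_report(simplified, traditional, vocab_size):
    """Compare the character inventories of two corpora against a vocabulary budget.

    `simplified` and `traditional` are iterables of text lines (hypothetical inputs);
    `vocab_size` is the number of subword pieces available to a single model.
    """
    simp = unique_chars(simplified)
    trad = unique_chars(traditional)
    union = simp | trad
    return {
        "simplified": len(simp),
        "traditional": len(trad),
        "shared": len(simp & trad),
        "union": len(union),
        # Share of the budget consumed by single characters alone,
        # before any multi-character pieces are learned.
        "budget_fraction": len(union) / vocab_size,
    }


# Toy sample: the same phrase ("machine translation model") in each script.
report = char_budget_report(["机器翻译模型"], ["機器翻譯模型"], vocab_size=32000)
print(report)
```

On a real corpus, a large `union` relative to `vocab_size` would suggest that a shared vocabulary leaves little room for multi-character pieces, which is the argument for training the two scripts separately or for raising the vocabulary size.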