google / cld3

Apache License 2.0

Increase the number of supported languages #50

Open · AlAntonov opened this issue 3 years ago

AlAntonov commented 3 years ago

Hi! Do you have any plans to increase the number of supported languages to around 200-300? Languages such as Chuvash (chv), Mari (mhr), Hill Mari (mrj), and Komi (kpv) have a presence on the web but are not supported here. As a result, they are also missing from the multilingual C4 (mC4) dataset.