rspeer / wordfreq

Access a database of word frequencies, in various natural languages.

Adding new language "Basque" #102

Closed Mikelhoya closed 5 months ago

Mikelhoya commented 2 years ago

Hello, my name is Mikel. I would like to know whether it would be possible to add a new language to the library: Basque. If so, how could I collaborate to make it happen? Thank you.

rspeer commented 2 years ago

Last time I updated the input corpora, Basque just missed the cutoff for having enough text for me to consider the frequencies representative. I had left myself a note that if I finished including corpus text from OSCAR, it would enable word frequencies in Basque, as well as Estonian, Albanian, and Galician.

The company I work for now is focused on monolingual English, and I may not have an opportunity to do more multilingual corpus processing any time soon, though I really wish I could.

Mikelhoya commented 2 years ago

Thank you very much. It isn't urgent for me right now. You are doing a good job. Thank you.

rspeer commented 5 months ago

Closing because the wordfreq data is unlikely to be updated in any language.