rspeer / wordfreq

Access a database of word frequencies, in various natural languages.

Argument to specify frequency source #61

Open glupyan opened 6 years ago

glupyan commented 6 years ago

It would be wonderful to be able to get word frequencies from each of the different sources separately!

rspeer commented 6 years ago

This would be nice, but one thing that makes wordfreq particularly useful and accessible is that it fits in a PyPI package. There isn't room to store 7 different frequencies per word, so I don't think this can be solved within the wordfreq package.

https://github.com/LuminosoInsight/exquisite-corpus contains the build process that prepares the data for each corpus, though in one case it relies on a data set that can't be distributed (Twitter). I would like to think about how to make the complete results of this build more accessible, even if they can't appear in wordfreq.
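
To make the request concrete, here is a rough sketch of what a per-source lookup might look like if the per-corpus tables were published separately. Everything in it is an assumption for illustration: `source_frequency`, its `source` argument, and the `TOY_FREQS` table are hypothetical and not part of wordfreq, the numbers are invented, and the plain average used when no source is given is only a placeholder, not how wordfreq actually merges its sources.

```python
from typing import Dict, Optional

# Hypothetical per-source frequency tables. The source names are examples
# and the values are invented for illustration only.
TOY_FREQS: Dict[str, Dict[str, float]] = {
    "twitter": {"lol": 1e-3, "the": 4e-2},
    "wikipedia": {"lol": 1e-6, "the": 6e-2},
}


def source_frequency(word: str, source: Optional[str] = None,
                     minimum: float = 0.0) -> float:
    """Look up a word's frequency in one source, or averaged over all sources.

    This is a hypothetical helper, not a wordfreq function.
    """
    if source is not None:
        if source not in TOY_FREQS:
            raise ValueError(f"unknown source: {source!r}")
        return max(TOY_FREQS[source].get(word, 0.0), minimum)

    # Placeholder combination: a plain mean over all sources.
    values = [table.get(word, 0.0) for table in TOY_FREQS.values()]
    return max(sum(values) / len(values), minimum)


# A word can look very different depending on which source is asked.
print(source_frequency("lol", source="twitter"))
print(source_frequency("lol", source="wikipedia"))
print(source_frequency("lol"))
```

Keeping the tables outside the main package, as in this sketch, would leave the wordfreq package itself at its current size.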