Open glupyan opened 6 years ago
This would be nice, but one thing that makes wordfreq particularly useful and accessible is that it fits in a PyPI package. There isn't room to store 7 different frequencies per word, so I don't think this can be solved within the wordfreq package.
https://github.com/LuminosoInsight/exquisite-corpus contains the pipeline that processes the data for each corpus, though in one case (Twitter) it relies on a data set that can't be distributed. I would like to think about how to make the complete results of this build more accessible, even if they can't appear in wordfreq.
It would be wonderful to be able to get frequencies for the different sources!