rspeer / wordfreq

Access a database of word frequencies, in various natural languages.

lemmatization? #113

Closed: doctorcolossus closed this issue 2 months ago

doctorcolossus commented 3 months ago

Nice work on this project, it's amazing!

However, what about lemmatization? There seems to be no information about parts of speech, unless I'm missing it. Are words counted without considering this? If so, then words like English "record" [verb] would be counted together with "record" [noun], and words with many different inflected forms would have each individual form counted separately, giving no overview of how popular the lemma itself is. Is any functionality built in or planned to take lemmata into consideration?

rspeer commented 2 months ago

No, no such functionality is planned.
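
For readers who still want a rough lemma-level figure, one workaround is to sum the per-form frequencies outside the library. The sketch below is only an illustration under stated assumptions: the `LEMMA_FORMS` table, the `TOY_FREQ` numbers, and the `lemma_frequency` helper are all hypothetical and are not part of wordfreq. It also cannot separate "record" [noun] from "record" [verb], because surface-form counts carry no part-of-speech information.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical, hand-written table mapping a lemma to its inflected forms.
# In practice this would come from a separate lemmatizer or morphological lexicon.
LEMMA_FORMS: Dict[str, List[str]] = {
    "record": ["record", "records", "recorded", "recording"],
}

# Made-up per-form frequencies, used only so the example runs on its own.
TOY_FREQ: Dict[str, float] = {
    "record": 4e-5,
    "records": 3e-5,
    "recorded": 2e-5,
    "recording": 1e-5,
}


def toy_lookup(word: str) -> float:
    """Return the toy frequency of a surface form, or 0.0 if unknown."""
    return TOY_FREQ.get(word, 0.0)


def lemma_frequency(forms: Iterable[str], lookup: Callable[[str], float]) -> float:
    """Sum the frequencies of all surface forms belonging to one lemma."""
    return sum(lookup(form) for form in forms)


# Combined frequency of every listed form of "record" (noun and verb together).
total = lemma_frequency(LEMMA_FORMS["record"], toy_lookup)
print(f"{total:.6f}")
```

With wordfreq installed, the toy lookup could be swapped for `lambda w: word_frequency(w, "en")`, using the library's `word_frequency` function. The result is still an approximation: a form such as "recording" is counted in full even when it is used as an independent noun.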