bszollosinagy opened 1 year ago
```
$ grep ascetic frequency-alpha-alldicts.txt
18614 ascetic 2,875,469 0.000199% 97.305329%
25054 asceticism 1,605,339 0.000111% 98.265396%
63318 ascetical 153,464 0.000011% 99.760632%
104505 ascetically 24,997 0.000002% 99.955170%
```
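`grep ascetic` matches every line that contains that substring anywhere, so it also returns "asceticism", "ascetical", and "ascetically". Below is a minimal sketch of an exact lookup on the word column. It assumes the file's layout is rank, word, count, percentage, cumulative percentage, as the output above suggests. `exact_word` is a hypothetical helper name:

```shell
#!/bin/sh
# Rows copied from the grep output above. Assumed columns:
# rank, word, count, percentage, cumulative percentage.
sample='18614 ascetic 2,875,469 0.000199% 97.305329%
25054 asceticism 1,605,339 0.000111% 98.265396%
63318 ascetical 153,464 0.000011% 99.760632%
104505 ascetically 24,997 0.000002% 99.955170%'

# Hypothetical helper: print only lines whose word column equals $1 exactly.
exact_word() {
    awk -v w="$1" '$2 == w'
}

# Substring match: all four lines contain "ascetic".
printf '%s\n' "$sample" | grep -c ascetic

# Exact match on column 2: only the rank-18614 line.
printf '%s\n' "$sample" | exact_word ascetic
```

On the real file this would be `exact_word ascetic < frequency-alpha-alldicts.txt`. `grep -w ascetic` also skips the longer forms, but it matches the word anywhere on the line rather than only in the word column.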
It would be nice to merge different forms of the same root (such as "ascetic", "asceticism", "ascetical", and "ascetically") into one entry, as a dictionary does, but the Google corpus does not include that information.
Do you know of any database I could use for such merging? I'm not going to write an automatic algorithm for it, because it would end up merging "cop" with "copy" and "copious".
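One way to merge forms without an automatic stemmer is an explicit form-to-lemma table. With this approach, only pairs that someone has listed get combined, so "cop" never collapses into "copy". Below is a minimal sketch. It assumes a hypothetical two-column table (`form lemma`) that would have to come from a hand-made list or a lexical database. The table contents and the `merge_by_lemma` helper are illustrations, not part of the word list:

```shell
#!/bin/sh
# Hypothetical hand-written lemma table: "form lemma" per line.
lemmas='ascetic ascetic
asceticism ascetic
ascetical ascetic
ascetically ascetic'

# Rows from the grep output above (rank, word, count, pct, cumulative pct).
freqs='18614 ascetic 2,875,469 0.000199% 97.305329%
25054 asceticism 1,605,339 0.000111% 98.265396%
63318 ascetical 153,464 0.000011% 99.760632%
104505 ascetically 24,997 0.000002% 99.955170%'

# Sum counts per lemma. $1 = lemma table file, $2 = frequency file.
merge_by_lemma() {
    awk 'NR == FNR { lemma[$1] = $2; next }       # first file: load the table
         {
             n = $3; gsub(/,/, "", n)              # drop thousands separators
             key = ($2 in lemma) ? lemma[$2] : $2  # unlisted words stay separate
             total[key] += n
         }
         END { for (k in total) printf "%s %.0f\n", k, total[k] }' "$1" "$2"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' "$lemmas" > "$dir/lemmas.txt"
printf '%s\n' "$freqs" > "$dir/freqs.txt"
merge_by_lemma "$dir/lemmas.txt" "$dir/freqs.txt"   # prints: ascetic 4659269
```

Words missing from the table pass through unchanged, so the merge is only as aggressive as the table itself.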
The word "ascetic" appears more than once in the file: at rank 18614, and again at ranks 25054, 63318, and 104505.
The words "copious" and "verdant" are also duplicated for some reason.
Can the counts simply be summed across all occurrences?
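If the repeated rows for words like "copious" and "verdant" really are the same word, summing them is straightforward. That is not the case for distinct words that merely share a substring, like the `ascetic` lines above. Below is a minimal sketch that assumes the same column layout. `list_duplicates` and `sum_duplicates` are hypothetical helper names. The percentage columns are not touched and would have to be recomputed from the new totals:

```shell
#!/bin/sh
# Assumed input columns: rank, word, count, percentage, cumulative percentage.

# Print each word that occurs on more than one line, with its line count.
list_duplicates() {
    awk '{ seen[$2]++ }
         END { for (w in seen) if (seen[w] > 1) print w, seen[w] }'
}

# Print "word total", adding together the counts of repeated words.
sum_duplicates() {
    awk '{ n = $3; gsub(/,/, "", n); total[$2] += n }
         END { for (w in total) printf "%s %.0f\n", w, total[w] }'
}
```

For example, `list_duplicates < frequency-alpha-alldicts.txt` shows which words repeat. `sum_duplicates < frequency-alpha-alldicts.txt | sort -k2,2nr` lists the merged totals from most to least frequent.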