medical-spell-checker-dictionary / medical-spell-checker-dictionary.github.io

The website for the medical spell checker dictionary.
GNU General Public License v3.0

Windows Installer - Bad choice of words for dictionary #4

Open divinenephron opened 8 years ago

divinenephron commented 8 years ago

The installer for the Windows system dictionary can only install 44502 words. Right now it chooses them by taking the first 44502 words in the source dictionary, which is a little silly: the installed dictionary then only contains the medical words starting with A through H. A better method should be used.

divinenephron commented 8 years ago

The ideal solution is to choose the 44502 most frequently used words. But we'd need to analyse some medical texts to determine which words those are, because the dictionary doesn't currently have frequency data.
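If a corpus of medical text were available, the frequency-based selection could be sketched roughly as below. This is only an illustration, not something the project has: the function name `most_frequent_words`, the `corpus_text` input, and the crude regex tokenizer are all assumptions, and lowercasing the dictionary entries is a simplification that would lose case-sensitive terms.

```python
import re
from collections import Counter


def most_frequent_words(dictionary_words, corpus_text, limit=44502):
    """Return up to `limit` dictionary words, most frequent in the corpus first.

    Hypothetical helper: the corpus and the tokenization are assumptions.
    """
    # Assumption: matching is case-insensitive, so entries are lowercased.
    allowed = {word.lower() for word in dictionary_words}

    # Crude tokenizer: runs of ASCII letters only (splits hyphenated terms).
    tokens = re.findall(r"[a-z]+", corpus_text.lower())

    # Count only tokens that are actually in the dictionary.
    counts = Counter(token for token in tokens if token in allowed)

    # Sort by descending count, then alphabetically so ties are deterministic.
    # Words never seen in the corpus have a count of 0 and sort last.
    ranked = sorted(allowed, key=lambda word: (-counts[word], word))
    return ranked[:limit]
```

Words that never appear in the corpus fall to the end of the ranking, so they are only included if fewer than `limit` words were observed.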

divinenephron commented 8 years ago

A better heuristic would be:

To choose words based on this heuristic, you could sort the dictionary by word length, ignore words of three letters or fewer, and pick the shortest 44502 of the remaining words.
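A minimal sketch of that selection, assuming the source dictionary has already been read into a list of strings. The function name `pick_by_length` and its parameters are made up for illustration, and breaking ties alphabetically is an assumption the comment above doesn't specify.

```python
def pick_by_length(words, limit=44502, min_length=4):
    """Return the `limit` shortest words that have at least `min_length` letters.

    Hypothetical helper illustrating the length heuristic.
    """
    # Ignore words that are three letters long or less.
    candidates = [word for word in words if len(word) >= min_length]

    # Shortest first; ties broken alphabetically (an assumption) so the
    # result doesn't depend on the order of the source dictionary.
    candidates.sort(key=lambda word: (len(word), word))

    return candidates[:limit]
```

For example, `pick_by_length(["zoster", "ear", "anemia", "hepatitis", "node"], limit=3)` drops "ear", then keeps "node", "anemia" and "zoster". Unlike taking the first 44502 entries, the result is spread across the whole alphabet.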