paultela / enterprener

Because you don't know how to spell it either.

Be able to set threshold for each word #9

Open paultela opened 11 years ago

paultela commented 11 years ago

Right now, we set the threshold once and forget about it. This works for misspellings, but it poses a problem in cases where we want to replace only exact words. Being able to set a threshold for each word in the dictionary would solve this problem and also give us more control over what constitutes a misspelling for a given word. I encountered this when I typed "loud" and it was autocorrected to "Internet" when I hit submit.
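A rough sketch of what a per-word threshold could look like, written in Python for brevity. Everything here is an assumption for illustration, not the extension's actual code: the `REPLACEMENTS` table, the `DEFAULT_THRESHOLD` value, the "entrepreneur" entry, and the use of `difflib.SequenceMatcher` as the distance measure. The threshold is treated as the maximum allowed distance, so 0.0 means "exact match only".

```python
from difflib import SequenceMatcher

# Hypothetical global tolerance, used when a word has no threshold of its own.
DEFAULT_THRESHOLD = 0.3

# Hypothetical settings: target word -> (replacement, per-word threshold or None).
# A threshold of 0.0 means only an exact match is replaced.
REPLACEMENTS = {
    "cloud": ("Internet", 0.0),
    "entrepreneur": ("entrepreneur", None),
}


def distance(a, b):
    """Normalized distance in [0, 1]; 0.0 means the strings are identical."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct(word):
    """Return the replacement for the first target within its threshold, else the word."""
    for target, (replacement, threshold) in REPLACEMENTS.items():
        limit = DEFAULT_THRESHOLD if threshold is None else threshold
        if distance(word, target) <= limit:
            return replacement
    return word
```

With these assumed values, "loud" is left alone because "cloud" only matches exactly, while a fuzzy entry such as "entrepreneur" still catches "enterprener".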

btmills commented 11 years ago

Another option would be to hook up to a dictionary API (I'm sure there are only a few hundred of those) and only correct words that aren't already in the dictionary.

Ex: loud, clod (both real words) unchanged; cloud (match string), clud (not a word) -> Internet
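The example above could be sketched like this. The tiny `KNOWN_WORDS` set stands in for whatever dictionary ends up being used, and `THRESHOLD`, `TARGET`, and the `difflib`-based distance are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Stand-in for a real dictionary lookup (hypothetical contents).
KNOWN_WORDS = {"loud", "clod", "cloud"}

# Hypothetical single setting: replace "cloud" and its misspellings.
TARGET = "cloud"
REPLACEMENT = "Internet"
THRESHOLD = 0.3  # assumed maximum normalized distance


def correct(word):
    """Replace exact matches and non-words that are close to TARGET."""
    lower = word.lower()
    if lower == TARGET:
        return REPLACEMENT  # the match string itself
    if lower in KNOWN_WORDS:
        return word  # a real word, so leave it unchanged
    dist = 1.0 - SequenceMatcher(None, lower, TARGET).ratio()
    return REPLACEMENT if dist <= THRESHOLD else word
```

Here "loud" and "clod" pass through untouched, while "cloud" and "clud" both become "Internet".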

paultela commented 11 years ago

A dictionary API sounds like overkill to me; that would be a lot of extra requests. Setting the threshold to 0.0 for certain words would solve the problem without too much extra overhead. I wish we had access to Chrome's dictionary API.

paultela commented 11 years ago

Okay, while working today I realized we really need something like this, but instead of an API, why not just use something like http://wordlist.sourceforge.net/ and check a local copy of the words? That avoids the overhead of a network request and would be much faster.
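A minimal sketch of the local lookup, assuming the word list is stored as plain text with one word per line (the actual file format of any given list may differ, and `load_words` is a hypothetical helper, not existing code):

```python
def load_words(lines):
    """Build a lookup set from an iterable of lines, one word per line.

    Blank lines are skipped and words are lowercased so that lookups
    are case-insensitive.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word:
            words.add(word)
    return words


def is_real_word(word, words):
    """Set membership is an in-memory check, so no network request is needed."""
    return word.lower() in words
```

Because an open file is an iterable of lines, something like `with open("words.txt") as f: words = load_words(f)` would work, where `words.txt` is a placeholder file name.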

btmills commented 11 years ago

Right, we definitely need some sort of dictionary. We have to be able to tell the difference between an actual misspelling (a true positive) and a real word that's similarly spelled (a false positive), without making it so restrictive that it misses words (false negatives). The only way I see that happening is a dictionary.

I had been contemplating some sort of caching because we really only need the words related to the ones in the settings, not an entire dictionary, and we'd only need to request it once. So maybe the local dictionary is overkill? Or maybe it's necessary. I haven't decided yet.
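One way the "only the related words" idea could look, as a sketch: compute once, from whatever full word source is available, the real words that sit within the threshold of a configured word, and keep only that small set. The function name `near_neighbors`, the threshold value, and the `difflib`-based distance are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Normalized distance in [0, 1]; 0.0 means the strings are identical."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_neighbors(targets, all_words, threshold):
    """Return the real words close enough to a target to be mistaken for a typo.

    This is the only part of the dictionary the matcher needs, so it can be
    computed once and cached alongside the settings.
    """
    neighbors = set()
    for word in all_words:
        for target in targets:
            if word != target and distance(word, target) <= threshold:
                neighbors.add(word)
                break  # one close target is enough to keep this word
    return neighbors
```

For a single configured word "cloud" and a threshold of 0.3, the cached set from a small sample list would contain "loud" and "clod" but not unrelated words, which keeps the stored data far smaller than a full dictionary. The set would need to be recomputed whenever the settings change.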