wolfgarbe / SymSpell

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
https://seekstorm.com/blog/1000x-spelling-correction/
MIT License
3.15k stars 298 forks source link

Alternate string metrics #131

Open ParadaCarleton opened 2 years ago

ParadaCarleton commented 2 years ago

I think a great feature would be to let the have an optional argument that changes the string metric. For example, keyboard-based distances can improve performance and accuracy. A standard QWERTY keyboard has layout:

` 1 2 3 4 5 6 7 8 9 0 - = 
  q w e r t y u i o p [ ] \
   a s d f g h j k l ; ' 
    z x c v b n m , . /

A substitution adds a distance of 1 for every step we have to take on a keyboard. Keys on a keyboard are typically spaced along a nearly-hexagonal grid with 6 roughly equidistant neighbors, e.g. "g" has:

 t y
f g h
 v b

All of these keys can be considered neighbors, and substituting g with one of these keys increases the edit distance by 1. For an edit distance of 2, we can consider keys that are neighbors of neighbors, and so on.

This both cuts back on the number of neighbors within a given edit distance, speeding up performance, and should improve accuracy when working with real-world data typed on a keyboard.