I think a great feature would be to let the have an optional argument that changes the string metric. For example, keyboard-based distances can improve performance and accuracy. A standard QWERTY keyboard has layout:
` 1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
A substitution adds a distance of 1 for every step we have to take on a keyboard. Keys on a keyboard are typically spaced along a nearly-hexagonal grid with 6 roughly equidistant neighbors, e.g. "g" has:
t y
f g h
v b
All of these keys can be considered neighbors, and substituting g with one of these keys increases the edit distance by 1. For an edit distance of 2, we can consider keys that are neighbors of neighbors, and so on.
This both cuts back on the number of neighbors within a given edit distance, speeding up performance, and should improve accuracy when working with real-world data typed on a keyboard.
I think a great feature would be to let the have an optional argument that changes the string metric. For example, keyboard-based distances can improve performance and accuracy. A standard QWERTY keyboard has layout:
A substitution adds a distance of 1 for every step we have to take on a keyboard. Keys on a keyboard are typically spaced along a nearly-hexagonal grid with 6 roughly equidistant neighbors, e.g. "g" has:
All of these keys can be considered neighbors, and substituting
g
with one of these keys increases the edit distance by 1. For an edit distance of 2, we can consider keys that are neighbors of neighbors, and so on.This both cuts back on the number of neighbors within a given edit distance, speeding up performance, and should improve accuracy when working with real-world data typed on a keyboard.