Open bittlingmayer opened 5 years ago
Makes sense. PR welcome. Should be an options object with an option called maximumDistance.
👍
Feel free to take inspiration from this code: https://github.com/Yomguithereal/talisman/blob/master/src/metrics/distance/levenshtein.js#L230-L340
This is not the fastest Levenshtein implementation for NodeJS. In fact it is not even the second fastest. The fastest is: https://github.com/ka-weihe/node-levenshtein
If anyone wants to work on this, see https://github.com/sindresorhus/leven/pull/15 for the previous attempt and the feedback there.
This is the most performant Levenshtein implementation for NodeJS that I know of, and I have a thought on how to make it faster for many applications.
When testing many string pairs by similarity (for example, when sorting, or filtering strings above or below a threshold), often we are happy to short-circuit on any pair with a distance greater than max. At scale, given that the least similar strings are also very expensive to compute, there is a big potential savings. (For one, just comparing length is enough to discard many candidates.)
(For pairs with a distance above max, the distance returned can be null, max or max + 1. I don't have a strong opinion on which is best, but probably max is the most useful, as it would require no extra handling for sorting or filtering.)
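To make the proposal concrete, here is a minimal sketch of the short-circuit idea. It is not leven's actual code or API: the function name boundedLevenshtein is hypothetical, the option name maximumDistance is taken from the maintainer's comment, and returning max for pairs beyond the cap is just one of the three choices discussed above. It uses a plain single-row dynamic-programming loop, not the optimized algorithm in the library.

```javascript
// Hypothetical helper for illustration only, not leven's real API.
// Returns the Levenshtein distance, capped at options.maximumDistance.
function boundedLevenshtein(first, second, options = {}) {
  const max = options.maximumDistance ?? Infinity;

  if (first === second) return 0;

  // The distance is at least the difference in length, so this check alone
  // discards many candidates before any matrix work is done.
  if (Math.abs(first.length - second.length) >= max) return max;

  // Single-row dynamic programming: after processing i characters of `first`,
  // row[j] is the distance between first.slice(0, i) and second.slice(0, j).
  const row = Array.from({length: second.length + 1}, (_, j) => j);

  for (let i = 1; i <= first.length; i++) {
    let diagonal = row[0]; // previous row's value at column j - 1
    row[0] = i;
    let rowMinimum = row[0];

    for (let j = 1; j <= second.length; j++) {
      const above = row[j]; // previous row's value at column j
      const cost = first[i - 1] === second[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
      if (row[j] < rowMinimum) rowMinimum = row[j];
    }

    // Row minima never decrease from one row to the next, so once every cell
    // has reached the cap the final distance cannot be below it.
    if (rowMinimum >= max) return max;
  }

  return Math.min(row[second.length], max);
}

// Usage: keep only candidates strictly closer than the cap.
const candidates = ['kitten', 'mitten', 'sitting', 'banana'];
const close = candidates.filter(
  word => boundedLevenshtein('kitten', word, {maximumDistance: 2}) < 2
);
```

Because far-apart pairs come back as exactly max, the result can be fed straight into a sort or a threshold filter without a null check, which is the argument made above for preferring max over null.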