nol13 closed this 7 years ago
Yes, a PR would be welcome, though this would impact performance. So I'm wondering if there's a way to do this that doesn't impact performance too much.
OK, let me experiment a bit more and I'll try to get something ready. Could the deletion and insertion costs be made configurable too, if that has no performance impact? (I'm pretty sure I've seen an option for setting those mentioned elsewhere, though I have no use for them myself.)
There is one easy optimization I can add, however, that got me about a 30% boost: instead of checking useCollator inside the for loop, have two separate loops and run one or the other depending on whether useCollator is true. That was the only way I found to make that check without any performance impact.
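As a rough illustration of that loop split, here is a minimal sketch, not the library's actual source. The function name `distanceSplitLoops` is hypothetical, and the collator is created with the default locale as an assumption. The two-row algorithm is written out twice so that the `useCollator` branch is evaluated once per call instead of once per character pair.

```javascript
// Hypothetical sketch: the useCollator check is hoisted out of the loops.
// Assumption: a base-sensitivity collator on the default locale.
var collator = new Intl.Collator(undefined, { sensitivity: "base" });

function distanceSplitLoops(str1, str2, options) {
  var useCollator = Boolean(options && options.useCollator);
  var len1 = str1.length;
  var len2 = str2.length;
  if (len1 === 0) return len2;
  if (len2 === 0) return len1;

  var prevRow = [];
  var i, j, curCol, nextCol, tmp, same;

  // First row: distance from the empty prefix of str1.
  for (j = 0; j <= len2; j++) prevRow[j] = j;

  if (useCollator) {
    // Loop variant 1: locale-aware comparison.
    for (i = 0; i < len1; i++) {
      nextCol = i + 1;
      for (j = 0; j < len2; j++) {
        curCol = nextCol;
        same = collator.compare(str1.charAt(i), str2.charAt(j)) === 0;
        nextCol = prevRow[j] + (same ? 0 : 1); // substitution
        tmp = curCol + 1;                      // insertion
        if (nextCol > tmp) nextCol = tmp;
        tmp = prevRow[j + 1] + 1;              // deletion
        if (nextCol > tmp) nextCol = tmp;
        prevRow[j] = curCol;
      }
      prevRow[j] = nextCol;
    }
  } else {
    // Loop variant 2: plain char-code comparison, no collator check at all.
    for (i = 0; i < len1; i++) {
      nextCol = i + 1;
      for (j = 0; j < len2; j++) {
        curCol = nextCol;
        same = str1.charCodeAt(i) === str2.charCodeAt(j);
        nextCol = prevRow[j] + (same ? 0 : 1); // substitution
        tmp = curCol + 1;                      // insertion
        if (nextCol > tmp) nextCol = tmp;
        tmp = prevRow[j + 1] + 1;              // deletion
        if (nextCol > tmp) nextCol = tmp;
        prevRow[j] = curCol;
      }
      prevRow[j] = nextCol;
    }
  }
  return nextCol;
}
```

The duplication is deliberate: it trades a little code size for keeping the branch out of the hot inner loop. Whether that yields a similar speedup elsewhere would need to be measured.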
In my current benchmarks it actually seems to be very slightly (insignificantly) faster when I set the subcost; I'm not sure I fully understand why. Possibly my benchmarks are unscientific.
Basically, what I added is:
var subcost = 1;
if (options.subcost && typeof options.subcost === "number") subcost = options.subcost;
// and on line 57
nextCol = prevRow[j] + (strCmp ? 0 : subcost);
Currently I'm using a mashup of fast-levenshtein and leven, basically bolting on the collator code from fast-levenshtein. Once the collator check was taken out of the for loop, the comparison was at least a lot closer, though.
Hi, I'm using this for my fuzzywuzzy port (https://github.com/nol13/fuzzball.js), but I had to make a small change for it to match the behavior of python-Levenshtein: namely, passing in an option that changes the substitution cost to 2 for ratio calculations.
Would this be something I should open a pull request for, so that I could then just use this as a dependency?
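To make the request concrete, here is a minimal sketch of how a substitution cost of 2 could feed a ratio calculation. The names `editDistance` and `ratio` are hypothetical and not part of any library's API. The formula `(lensum - distance) / lensum` and the value returned for two empty strings are assumptions about how python-Levenshtein's ratio behaves, not something confirmed in this thread.

```javascript
// Hypothetical sketch: two-row edit distance with a configurable substitution cost.
function editDistance(str1, str2, subcost) {
  if (typeof subcost !== "number") subcost = 1; // default: classic Levenshtein
  var len1 = str1.length;
  var len2 = str2.length;
  if (len1 === 0) return len2;
  if (len2 === 0) return len1;

  var prevRow = [];
  var i, j, curCol, nextCol, tmp, same;
  for (j = 0; j <= len2; j++) prevRow[j] = j;

  for (i = 0; i < len1; i++) {
    nextCol = i + 1;
    for (j = 0; j < len2; j++) {
      curCol = nextCol;
      same = str1.charCodeAt(i) === str2.charCodeAt(j);
      nextCol = prevRow[j] + (same ? 0 : subcost); // substitution
      tmp = curCol + 1;                            // insertion
      if (nextCol > tmp) nextCol = tmp;
      tmp = prevRow[j + 1] + 1;                    // deletion
      if (nextCol > tmp) nextCol = tmp;
      prevRow[j] = curCol;
    }
    prevRow[j] = nextCol;
  }
  return nextCol;
}

// Assumed ratio formula: similarity in [0, 1] based on the combined length.
function ratio(str1, str2) {
  var lensum = str1.length + str2.length;
  if (lensum === 0) return 1; // assumption: two empty strings count as identical
  return (lensum - editDistance(str1, str2, 2)) / lensum;
}

console.log(editDistance("kitten", "sitting"));    // 3
console.log(editDistance("kitten", "sitting", 2)); // 5
console.log(ratio("kitten", "sitting"));           // 8 / 13
```

With a substitution cost of 2, a substitution costs the same as a deletion plus an insertion, so the distance never exceeds the combined length and the ratio stays between 0 and 1.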