hiddentao / fast-levenshtein

Efficient Javascript implementation of Levenshtein algorithm with locale-specific collator support.
MIT License
592 stars 56 forks source link

split-up loop to only run collator check once #22

Closed nol13 closed 7 years ago

nol13 commented 7 years ago

This is the optimization I mentioned in my comment. It checks useCollator outside the nested for loops and branches based on result so that it can use the correct comparison inside without checking again. I also made a branch that has these changes + the edit to allow passing in the different costs, which seems to take ~150 ops/s, so if you think it's worthwhile can pr that branch too..

Below are my benchmarks for each.

Maybe a way to avoid the performance hit without duplicating code? Nothing else I tried seemed to work though.

me@mypc:~/fast-levenshtein/fast-levenshtein$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
me@mypc:~/fast-levenshtein/fast-levenshtein$ grunt benchmarkConfig
Running "benchmarkConfig:speed" (benchmarkConfig) task

Running suite Implementation comparison [benchmark/speed.js]...
>> levenshtein-edit-distance x 2,254 ops/sec ±0.11% (102 runs sampled)
>> levenshtein-component x 349 ops/sec ±0.20% (96 runs sampled)
>> levenshtein-deltas x 485 ops/sec ±0.09% (97 runs sampled)
>> natural x 382 ops/sec ±0.09% (95 runs sampled)
>> levenshtein x 218 ops/sec ±0.30% (88 runs sampled)
>> fast-levenshtein x 1,598 ops/sec ±0.12% (102 runs sampled)
Benchmark done.
Fastest test is levenshtein-edit-distance at 1.41x faster than fast-levenshtein

Done, without errors.
me@mypc:~/fast-levenshtein/fast-levenshtein$ git checkout collator-check 
Switched to branch 'collator-check'
me@mypc:~/fast-levenshtein/fast-levenshtein$ grunt benchmarkConfig
Running "benchmarkConfig:speed" (benchmarkConfig) task

Running suite Implementation comparison [benchmark/speed.js]...
>> levenshtein-edit-distance x 2,263 ops/sec ±0.13% (100 runs sampled)
>> levenshtein-component x 349 ops/sec ±0.17% (96 runs sampled)
>> levenshtein-deltas x 497 ops/sec ±0.34% (96 runs sampled)
>> natural x 380 ops/sec ±0.29% (94 runs sampled)
>> levenshtein x 221 ops/sec ±0.46% (89 runs sampled)
>> fast-levenshtein x 1,981 ops/sec ±0.15% (100 runs sampled)
Benchmark done.
Fastest test is levenshtein-edit-distance at 1.14x faster than fast-levenshtein

Done, without errors.
me@mypc:~/fast-levenshtein/fast-levenshtein$ git checkout cost-options 
Switched to branch 'cost-options'
me@mypc:~/fast-levenshtein/fast-levenshtein$ grunt benchmarkConfig
Running "benchmarkConfig:speed" (benchmarkConfig) task

Running suite Implementation comparison [benchmark/speed.js]...
>> levenshtein-edit-distance x 2,262 ops/sec ±0.12% (100 runs sampled)
>> levenshtein-component x 349 ops/sec ±0.17% (96 runs sampled)
>> levenshtein-deltas x 502 ops/sec ±0.09% (97 runs sampled)
>> natural x 379 ops/sec ±0.08% (94 runs sampled)
>> levenshtein x 221 ops/sec ±0.31% (89 runs sampled)
>> fast-levenshtein x 1,848 ops/sec ±0.09% (101 runs sampled)
Benchmark done.
Fastest test is levenshtein-edit-distance at 1.22x faster than fast-levenshtein

Done, without errors.
hiddentao commented 7 years ago

Thanks!