Closed rolftimmermans closed 2 years ago
Thanks a lot @rolftimmermans , great to keep the design document in sync.
Regarding the optimization, well done 👍 I am curious, did you come up with it? If not, what was the source? Asking because the topic is very interesting for me :)
Ah, no, I definitely did not invent this trick... 🙂 It's listed in the description of the algorithm on Wikipedia:
If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix.
I only implemented the computation savings and not the space savings though, because the matrix is not that big and the computational cost of the additional bookkeeping seems not worthwhile (but I did not try).
Also, numbers:
Fuzzy search:
=============
* SearchableMap#fuzzyGet("virtute", 1) x 43,451 ops/sec ±0.26% (98 runs sampled)
* SearchableMap#fuzzyGet("virtu", 2) x 5,828 ops/sec ±0.21% (98 runs sampled)
* SearchableMap#fuzzyGet("virtu", 3) x 1,605 ops/sec ±0.64% (98 runs sampled)
* SearchableMap#fuzzyGet("virtute", 4) x 670 ops/sec ±0.35% (95 runs sampled)
Fuzzy search:
=============
* SearchableMap#fuzzyGet("virtute", 1) x 66,266 ops/sec ±0.44% (97 runs sampled)
* SearchableMap#fuzzyGet("virtu", 2) x 7,144 ops/sec ±0.18% (99 runs sampled)
* SearchableMap#fuzzyGet("virtu", 3) x 1,900 ops/sec ±0.19% (99 runs sampled)
* SearchableMap#fuzzyGet("virtute", 4) x 774 ops/sec ±0.30% (97 runs sampled)
Nice! I read the Wikipedia article multiple times, and somehow managed to miss this :)
It's definitely ok to have the whole matrix in memory, space is not the main concern in this case, as the matrix is normally small.
I updated the design doc to incorporate the changes from #125. While researching the algorithm I also encountered an optimisation (only calculate a diagonal band of size
2 * edit distance + 1
) that yields another modest performance improvement.