Update design doc and reduce fuzzy search computations a bit

lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node

https://lucaong.github.io/minisearch/

MIT License

4.81k stars 137 forks source link

Update design doc and reduce fuzzy search computations a bit #133

Closed rolftimmermans closed 2 years ago

rolftimmermans commented 2 years ago

I updated the design doc to incorporate the changes from #125. While researching the algorithm I also encountered an optimisation (only calculate a diagonal band of size 2 * edit distance + 1) that yields another modest performance improvement.

lucaong commented 2 years ago

Thanks a lot @rolftimmermans , great to keep the design document in sync.

Regarding the optimization, well done 👍 I am curious, did you come up with it? If not, what was the source? Asking because the topic is very interesting for me :)

rolftimmermans commented 2 years ago

Ah, no, I definitely did not invent this trick... 🙂 It's listed in the description of the algorithm on Wikipedia:

If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix.

I only implemented the computation savings and not the space savings though, because the matrix is not that big and the computational cost of the additional bookkeeping seems not worthwhile (but I did not try).

rolftimmermans commented 2 years ago

Also, numbers:

Before

Fuzzy search:
=============
  * SearchableMap#fuzzyGet("virtute", 1) x 43,451 ops/sec ±0.26% (98 runs sampled)
  * SearchableMap#fuzzyGet("virtu", 2) x 5,828 ops/sec ±0.21% (98 runs sampled)
  * SearchableMap#fuzzyGet("virtu", 3) x 1,605 ops/sec ±0.64% (98 runs sampled)
  * SearchableMap#fuzzyGet("virtute", 4) x 670 ops/sec ±0.35% (95 runs sampled)

After

Fuzzy search:
=============
  * SearchableMap#fuzzyGet("virtute", 1) x 66,266 ops/sec ±0.44% (97 runs sampled)
  * SearchableMap#fuzzyGet("virtu", 2) x 7,144 ops/sec ±0.18% (99 runs sampled)
  * SearchableMap#fuzzyGet("virtu", 3) x 1,900 ops/sec ±0.19% (99 runs sampled)
  * SearchableMap#fuzzyGet("virtute", 4) x 774 ops/sec ±0.30% (97 runs sampled)

lucaong commented 2 years ago

Nice! I read the Wikipedia article multiple times, and somehow managed to miss this :)

It's definitely ok to have the whole matrix in memory, space is not the main concern in this case, as the matrix is normally small.