m31coding / fuzzy-search

A fast, accurate and multilingual fuzzy search library for the frontend.
MIT License

Exact matches are ranked too low #6

Open · leeoniya opened this issue 8 months ago

leeoniya commented 8 months ago

hey @m31coding!

great to see someone else working on a better fuzzy search :) i found this project on HN.

i've added it to the uFuzzy demo/bench in https://github.com/leeoniya/uFuzzy/commit/5ebe1ba396ef9318680ce481b8201f2088630220.

indexing the 4MB haystack takes ~1400ms. after that, when i give it a quick try with the term "twilight", the results seem to be missing or mis-ranking many exact matches. or perhaps i'm just holding it wrong :sweat_smile:?

cheers! :beers:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzy-search&search=twilight

m31coding commented 8 months ago

Hi @leeoniya, thank you very much for your input and for adding this project to the demo / benchmark. Your work is really nice! I have added a comment to your commit with minor suggestions.

It looks like we have chosen very different implementations. The top 10 results are very similar, except that the term twilit shows up at rank 8 on the right-hand side. The other matches shown on the left-hand side are indeed ranked very low in my implementation. For example, the query twilight matches Twilight Phenomena: The Lodgers of House 13 Collector's Edition with a quality of only 0.13.

The reason is that the quality increases linearly with the number of common n-grams, and the word twilight is only a small part of the full term, so it accounts for only a small share of that term's n-grams. My library works best for terms of 1-3 words, e.g. names of places or persons. For terms with more words, your library clearly does better at ranking exact matches!
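To illustrate the dilution effect described above, here is a minimal TypeScript sketch of an n-gram quality score. It is not the fuzzy-search library's actual code: the use of character trigrams, the hypothetical helpers `trigrams`, `quality` and `containment`, and the normalization by the larger trigram count are all assumptions for illustration, so its scores will not match the 0.13 quoted above.

```typescript
// Hypothetical sketch, NOT the fuzzy-search library's real scoring.
// Assumption: quality = (shared distinct trigrams) / (trigram count of the larger string).

// Collect the distinct lowercase character trigrams of a string.
function trigrams(text: string): string[] {
  const s = text.toLowerCase();
  const grams: string[] = [];
  for (let i = 0; i + 3 <= s.length; i++) {
    const gram = s.slice(i, i + 3);
    if (grams.indexOf(gram) === -1) grams.push(gram);
  }
  return grams;
}

// Count how many distinct query trigrams also occur in the term.
function sharedCount(query: string[], term: string[]): number {
  return query.filter((gram) => term.indexOf(gram) !== -1).length;
}

// Symmetric quality: grows linearly with the shared trigrams, but is divided
// by the larger trigram set, so a long term dilutes an exact match.
function quality(query: string, term: string): number {
  const q = trigrams(query);
  const t = trigrams(term);
  return sharedCount(q, t) / Math.max(q.length, t.length);
}

// Hypothetical query-normalized alternative: 1 whenever every query trigram occurs in the term.
function containment(query: string, term: string): number {
  const q = trigrams(query);
  return sharedCount(q, trigrams(term)) / q.length;
}

const title = "Twilight Phenomena: The Lodgers of House 13 Collector's Edition";
console.log(quality("twilight", "twilit")); // 0.5 (3 of 6 trigrams shared)
console.log(quality("twilight", title)); // well below 0.5: the title has many more trigrams
console.log(containment("twilight", title)); // 1
```

Under these assumptions, the near-miss twilit scores 0.5 while the long title that contains twilight verbatim scores much lower, mirroring the ranking discussed in this thread. Normalizing by the query's trigram count instead (`containment`) would rank every term containing the whole query at the top, at the cost of no longer preferring shorter, tighter matches.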

I will think about how to improve the library in this respect. Thank you again for your input!