leeoniya / uFuzzy

A tiny, efficient fuzzy search that doesn't suck
MIT License

How to tweak handling of spaces? #5

Closed: tomas closed this issue 2 years ago

tomas commented 2 years ago

Hi, your library looks awesome! Kudos to you.

I was trying the demo, but I couldn't find a way to tell uFuzzy to include results where a space falls between the characters of the search query. I want to get results like "Super Markup Man" or "Super Mario" when querying for "superma" (that is, the example query with the space between "super" and "ma" removed).

image

Is this possible?

leeoniya commented 2 years ago

thanks!

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=superma&intraIns=inf&intraChars=[a-z\d%20]

depending on your corpus, this may or may not work well. effectively, it treats the whole query as one big term with a bunch of junk in between its characters. you'll also lose the benefit of boosting multiple terms that fall at expected boundaries (whitespace, punctuation, etc.). you can do a bit better by swapping out the sorting function for one that prioritizes other aspects of the matches, which may give better ordering than you get in the demo.

there is currently no counter in the info object for how many matched chars landed on word boundaries (only full terms). i'll see if this can be added without too much overhead, which should help with what a custom sorting function can consider.

but you also get stuff like this, which i don't think is avoidable with the settings necessary for the behavior you want.

image

for cases where you just want to mash keys without much thought, uFuzzy is probably not going to satisfy. fuzzysort, QuickScore and others do better in this case.
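As a rough illustration of the word-boundary counter discussed above, here is a minimal sketch of what a custom sort could compute on its own. Everything in it is an assumption made for illustration: `countBoundaryHits` is a hypothetical helper that is not part of uFuzzy, the `ranges` arrays are written by hand rather than produced by the library, and the flat `[start, end, ...]` layout with an exclusive `end` is assumed.

```javascript
// Hypothetical helper, not part of uFuzzy: counts matched characters that sit
// at the start of a word. `ranges` is assumed to be a flat array of
// [start, end, start, end, ...] offsets into `text`, with `end` exclusive.
function countBoundaryHits(text, ranges) {
  let hits = 0;
  for (let r = 0; r < ranges.length; r += 2) {
    for (let i = ranges[r]; i < ranges[r + 1]; i++) {
      // The start of the string counts as a boundary.
      const prev = i === 0 ? " " : text[i - 1];
      if (!/[a-z\d]/i.test(prev)) hits++;
    }
  }
  return hits;
}

// Hand-written match positions for the query "superma" (illustration only).
const candidates = [
  // re[su]bmit [p]ap[er] for[ma]t
  { text: "Resubmit paper format", ranges: [2, 4, 9, 10, 12, 14, 18, 20] },
  // [Super] [Ma]rio
  { text: "Super Mario", ranges: [0, 5, 6, 8] },
];

// Rank candidates with more word-boundary hits first.
const ranked = [...candidates].sort(
  (a, b) => countBoundaryHits(b.text, b.ranges) - countBoundaryHits(a.text, a.ranges)
);

console.log(ranked.map((c) => c.text));
// → [ 'Super Mario', 'Resubmit paper format' ]
```

In this sketch "Super Mario" scores 2 (the "S" and the "M") while the scattered match scores 1, so a sort that considers this count would place the intended result first.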

leeoniya commented 2 years ago

if you don't allow intraIns to be arbitrarily large, you get much better results:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=superma&intraIns=1&intraChars=%5Ba-z%5Cd%20%5D

image
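The difference between the two demo links (intraIns=inf versus intraIns=1, both with a space added to intraChars) can be approximated with a plain regular expression. This is a simplified model under stated assumptions, not uFuzzy's actual implementation: `buildPattern` is a hypothetical helper, and the sample titles are made up for the example.

```javascript
// Simplified model of per-character insertions. NOT uFuzzy's actual code;
// buildPattern is a hypothetical helper introduced for illustration.
function buildPattern(needle, { intraIns, intraChars }) {
  // Filler allowed between consecutive needle characters (lazy, so the
  // shortest gap is tried first).
  let filler = "";
  if (intraIns === Infinity) filler = intraChars + "*?";
  else if (intraIns > 0) filler = intraChars + "{0," + intraIns + "}?";

  // Escape regex metacharacters in each needle character.
  const chars = [...needle].map((c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(chars.join(filler), "i");
}

// Letters, digits, and a space, as in the demo links.
const intraChars = "[a-z\\d ]";

const unbounded = buildPattern("superma", { intraIns: Infinity, intraChars });
const bounded = buildPattern("superma", { intraIns: 1, intraChars });

// Made-up sample titles.
const titles = ["Super Mario", "Super Markup Man", "Resubmit paper format"];

console.log(titles.filter((t) => unbounded.test(t)));
// → [ 'Super Mario', 'Super Markup Man', 'Resubmit paper format' ]
console.log(titles.filter((t) => bounded.test(t)));
// → [ 'Super Mario', 'Super Markup Man' ]
```

With unbounded insertions, any title containing the query's letters in order gets through, which is the "junk in between" effect. Capping the gap at one character still bridges the single space in "Super Mario" but rejects the scattered match. In the library itself, these two values correspond to the intraIns and intraChars options used in the links above.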

tomas commented 2 years ago

I see. intraIns: 1 plus adding a space to the regex did the trick. Thanks for the quick response!

arctica commented 1 year ago

@leeoniya It seems that with intraMode=1 the tolerance for spaces does not work. Searching for supermariobros finds nothing when SingleError mode is selected.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=supermariobros&intraIns=1&intraMode=1&intraChars=[a-z\d%20] vs https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=supermariobros&intraIns=1&intraMode=0&intraChars=[a-z\d%20]

yan42685 commented 8 months ago

Same here. And thanks for the info, which helped me find the cause of the unexpected results :)