Closed arctica closed 1 year ago
hmm, [0, Infinity]
should do it :thinking:
this appears to be working?
https://jsfiddle.net/j7xkba1v/2/
let haystack = ["Fireball"];
let needle = "Direball";
let uf = new uFuzzy({
intraMode: 1,
intraIns: 1,
intraSub: 1,
intraTrn: 1,
intraDel: 1,
intraSlice: [0, Infinity],
});
let out = uf.search(haystack, needle);
console.log(out);
maybe you forgot to set the additional tolerances? (intraMode: 1
does not auto-enable them).
this is kind of a footgun tho, maybe uFuzzy should auto-set them all when intraMode: 1
and require opting out rather than opting in.
that being said, you probably dont want unconditional intraSlice: [0, Infinity],
since typing a single letter will bring back the entire haystack. you most likely want these tolerance rules to be dynamic based on term length, which is what intraRules
is for:
the default intraRules
already have some small-word guards that prevent a bad experience, so you may want to copy that fn and provide your intraSlice
logic in addition to this:
Thanks for the fast reply. I've noticed that the reason why my test with intraSlice
: [0, Infinity]
didn't work is because the string in the hackstack contained more than the needle word (think Fireball Test
. It seems like it doesn't work as a partial/prefix match anymore.
So searching for Dire
will match when the haystack is Fire
but wont match if the haystack is Fireball
. In the Fiddle you provided you can see this by changing the needle to Dire
instead of Direball
. If we instead search for Fire
it will match Fireball
just fine. Weirdly though the needles Direb
or Direba
and so on do return a match.
Thank you for pointing out the intraRules option though, that will be handy.
didn't work is because the string in the hackstack contained more than the needle word (think
Fireball Test
. It seems like it doesn't work as a partial/prefix match anymore.
if you want only prefix matching you should set interLft: 2
.
replacing Fireball
with Fireball Test
in the haystack still works just fine for "Direball" needle. but as i explained above there is additional intraRules
logic to make the matching more strict for needle terms <= 4 chars, which is probably why Dire
does not work. maybe that default logic needs to be tweaked, as it currently assumes intraSlice
is the default [1, Infinity]
. (PRs are welcome).
you can replace intraRules
with your own function that has no special treatment of small terms and follows the same tolerance rules exactly for all terms. i'm not sure you'll be happy with the result of this, though...i wasn't. do you really want:
Oh sorry. I didn't understand the infraRules
part correctly before. I see now that it has special logic for terms shorter than 5 characters which is why Dire
was the cut-off point while Direb
still worked.
Having run tests on my production haystack of roughly 25k entries I am definitely seeing considerably better results by overriding infraRules
for short terms which would otherwise result in no matches at all. One example I saw was cork
not matching a name starting with Cook County
without the override. It does have side effects like changing the results that would have matches with and without this setting but in those cases it was sometimes hard to tell which one should be preferred.
I should point out though that in my application it is much worse to show no results than to show something that is not exactly a good match. When the haystack is big enough, it seems even short search terms will hit enough good results. On the flip side if results are omitted due to a single character error then that has a much higher negative impact on the user. We do start the search only if there are at least 2 characters. We then take the top N results and present them to the user.
So with the following options I'm seeing pretty decent results.
const opts = {
intraMode: 1,
intraChars: `[a-z\d' ]`,
intraRules: () => {
return {
intraIns: 1,
intraSub: 1,
intraTrn: 1,
intraDel: 1,
intraSlice: [0, Infinity]
};
}
}
For longer search terms allowing more fuzzyness (e.g. 2 errors) could improve the results even further. But I think uFuzzy does not support that.
One example I saw was
cork
not matching a name starting withCook County
without the override.
i've never misspelled cook
as cork
personally since o
and r
arent very close on a QWERTY layout. but i also used to live in Cook County, so i might be biased :rofl:
For longer search terms allowing more fuzzyness (e.g. 2 errors) could improve the results even further. But I think uFuzzy does not support that.
yeah more than 1 error is not really feasible since uFuzzy effectively constructs all possible regexp permutations of a single error for each term. those regexps can get pretty large already. any additional fuzziness would be too absurd for the current design 😅 . im happy with the tradeoff and outcome. people who need something different have quite a few other options to choose from.
Makes sense. It's pretty amazing and surprising that the regexp approach works as good as it does :)
The cork
example might have not been the best but you can take cooj
if you will. That's right next on the keyboard and just a single error (not even at the start) away but results in no matches. Cork
could have been a typo introduced by a spelling auto correct I guess. Sometimes they are hurting rather than helping :)
This plus further testing where I saw results getting sometimes a bit too far from reasonable for very short terms have lead me to an approach that so far looks like a pretty good tradeoff:
I first do a search without the intraRules
override and only when there are fewer than X (e.g. 10) results do I do another round but this time with the intraRules
override. This has the benefit of returning better matches for short terms in the usual case and has noticably better performance (up to 10x faster) while still not missing matches that need a bit more fuzziness.
but i also used to live in Cook County, so i might be biased rofl
haha, what a funny coincidence :)
It seems like errors in the first letter of a word are not handled even when SingleError mode is used.
For example
Direball
does not return results containingFireball
.Test: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&intraMode=1&intraIns=1&intraChars=[a-z\d%20]&outOfOrder&search=Direball
I found the
intraSlice
option and tried with[0, Infinite]
but that didn't do the trick.