Closed MaxGiting closed 1 year ago
Is split() exposed on purpose and will it be supported moving forward as it is not documented. Or could that API's behaviour change?
yes, .split()
is part of the official API:
it actually predates the .search()
API. previously, everything .search()
does was expected to be done externally, like https://github.com/leeoniya/uFuzzy#example, plus more code for outOfOrder
via split() + permute(), etc.
Also why is an error
Empty needle!
thrown?
cause i'm not sure what to do instead :(
there are basically two opposite UIs that can be used for fuzzy search:
what should .search()
return when the needle is empty after processing, everything or nothing? the answer is not clear to me, either can be right or wrong, depending on the UI you hook up.
so currently you can use .split()
to pre-check the needle and skip searching, or you can wrap the .search()
call in a try {} catch {}
. they'll basically have the same effect.
Can it / should it fail gracefully?
suggestions are welcome, i don't like the throwing behavior either.
Could an optional parameter be added to search, that allows you to choose the behavior?
Something like whenEmpty: "throw" | "all" | "none" = "throw"
.
Just an idea.
yeah, maybe.
if the option is explicit, then there's no need for throw
, and we can just make it a boolean, maybe part of the constructor opts since i dont see this needing to change at runtime. now we'll have to bikeshed the default setting :rofl:
Yeah, a constructor argument would make more sense.
I left throwing as an option so that it could be the default, but I guess no one ever really needs that, as they will almost always want one of the other two behaviors. If a breaking change could be tolerated, it could be a required argument?
i guess the other concern i have is performance. why would i ever bother generating and returning an array of all idxs when the user can just iterate their whole haystack directly? the default would certainly be to return an empty idxs
when the needle is empty just due to this sadness.
returning null
for idxs
is another option. but the ergonomics are maybe questionable of null
== search didn't happen vs []
== search returned no results. but maybe this is the best option without any additional settings.
Personally, just having an empty array would probably be the best.
People who want no results on an empty needle could use it as is, and people who want all results could check the needle themselves using split()
. This allows the internal code to just have to handle the actual searching.
An alternative of an optional constructor argument to instead return everything with an empty needle would make it more ergonomic, at the cost of a (probably slight) increase in internal code complexity and a (probably slight) decrease in performance. The results could just be used as-is by both "parties".
Using null
would mean that type checks would also be required (I have suffered a lot from interfaces like this), but in terms of semantics it does allow "empty needle" to be differentiated from "no results", without needing extra needle-checking code. I'm not sure when/if the benefits would outweigh the costs, though.
the more i think about it the more i'm leaning towards using idxs == null
as an indicator of an aborted search.
.search()
already has the property of returning either [idxs, null, null]
or [idxs, info, order]
, depending on whether the length of found idxs
was below or above infoThresh
, so you already need to test for this. having idxs
be null
would be very consistent with the existing behavior.
Hmm yes. I forgot about that; my mind is not at full capacity at the mo. With that context, null
does definitely seem to make the most sense.
And then the caller could simply choose what to do in that case (show no results, show everything, throw an error, whatever); as the results already have to be checked for differences because of infoThresh
, adding a check for a third case wouldn't really complicate things.
I'd be happy with [null, null, null]
for an aborted search. 👍
pushed those changes. would be cool if someone can test it out before i do a release. you can set your package.json
dependency to a github commit: "@leeoniya/ufuzzy": "leeoniya/uFuzzy#672885d96d8f67744a24c360c9df750221b07e67"
I love it! Returning null seems very intuitive to me and means a lot of boilerplate code can be removed from the original example I provided.
I have tested this and it works just as expected and passes all my tests.
One final question. Am I right in thinking this line from my example above
if (idxs.length > 0) {
can just become
if (idxs !== null) {
well, they're different, right?
if (idxs == null) {
// aborted, not searched
} else if (idxs.length > 0) {
// searched, found
} else {
// searched, not found
}
it's up to you if you want to combine the length=0 and idxs=null cases. it depends on your UI.
Works for me :+1:
I have replaced this:
// If the needle is empty, return all indexes.
if (uf.value.split(needle.value).length === 0) {
return {
indexes: Array.from({ length: haystack.value.length }, (_, i) => i),
};
}
with this:
// If the needle is empty, return all indexes.
if (indexes === null) {
return {
indexes: Array.from({ length: haystack.value.length }, (_, i) => i),
};
}
So not too much change there, other than simplifying the conditional.
However, it is much "cleaner", as it fits right in with the check for info === null
.
And, of course, I now get a compile-time error from TypeScript if I forget the check. Great bonus!
I have the below code running which throws an
Empty needle!
error when a user types punctuation or other "special" characters.Originally I saw the
Empty needle!
error thrown when a user typed whitespace so I added the if statement at the start. I could modify that if statement to fix all situations, by calling the internalsplit()
function. As shown below.becomes
Which is how internally it checks for an empty needle https://github.com/leeoniya/uFuzzy/blob/130b0380c19bb02613331f38f9f4a6b0673dbb50/src/uFuzzy.js#L185
My question is. Is split() exposed on purpose and will it be supported moving forward as it is not documented. Or could that API's behaviour change? Also why is an error
Empty needle!
thrown? Can it / should it fail gracefully?