leeoniya / uFuzzy

A tiny, efficient fuzzy search that doesn't suck
MIT License
2.65k stars 48 forks source link

Empty needle error question #29

Closed MaxGiting closed 1 year ago

MaxGiting commented 1 year ago

I have the below code running which throws an Empty needle! error when a user types punctuation or other "special" characters.

let results = []
let noResults =  'No matching results found'

if (needle === '') {
    results[0] = noResults;
} else {
    [idxs, info, order] = uf.search(haystack, needle, outOfOrder, infoThresh)

    if (idxs.length > 0) {
        for (let i = 0; i < order.length; i++) {
            // Push result to results array
        }
    } else {
        results[0] = noResults;
    }
}

Originally I saw the Empty needle! error thrown when a user typed whitespace so I added the if statement at the start. I could modify that if statement to fix all situations, by calling the internal split() function. As shown below.

if (needle === '')

becomes

if (uf.split(needle).length === 0)

Which is how internally it checks for an empty needle https://github.com/leeoniya/uFuzzy/blob/130b0380c19bb02613331f38f9f4a6b0673dbb50/src/uFuzzy.js#L185

My question is. Is split() exposed on purpose and will it be supported moving forward as it is not documented. Or could that API's behaviour change? Also why is an error Empty needle! thrown? Can it / should it fail gracefully?

leeoniya commented 1 year ago

Is split() exposed on purpose and will it be supported moving forward as it is not documented. Or could that API's behaviour change?

yes, .split() is part of the official API:

https://github.com/leeoniya/uFuzzy/blob/130b0380c19bb02613331f38f9f4a6b0673dbb50/dist/uFuzzy.d.ts#L34-L35

it actually predates the .search() API. previously, everything .search() does was expected to be done externally, like https://github.com/leeoniya/uFuzzy#example, plus more code for outOfOrder via split() + permute(), etc.

Also why is an error Empty needle! thrown?

cause i'm not sure what to do instead :(

there are basically two opposite UIs that can be used for fuzzy search:

  1. start by showing everything, filter as you type
  2. start by showing nothing, show found matches as you type, or after pressing Enter

what should .search() return when the needle is empty after processing, everything or nothing? the answer is not clear to me, either can be right or wrong, depending on the UI you hook up.

so currently you can use .split() to pre-check the needle and skip searching, or you can wrap the .search() call in a try {} catch {}. they'll basically have the same effect.

Can it / should it fail gracefully?

suggestions are welcome, i don't like the throwing behavior either.

darrylnoakes commented 1 year ago

Could an optional parameter be added to search, that allows you to choose the behavior? Something like whenEmpty: "throw" | "all" | "none" = "throw". Just an idea.

leeoniya commented 1 year ago

yeah, maybe.

if the option is explicit, then there's no need for throw, and we can just make it a boolean, maybe part of the constructor opts since i dont see this needing to change at runtime. now we'll have to bikeshed the default setting :rofl:

darrylnoakes commented 1 year ago

Yeah, a constructor argument would make more sense.

I left throwing as an option so that it could be the default, but I guess no one ever really needs that, as they will almost always want one of the other two behaviors. If a breaking change could be tolerated, it could be a required argument?

leeoniya commented 1 year ago

i guess the other concern i have is performance. why would i ever bother generating and returning an array of all idxs when the user can just iterate their whole haystack directly? the default would certainly be to return an empty idxs when the needle is empty just due to this sadness.

returning null for idxs is another option. but the ergonomics are maybe questionable of null == search didn't happen vs [] == search returned no results. but maybe this is the best option without any additional settings.

darrylnoakes commented 1 year ago

Personally, just having an empty array would probably be the best.

People who want no results on an empty needle could use it as is, and people who want all results could check the needle themselves using split(). This allows the internal code to just have to handle the actual searching.

An alternative of an optional constructor argument to instead return everything with an empty needle would make it more ergonomic, at the cost of a (probably slight) increase in internal code complexity and a (probably slight) decrease in performance. The results could just be used as-is by both "parties".

Using null would mean that type checks would also be required (I have suffered a lot from interfaces like this), but in terms of semantics it does allow "empty needle" to be differentiated from "no results", without needing extra needle-checking code. I'm not sure when/if the benefits would outweigh the costs, though.

leeoniya commented 1 year ago

the more i think about it the more i'm leaning towards using idxs == null as an indicator of an aborted search.

.search() already has the property of returning either [idxs, null, null] or [idxs, info, order], depending on whether the length of found idxs was below or above infoThresh, so you already need to test for this. having idxs be null would be very consistent with the existing behavior.

darrylnoakes commented 1 year ago

Hmm yes. I forgot about that; my mind is not at full capacity at the mo. With that context, null does definitely seem to make the most sense. And then the caller could simply choose what to do in that case (show no results, show everything, throw an error, whatever); as the results already have to be checked for differences because of infoThresh, adding a check for a third case wouldn't really complicate things.

I'd be happy with [null, null, null] for an aborted search. 👍

leeoniya commented 1 year ago

pushed those changes. would be cool if someone can test it out before i do a release. you can set your package.json dependency to a github commit: "@leeoniya/ufuzzy": "leeoniya/uFuzzy#672885d96d8f67744a24c360c9df750221b07e67"

MaxGiting commented 1 year ago

I love it! Returning null seems very intuitive to me and means a lot of boilerplate code can be removed from the original example I provided.

I have tested this and it works just as expected and passes all my tests.

One final question. Am I right in thinking this line from my example above

if (idxs.length > 0) {

can just become

if (idxs !== null) {
leeoniya commented 1 year ago

well, they're different, right?

if (idxs == null) {
  // aborted, not searched
} else if (idxs.length > 0) {
  // searched, found
} else {
  // searched, not found
}

it's up to you if you want to combine the length=0 and idxs=null cases. it depends on your UI.

darrylnoakes commented 1 year ago

Works for me :+1:

I have replaced this:

// If the needle is empty, return all indexes.
if (uf.value.split(needle.value).length === 0) {
  return {
    indexes: Array.from({ length: haystack.value.length }, (_, i) => i),
  };
}

with this:

// If the needle is empty, return all indexes.
if (indexes === null) {
  return {
    indexes: Array.from({ length: haystack.value.length }, (_, i) => i),
  };
}

So not too much change there, other than simplifying the conditional. However, it is much "cleaner", as it fits right in with the check for info === null.

darrylnoakes commented 1 year ago

And, of course, I now get a compile-time error from TypeScript if I forget the check. Great bonus!

leeoniya commented 1 year ago

https://github.com/leeoniya/uFuzzy/releases/tag/1.0.6