krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
17.76k stars, 753 forks

Fuse.js demo produces unexpected results #771

Closed: jasonpolites closed this issue 1 week ago

jasonpolites commented 3 weeks ago

Is there an existing issue for this?

Description of the bug

I initially followed the community guidelines and posted to stackoverflow, but it seems that may not be seen, so reposting here.

I am evaluating client-side search tools, and I tried the demo for Fuse, but it's producing results I didn't expect. I don't know if it's a "bug", or just expected behavior, but it was not what I expected.

The demo has a list of elements, the third element (from the top) looks like this:

  {
    "title": "HTML5",
    "author": {
      "firstName": "Remy",
      "lastName": "Sharp"
    }
  }

The text "Remy" appears only in this one element.

If you enter "remy" as a search term, you get the following results:

[
  {
    "item": {
      "title": "HTML5",
      "author": {
        "firstName": "Remy",
        "lastName": "Sharp"
      }
    },
    "refIndex": 2
  },
  {
    "item": {
      "title": "The Preservationist",
      "author": {
        "firstName": "David",
        "lastName": "Maine"
      }
    },
    "refIndex": 20
  },
  {
    "item": {
      "title": "Angels & Demons",
      "author": {
        "firstName": "Dan",
        "lastName": "Brown"
      }
    },
    "refIndex": 7
  }
]

The search results correctly retrieve (and rank) the element with "firstName": "Remy", but I can't figure out why the other two results are there. There doesn't seem to be anything in the indexed data for these other results that contains the word "remy".

I think I expected just one result.

The Fuse.js version where this bug is happening.

6.6.2

Is this a regression?

Which version did this behavior use to work in?

None

Steps To Reproduce

Search for the term "Remy" in the online demo

Expected behavior

I expected to see a single match for the term "remy"

Screenshots

No response

Additional context

No response

EarMaster commented 1 week ago

I think this is due to the fuzziness of the search. If you enable the includeMatches option, it shows you why it thinks an entry matches. What you experienced as false positives are just very fuzzy matches.

To decrease the fuzziness, you can tweak the matching options. I do agree with you that the default settings are very fuzzy, which results in confusing matches. Reducing the threshold by quite a bit (e.g. to 0.3) will most likely improve your matches.
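A minimal sketch of the two suggestions above (includeMatches plus a lower threshold). The `keys` list is an assumption based on the data shape shown in the issue, and `searchWithMatches` is a hypothetical helper; the `Fuse` constructor is passed in so the snippet does not depend on how the library is loaded.

```javascript
// Options mentioned in this thread. `keys` is an assumption about which
// fields are indexed; adjust it to your own data.
const debugOptions = {
  keys: ["title", "author.firstName"],
  includeMatches: true, // report which key/value produced each match
  includeScore: true,   // 0 is a perfect match, 1 is a complete mismatch
  threshold: 0.3,       // stricter than the library default
};

// Hypothetical helper: runs a search and summarizes why each item matched.
function searchWithMatches(Fuse, list, query) {
  const fuse = new Fuse(list, debugOptions);
  return fuse.search(query).map(({ item, score, matches }) => ({
    title: item.title,
    score,
    matched: (matches || []).map((m) => `${m.key}: ${m.value}`),
  }));
}
```

With Fuse loaded (for example via `import Fuse from 'fuse.js'`), calling `searchWithMatches(Fuse, books, "remy")` should list, for every hit, the field and value that Fuse considered a match.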

jasonpolites commented 1 week ago

Hmm.. yeah. This "threshold" value seems tricky. Intuitively, if I set it to 0.0 my expectation would be "no fuzzies please", which actually works for the search term ("Remy") in the original issue, but returns 0 results for other queries which should match (e.g. "Artist"). This feels like a tricky value to get right. I get that fuzzy is "fuzzy", but I can't help but think that a mode which matches literal strings only (with ranking based on frequency of occurrence and ordering of tokens, stemming in an ideal world) would be helpful. I would say that would generally be favored as the default, with fuzziness an option on top of that.

jasonpolites commented 1 week ago

I will close this issue, as technically it's WAI. I could debate whether the "I" in WAI is the right answer or not, but this issue itself is resolved.

EarMaster commented 1 week ago

If you set threshold: 0 and ignoreLocation: true, it does what you want, but this library is all about fuzzy search so…

The reason it doesn't work without ignoreLocation is that the search string is not exactly at the location Fuse expects it to be (that's what the location option, which defaults to 0, is for). The score is therefore penalized (numerically increased, since 0 is a perfect match) and the result is filtered out by the threshold.
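The location penalty can be illustrated with a simplified score function. This is a sketch under stated assumptions, not Fuse's actual code: `sketchScore` and its parameter names are hypothetical, the default distance of 100 is assumed, and the example title is only an illustrative string in which "Artist" does not start at position 0.

```javascript
// Simplified location-aware score: 0 is a perfect match, higher is worse.
// Names and defaults are assumptions made for illustration.
function sketchScore(patternLength, {
  errors = 0,          // number of mismatched characters
  foundAt = 0,         // where the pattern was found in the text
  location = 0,        // where the pattern is expected to be
  distance = 100,      // how quickly the score degrades with offset
  ignoreLocation = false,
} = {}) {
  const accuracy = errors / patternLength;
  if (ignoreLocation) return accuracy;
  const proximity = Math.abs(location - foundAt);
  if (!distance) return proximity ? 1.0 : accuracy; // avoid dividing by zero
  return accuracy + proximity / distance;
}

const text = "The Lock Artist"; // illustrative string
const pattern = "Artist";
const foundAt = text.indexOf(pattern);

const withLocation = sketchScore(pattern.length, { foundAt });
const withoutLocation = sketchScore(pattern.length, { foundAt, ignoreLocation: true });

// A result survives only if its score does not exceed the threshold.
const passes = (score, threshold) => score <= threshold;
console.log(passes(withLocation, 0), passes(withoutLocation, 0));
```

In this sketch an exact but offset match gets a non-zero score, so a threshold of 0 rejects it unless the location term is ignored, which matches the behavior described above for "Artist".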

jasonpolites commented 1 week ago

That's fair. What I actually wanted was just an inverted index in pure JS. Fuse says it right on the tin, it's a "lightweight fuzzy-search, in JavaScript". I guess I skipped past that and just saw "search" 😄
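For contrast, a very small inverted index in plain JavaScript could look like the sketch below. It matches literal, case-insensitive tokens only, with no ranking or stemming; `buildIndex`, `tokenize`, and `lookup` are hypothetical names, and the sample data is the three items quoted in the issue.

```javascript
// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build a map from token to the set of document positions containing it.
function buildIndex(docs, getText) {
  const index = new Map();
  docs.forEach((doc, i) => {
    for (const token of tokenize(getText(doc))) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(i);
    }
  });
  return index;
}

// Return positions of documents that contain every token in the query.
function lookup(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let hits = null;
  for (const token of tokens) {
    const ids = index.get(token) || new Set();
    hits = hits === null ? new Set(ids) : new Set([...hits].filter((i) => ids.has(i)));
  }
  return [...hits].sort((a, b) => a - b);
}

const books = [
  { title: "HTML5", author: { firstName: "Remy", lastName: "Sharp" } },
  { title: "The Preservationist", author: { firstName: "David", lastName: "Maine" } },
  { title: "Angels & Demons", author: { firstName: "Dan", lastName: "Brown" } },
];
const index = buildIndex(
  books,
  (b) => `${b.title} ${b.author.firstName} ${b.author.lastName}`
);
console.log(lookup(index, "remy").map((i) => books[i].title));
```

Because only whole tokens are compared, a query for "remy" returns the single item that actually contains that word, which is the behavior the original report expected.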