krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0

Match indices contain ranges of partial matches #611

Closed codeaid closed 2 years ago

codeaid commented 2 years ago

Describe the bug

When matches are included in the results, the indices array contains ranges for partial matches that shouldn't be in the output.

Version

v6.5.3

Is this a regression?

I don't think it is as there is another report from two years ago about the same issue (#505) that was automatically closed and never resolved.

🔬 Minimal Reproduction

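import Fuse from 'fuse.js';

// A single record whose value contains both a partial ("AA") and a full ("AAA") match.
const data = [
  { value: 'AA BB AAA BBB' },
];

const index = new Fuse(data, {
  includeMatches: true,
  ignoreFieldNorm: true,
  ignoreLocation: true,
  includeScore: true,
  keys: ['value'],
  threshold: 0.001, // near-exact matching only
});

const results = index.search('AAA');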

After this, results contains the following:

[
  {
    "item": {
      "value": "AA BB AAA BBB"
    },
    "refIndex": 0,
    "matches": [
      {
        "indices": [
          [ 0, 1 ],
          [ 6, 8 ]
        ],
        "value": "AA BB AAA BBB",
        "key": "value"
      }
    ],
    "score": 0.001
  }
]

Note how indices contains [0, 1] even though it isn't a match. I'm guessing it's there because it's a partial match; however, there aren't any options available to get rid of it. Even the threshold of 0.001 seems to be completely ignored.

This behaviour renders the indices pretty much useless because they include irrelevant ranges that I, as someone searching for AAA, don't care about.

The suggestion to set minMatchCharLength to 2 (mentioned in #505) is not really a solution because it would still behave in exactly the same way in scenarios like this one, where the irrelevant match is 2 or more characters long.
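To make that point concrete, here is a minimal sketch. It assumes that minMatchCharLength acts as a plain length filter on the returned ranges, which is an assumption about Fuse's behaviour rather than something confirmed in this thread. filterByMinLength is a hypothetical helper, not part of Fuse.

```javascript
// Hypothetical helper (not a Fuse API): keep only ranges that span at least
// minLength characters. Ranges are [start, end] pairs with an inclusive end,
// as in the reported output.
function filterByMinLength(indices, minLength) {
  return indices.filter(([start, end]) => end - start + 1 >= minLength);
}

const reported = [[0, 1], [6, 8]];

// [0, 1] spans two characters, so a minimum length of 2 does not remove it.
const atLeastTwo = filterByMinLength(reported, 2);

// A minimum length of 3 removes it here, but only because "AA" happens to be short.
const atLeastThree = filterByMinLength(reported, 3);

console.log(JSON.stringify(atLeastTwo), JSON.stringify(atLeastThree));
```

A length cut-off cannot tell an exact match from an equally long partial one, so raising the minimum only moves the problem.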

Additional context

I wanted to use this functionality to highlight matches in search results, but as it stands that's not possible: if the user searches for AAA and I don't manually process or filter the indices in some way, AA also gets highlighted, which confuses the user.
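For illustration, a highlighter that consumes the indices as-is might look like the sketch below. highlight is a hypothetical helper, and the choice of `<mark>` tags is an assumption; it expects sorted, non-overlapping ranges and does not HTML-escape the text.

```javascript
// Hypothetical helper: wrap each [start, end] range (inclusive end) in <mark> tags.
// Assumes the ranges are sorted and do not overlap; the text is not HTML-escaped.
function highlight(value, ranges) {
  let out = '';
  let cursor = 0;
  for (const [start, end] of ranges) {
    out += value.slice(cursor, start);
    out += '<mark>' + value.slice(start, end + 1) + '</mark>';
    cursor = end + 1;
  }
  return out + value.slice(cursor);
}

// With the indices exactly as reported, "AA" is highlighted too.
const unfiltered = highlight('AA BB AAA BBB', [[0, 1], [6, 8]]);

// What a user searching for "AAA" would expect to see.
const desired = highlight('AA BB AAA BBB', [[6, 8]]);

console.log(unfiltered);
console.log(desired);
```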

Not knowing much about the internals of this library, I'd suggest either completely removing indices or matches that are not full matches, or at least respecting the threshold and not including indices that fall outside it.
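Until something like that exists in the library, one possible workaround is to post-process the results on the caller's side. The sketch below assumes that "full match" means the text of a range equals the query, ignoring case. keepExactRanges is a hypothetical helper, not a Fuse option, and it only relies on the result shape shown in the output above.

```javascript
// Hypothetical helper: drop every range whose text is not exactly the query
// (case-insensitive). Expects results shaped like the output shown above,
// i.e. produced with includeMatches: true.
function keepExactRanges(results, query) {
  const needle = query.toLowerCase();
  return results.map((result) => ({
    ...result,
    matches: (result.matches || []).map((match) => ({
      ...match,
      indices: match.indices.filter(
        ([start, end]) => match.value.slice(start, end + 1).toLowerCase() === needle
      ),
    })),
  }));
}

// The result object reported above, copied verbatim.
const reportedResults = [
  {
    item: { value: 'AA BB AAA BBB' },
    refIndex: 0,
    matches: [{ indices: [[0, 1], [6, 8]], value: 'AA BB AAA BBB', key: 'value' }],
    score: 0.001,
  },
];

const cleaned = keepExactRanges(reportedResults, 'AAA');
console.log(JSON.stringify(cleaned[0].matches[0].indices));
```

This deliberately discards fuzzy ranges altogether, so it only suits cases where exact highlighting is wanted and typo-tolerant highlighting is not.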

codeaid commented 2 years ago

I've been experimenting with highlighting search results and just stumbled upon another good example.

When having an indexed string value of 402220 63193795 0000 ABC and searching for 402220 the following indices are returned:

As a result I end up highlighting 402220 and 0000, which is obviously completely wrong because the search query doesn't even contain two zeroes, let alone four.

The same happens even if I enable the useExtendedSearch flag and wrap my query in double quotes ("402220"), expecting it to return only exact matches.
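When only exact occurrences should be highlighted, another option is to ignore the returned indices and compute the ranges directly from the matched value. The sketch below is plain JavaScript and makes no claim about how Fuse works internally. findExactRanges is a hypothetical helper, and it assumes lowercasing does not change the string length, which holds for ASCII input like this example.

```javascript
// Hypothetical helper: locate every case-insensitive occurrence of query in value
// and return [start, end] pairs with an inclusive end, mirroring the indices format.
function findExactRanges(value, query) {
  const ranges = [];
  if (query.length === 0) return ranges;
  const haystack = value.toLowerCase();
  const needle = query.toLowerCase();
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    ranges.push([from, from + needle.length - 1]);
    // Continue after the current occurrence so ranges never overlap.
    from = haystack.indexOf(needle, from + needle.length);
  }
  return ranges;
}

const exactRanges = findExactRanges('402220 63193795 0000 ABC', '402220');
console.log(JSON.stringify(exactRanges)); // [[0,5]]
```

Only 402220 is covered by the resulting range, so 0000 would no longer be highlighted.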

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

zach-is-my-name commented 2 years ago

Yeah same problem

sunknudsen commented 1 year ago

@krisk Would you happen to know what is going on? I'm experiencing a similar issue… see screenshot.

import fuseJs from "fuse.js"

…

const faqFuseJs = new fuseJs(sortableTopics, {
  distance: Infinity,
  findAllMatches: true,
  includeMatches: true,
  keys: ["content", "metadata.title"],
  minMatchCharLength: 2,
  shouldSort: true,
  threshold: 0,
})

const results = faqFuseJs.search("linux")

console.log(results)

I am perhaps naively expecting indices to contain a single linux match… puzzled.
