krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
18.15k stars, 767 forks

Does not seem to match start of sentence whatever setting I put [6.4.6] #561

Closed: ghost closed this 3 years ago

ghost commented 3 years ago

UPDATE

After a few more hours of fiddling, here's what I've arrived at:

Make a very "tight" first search:

  fuseOptions.threshold = 0.2
  fuseOptions.distance = 40

Then, if there's more than one result, feed those results back into a second search:

  fuseOptions.threshold = 0.5
  fuseOptions.distance = 50

This seems to finally persuade Fuse that Sheffield is more Sheffield than Driffield!
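The two-pass approach above can be sketched as follows. This is a minimal sketch, assuming the Fuse 6 API (`new Fuse(list, options)`, with `search(query)` returning objects that carry `item` and `score`). The helper name `twoPassSearch` is hypothetical, and the Fuse class is passed in as a parameter so the function can be tried with a stand-in.

```javascript
// Hypothetical helper: tight first pass, then re-rank the survivors loosely.
// FuseClass is expected to behave like Fuse 6: new FuseClass(list, options).search(query)
function twoPassSearch(FuseClass, list, query, baseOptions) {
  // Pass 1: low threshold and short distance keep only close matches.
  const tight = new FuseClass(list, { ...baseOptions, threshold: 0.2, distance: 40 })
  const firstPass = tight.search(query)

  // Zero or one result: nothing left to disambiguate.
  if (firstPass.length <= 1) return firstPass

  // Pass 2: search again, but only among the items that survived pass 1.
  const survivors = firstPass.map(result => result.item)
  const loose = new FuseClass(survivors, { ...baseOptions, threshold: 0.5, distance: 50 })
  return loose.search(query)
}
```

In real use this would be called as `twoPassSearch(Fuse, allTNData, query, fuseOptions)`, with `Fuse` loaded via `require('fuse.js')`. Spreading `baseOptions` keeps the other settings (such as `keys`) identical in both passes, so only `threshold` and `distance` change.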


Original question

Fuse 6.4.6, Node 14.17.0

I have a list, and sometimes when people want an item, they get the name slightly wrong. If they get it right, so does Fuse:

query: "Sheffield talking news"

  { score: 9.287439764962262e-10, name: 'Sheffield Talking News' },
  { score: 0.3167532482101869, name: 'Driffield Talking Newspaper' },
  { score: 0.3167532482101869, name: 'Enfield Talking Newspaper' },

But... query: "Sheffield talking newspaper"

  { score: 0.28145053080087556, name: 'Driffield Talking Newspaper' },
  { score: 0.28145053080087556, name: 'Enfield Talking Newspaper' },
  { score: 0.3186444689652177, name: 'Wakefield Talking Newspaper' },
  { score: 0.37792776624627444, name: 'Sheffield Talking News' },
  { score: 0.37792776624627444, name: 'Sleaford Talking Newspaper' },

I've been mixing and matching every possible combination of options, and it NEVER brings the one I expect to the top. The docs example says:

With the above options, for something to be considered a match, it would have to be within (threshold) 0.6 x (distance) 100 = 60 characters away from the expected location 0

So, say I want to focus on the first 15 characters: with a threshold of 0.4, I would use 15 / 0.4 to get a distance of 37.5.
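The arithmetic behind the docs' rule of thumb, and its inverse for choosing a distance, looks like this. The helper names `maxOffset` and `distanceForWindow` are hypothetical; the rule itself (a match is only accepted within threshold × distance characters of `location`) is the one quoted from the docs.

```javascript
// Docs rule of thumb: an otherwise perfect match is accepted only if it
// starts within threshold * distance characters of `location`.
function maxOffset(threshold, distance) {
  return threshold * distance
}

// Inverse: the distance that gives a window of `windowChars` characters.
function distanceForWindow(threshold, windowChars) {
  return windowChars / threshold
}

console.log(maxOffset(0.6, 100))        // → 60   (the docs example)
console.log(distanceForWindow(0.4, 15)) // → 37.5 (a 15-character window)
```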

So I plug that in and... still nothing. But I don't want to focus on just those 15 characters anyway. I don't get how, even with a distance of just 10 and a location of 0, it considers "Driffield Talking Newspaper" to be a closer match than "Sheffield Talking News" for the query "Sheffield Talking Newspaper", when surely it shouldn't even be looking past the start of the name with such a low distance?
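A minimal sketch of why the ranking may come out this way, assuming Fuse 6 scores a single-key match as `errors / query length + |location - match start| / distance` and then raises that to a field-length norm of roughly `1 / sqrt(word count)`. It swaps Fuse's Bitap search for a plain edit-distance table (approximate substring matching), which should give the same minimum error count for short strings like these; `minSubstringErrors`, `fieldNorm` and `modelScore` are hypothetical names. Under this model `location` and `distance` only penalize how far from `location` the match starts, and both names match from position 0, so the order is decided by 3 substitutions (Sheffield vs Driffield) against 5 missing characters ("paper").

```javascript
// Assumed model of Fuse 6 scoring (not the library's code):
//   raw   = errors / pattern.length + proximity / distance
//   score = raw ** fieldNorm(text)

// Fewest edits needed to match `pattern` somewhere inside `text`
// (the match may start and end at any position in text).
function minSubstringErrors(pattern, text) {
  const m = pattern.length
  const n = text.length
  let prev = new Array(n + 1).fill(0) // empty pattern matches anywhere for free
  for (let i = 1; i <= m; i++) {
    const cur = new Array(n + 1)
    cur[0] = i // first i pattern chars against empty text: i edits
    for (let j = 1; j <= n; j++) {
      const cost = pattern[i - 1] === text[j - 1] ? 0 : 1
      cur[j] = Math.min(
        prev[j] + 1,        // pattern char i left unmatched
        cur[j - 1] + 1,     // text char j skipped
        prev[j - 1] + cost  // match or substitution
      )
    }
    prev = cur
  }
  return Math.min(...prev) // best match may end anywhere in text
}

// Assumed field-length norm: 1 / sqrt(word count), rounded to 3 decimals.
function fieldNorm(text) {
  const words = text.trim().split(/\s+/).length
  return Math.round((1 / Math.sqrt(words)) * 1000) / 1000
}

function modelScore(query, text, { distance = 100, proximity = 0 } = {}) {
  const p = query.toLowerCase() // isCaseSensitive: false
  const t = text.toLowerCase()
  const errors = minSubstringErrors(p, t)
  const raw = errors / p.length + proximity / distance
  // An exact match (raw 0) is swapped for Number.EPSILON before the power.
  const base = raw === 0 ? Number.EPSILON : raw
  return { errors, score: Math.pow(base, fieldNorm(text)) }
}

const query = 'Sheffield talking newspaper'
for (const name of ['Driffield Talking Newspaper', 'Sheffield Talking News']) {
  const { errors, score } = modelScore(query, name)
  console.log(name, errors, score.toFixed(5))
}
```

Under these assumptions it prints 3 errors with a score of 0.28145 for Driffield Talking Newspaper and 5 errors with 0.37793 for Sheffield Talking News, the same scores Fuse returned above, so a smaller `distance` would not be expected to change the order.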

Currently I've got these options, based on the 15 / 0.4 = 37.5 distance, but a distance of 15 gives the same results.

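const Fuse = require('fuse.js')

// Options must be declared before they are passed to the constructor
const fuseOptions = {
  isCaseSensitive: false,
  includeScore: true,
  minMatchCharLength: 5,
  location: 0,      // expected position of the match
  threshold: 0.4,
  distance: 37.5,   // 15 / 0.4, aiming for a ~15-character window from location 0
  // useExtendedSearch: false,
  ignoreLocation: false,
  // ignoreFieldNorm: false,
  keys: ['publication_name']
}

// allTNData is the list of records being searched; query is the user's input
const fuse = new Fuse(allTNData, fuseOptions)
const result = fuse.search(query)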
github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days