lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

`fuzzy` predicate function? #237

Closed: subsetpark closed this issue 9 months ago

subsetpark commented 9 months ago

The MiniSearch docs say this about the `fuzzy` parameter:

  /**
   * Controls whether to perform fuzzy search. It can be a simple boolean, or a
   * number, or a function.
   *
   * If a boolean is given, fuzzy search with a default fuzziness parameter is
   * performed if true.
   *
   * If a number higher or equal to 1 is given, fuzzy search is performed, with
   * a maximum edit distance (Levenshtein) equal to the number.
   *
   * If a number between 0 and 1 is given, fuzzy search is performed within a
   * maximum edit distance corresponding to that fraction of the term length,
   * approximated to the nearest integer. For example, 0.2 would mean an edit
   * distance of 20% of the term length, so 1 character in a 5-characters term.
   * The calculated fuzziness value is limited by the `maxFuzzy` option, to
   * prevent slowdown for very long queries.
   *
   * If a function is passed, the function is called upon search with a search
   * term, a positional index of that term in the tokenized search query, and
   * the tokenized search query. It should return a boolean or a number, with
   * the meaning documented above.
   */
  fuzzy?: boolean | number | ((term: string, index: number, terms: string[]) => boolean | number),

In other words, it behaves like the prefix parameter.

However, that doesn't seem to be true. I have a fuzzy parameter that looks like this:

const fuzzy = (
    term: string,
    _index: number,
    _terms: string[]
): number | boolean => term.length >= 3 ? 0.2 : false

Yet I am seeing fuzzy matches on terms shorter than 3 characters. The function does not seem to be called at all.

Do I misunderstand the parameter?

lucaong commented 9 months ago

Hi @subsetpark, the `fuzzy` option accepts a predicate function and should work as you expect. Could you show how you pass this option to the constructor or to the search method? I suspect the option is not being picked up at all, so the default applies. With a fuzziness of 0.2, terms shorter than 3 characters would not get any fuzzy match anyway: for a two-character word, the maximum edit distance is Math.round(2 * 0.2), which is 0 even without the condition on term.length.
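
To make the arithmetic concrete, here is a minimal sketch of the rule as the quoted docs describe it (fraction of the term length, rounded to the nearest integer, capped by `maxFuzzy`). The helper name `fractionalEditDistance` is hypothetical and this is not MiniSearch's own code; the default cap of 6 is an assumption for illustration.

```typescript
// Hypothetical helper illustrating the documented rule for 0 < fuzzy < 1.
// Not MiniSearch's implementation; the maxFuzzy default of 6 is an assumption.
function fractionalEditDistance(
  term: string,
  fuzzy: number,
  maxFuzzy: number = 6
): number {
  // Fraction of the term length, rounded to the nearest integer, then capped.
  return Math.min(maxFuzzy, Math.round(term.length * fuzzy));
}

const twoChars = fractionalEditDistance("ab", 0.2);      // Math.round(0.4) = 0
const threeChars = fractionalEditDistance("abc", 0.2);   // Math.round(0.6) = 1
const fiveChars = fractionalEditDistance("hello", 0.2);  // Math.round(1.0) = 1
const capped = fractionalEditDistance("a".repeat(40), 0.2, 6); // min(6, 8) = 6
```

So with 0.2, fuzzy matching only starts at three characters, which is why the `term.length >= 3` guard should make no visible difference.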

Are you also using prefix match? If so, short terms might be matching longer ones due to prefix match.
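
As a rough illustration of why a prefix match can look like a fuzzy one, the sketch below lists the indexed terms that a short query term would reach by prefix alone. The helper `prefixMatches` and the sample terms are hypothetical; this only mimics the idea and is not how MiniSearch walks its index.

```typescript
// Hypothetical helper: indexed terms that start with the query term,
// excluding the exact match. Illustration only, not MiniSearch internals.
function prefixMatches(queryTerm: string, indexedTerms: string[]): string[] {
  return indexedTerms.filter(
    (t) => t !== queryTerm && t.slice(0, queryTerm.length) === queryTerm
  );
}

const indexedTerms = ["to", "tomato", "token", "at"];

// "to" is too short for any fuzzy match at 0.2, yet it still reaches
// longer terms through prefix search.
const viaPrefix = prefixMatches("to", indexedTerms); // ["tomato", "token"]
```

If results like these disappear when `prefix` is turned off, the matches were never fuzzy in the first place.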

Finally, if you have checked all of the above, it is possible that there is a bug. Could you share a minimal reproduction example?
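
One way to start such a reproduction is to check whether the predicate is invoked at all. The sketch below wraps the predicate from the question in a hypothetical `traced` helper that records every call; the loop at the end only simulates the call pattern described in the docs (one call per tokenized query term) and does not involve MiniSearch itself.

```typescript
type FuzzyResult = boolean | number;
type FuzzyFn = (term: string, index: number, terms: string[]) => FuzzyResult;

interface FuzzyCall {
  term: string;
  result: FuzzyResult;
}

// The predicate from the question.
const fuzzy: FuzzyFn = (term) => (term.length >= 3 ? 0.2 : false);

// Hypothetical helper: wraps a predicate and records each invocation.
function traced(fn: FuzzyFn, calls: FuzzyCall[]): FuzzyFn {
  return (term, index, terms) => {
    const result = fn(term, index, terms);
    calls.push({ term, result });
    return result;
  };
}

const calls: FuzzyCall[] = [];
const tracedFuzzy = traced(fuzzy, calls);

// Simulated call pattern: one call per term of the tokenized query.
const queryTerms = ["to", "tomato"];
queryTerms.forEach((term, i) => tracedFuzzy(term, i, queryTerms));
// calls is now [{ term: "to", result: false }, { term: "tomato", result: 0.2 }]
```

In a real reproduction, `tracedFuzzy` would be passed as `miniSearch.search(query, { fuzzy: tracedFuzzy })` or under `searchOptions` in the constructor. If `calls` stays empty after a search, the option is not reaching the search at all.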

lucaong commented 9 months ago

I now see that you closed the issue, so I assume you found the problem. If not, feel free to reopen it or comment further.