Prefix match is ranked lower than expected

Hi, thank you very much for reporting this observation. I think this is an interesting topic for many people. Carcasse and Carasso are good matches for the query carcasso because the words are similar. To be precise, the computed quality is the number of common n-grams between the query and the matched term, divided by the number of n-grams of the longer string.

Most fuzzy searchers are based on such string similarities, related heuristics, or well-defined string metrics. That being said, I can definitely see where you are coming from. A correctly typed prefix could potentially yield a higher-quality match than what is currently computed.

One solution that can be adopted is to implement a search controller that queries a fuzzy searcher as well as a suffix array searcher. The latter is good at finding prefix and suffix matches. The result matches of both can then be distinct and combined. However, I consider this approach to be out of scope for this library.

To get closer to your desired behaviour with the library as is, you could boost the qualities of prefix matches like so:

const query = new fuzzySearch.Query(queryString, 50, 0.2);
const result = searcher.getMatches(query);
for (const match of result.matches) {
  if (match.matchedString.toLowerCase().normalize('NFKD').startsWith(query.string.toLowerCase().normalize('NFKD'))) {
    match.quality = Math.pow(match.quality, 0.5);
  }
}
result.matches.sort((m1, m2) => m2.quality - m1.quality);

The lower the exponent, the more weight is given to prefix matches. With my choice of 0.5 Carcassonne is at rank 1 with a quality of 0.8 for the query carcasso. The quality of the other results don't change because their prefixes don't match.

I will try to find time in Q2 to investigate whether something like this can be implemented in the core of the searcher in a proper and performant way.

m31coding / fuzzy-search

Prefix match is ranked lower than expected #2