lucaong / minisearch

Tiny and powerful JavaScript full-text search engine for browser and Node
https://lucaong.github.io/minisearch/
MIT License

Should documents with fields that start with the search term score higher? #275

Closed ItaiYosephi closed 1 month ago

ItaiYosephi commented 1 month ago

Hi, in the following screenshot taken from the demo, I search for the word love. I would expect the result that came sixth to be first: it both starts with the search term and is an exact match. Is there a way to configure MiniSearch to do that? [Screenshot: CleanShot 2024-07-22 at 09 34 33]

lucaong commented 1 month ago

Hi @ItaiYosephi, when searching on a single field with default options, the behavior you describe is already what MiniSearch does. Here is an example, using only exact match (neither fuzzy nor prefix search) and searching only the title field:

Screenshot 2024-07-22 at 10 17 06 AM

As you can see, the songs with a title containing just the term "Love" are ranking first.
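For readers without access to the screenshot, the query behind it can be sketched roughly as follows. This is a minimal sketch, assuming an already populated MiniSearch instance that indexes a `title` field; the helper name `searchTitleExact` is hypothetical, while `fields`, `prefix`, and `fuzzy` are MiniSearch search options.

```javascript
// Hypothetical helper: runs an exact-match search restricted to the title field.
// `miniSearch` is assumed to be a MiniSearch instance created elsewhere, e.g.
//   new MiniSearch({ fields: ['title', 'artist'], storeFields: ['title'] })
function searchTitleExact(miniSearch, query) {
  return miniSearch.search(query, {
    fields: ['title'], // search only in the title field
    prefix: false,     // no prefix search ("love" does not match "lovergame")
    fuzzy: false       // no fuzzy search (no matches within an edit distance)
  });
}
```

Calling `searchTitleExact(miniSearch, 'love')` would then return only documents whose title contains the exact term, ranked by score.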

In your screenshot, the song titled just "Love" is not the first result because, due to prefix search, some results match less common terms like "Lovergame" or "Loversong", and "I Never Loved A Man The Way I Love You" matches twice. Less common terms are considered more specific and therefore get a higher score in most scoring algorithms, such as BM25+ (used by MiniSearch) and TF-IDF.
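To illustrate why a rarer term scores higher, here is a simplified sketch of a BM25+-style score for a single term in a single field. The function name, the parameter values `k`, `b`, and `d`, and the corpus numbers are all assumptions chosen for illustration. The sketch also ignores the extra weighting MiniSearch applies to prefix and fuzzy matches, so it does not reproduce the demo's exact scores.

```javascript
// Simplified BM25+-style score for one term in one field (illustrative only).
// k, b and d are assumed parameter values, not necessarily MiniSearch's.
function bm25PlusScore(stats, { k = 1.2, b = 0.7, d = 0.5 } = {}) {
  const { termFreq, docsWithTerm, totalDocs, fieldLength, avgFieldLength } = stats;
  // Inverse document frequency: the fewer documents contain the term,
  // the larger this factor becomes.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Term frequency component, normalized by the relative field length.
  const lengthNorm = 1 - b + b * (fieldLength / avgFieldLength);
  const tf = (termFreq * (k + 1)) / (termFreq + k * lengthNorm);
  return idf * (d + tf);
}

// Hypothetical corpus: 1000 songs, "love" appears in 200 titles,
// "lovergame" in just one. Everything else is held equal.
const shared = { termFreq: 1, totalDocs: 1000, fieldLength: 1, avgFieldLength: 3 };
const commonScore = bm25PlusScore({ ...shared, docsWithTerm: 200 });
const rareScore = bm25PlusScore({ ...shared, docsWithTerm: 1 });

console.log({ commonScore, rareScore }); // rareScore is the larger of the two
```

With everything else equal, only the inverse document frequency differs between the two calls, which is what pushes the rare term ahead.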

Note that the position of the matched term in the text (whether it's at the beginning of the sentence or not) has no influence on the scoring though, because MiniSearch by design uses a "bag of words" approach that does not consider the order of terms. This enables MiniSearch to keep a much smaller index, which fits in the process memory. If needed, one could use the boostDocument option to boost results where one of the query terms is at the beginning.
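As a rough sketch of that last suggestion, the callback below boosts documents whose stored title begins with one of the query terms. The helper name `makeStartsWithBoost` and the boost factor of 2 are hypothetical. The sketch assumes the index was created with `storeFields: ['title']`, so that the title is available in the third argument of the `boostDocument` callback.

```javascript
// Hypothetical helper: builds a boostDocument callback that boosts documents
// whose title starts with one of the query terms.
function makeStartsWithBoost(query, factor = 2) {
  const queryTerms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return (documentId, term, storedFields) => {
    // Assumes `title` is listed in storeFields; otherwise no boost is applied.
    const title = String((storedFields && storedFields.title) || '').toLowerCase();
    // First word of the title, splitting on anything that is not a letter or digit.
    const [firstWord] = title.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
    // A factor above 1 raises the score, 1 leaves it unchanged.
    return queryTerms.includes(firstWord) ? factor : 1;
  };
}

// Usage sketch (assumes `miniSearch` was built with storeFields: ['title']):
// miniSearch.search('love', {
//   prefix: true,
//   boostDocument: makeStartsWithBoost('love')
// });
```

Since the index itself stores no term positions, the check runs against the stored title at query time. The factor is a tuning knob: a larger value pushes titles like "Love" further up relative to prefix matches such as "Lovergame".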

I hope this helps!

ItaiYosephi commented 1 month ago

@lucaong thank you for your thorough response!