oramasearch / orama

🌌 A complete search engine and RAG pipeline in your browser, server or edge network with support for full-text, vector, and hybrid search in less than 2kb.
https://docs.orama.com

Unintuitive boost behavior #783

Closed AdventureBeard closed 1 month ago

AdventureBeard commented 1 month ago

Describe the bug

When I do a search for a phrase, I expect a boosted field with an exact match to rise to the top of the results. However, matches with hits in lower-boosted fields tend to overtake.

For example, in this search feature, I've got two important fields: resourceName, which has a 3x boost, and text, which has the default boost of 1.

"Port at the moonlight gate" is an exact match for a resourceName in my index, and the phrase is also mentioned in the text fields of several articles. The text-field matches always outrank the title match. In fact, the higher I boost the resourceName field, the lower that document appears in the search results. My users are perplexed by this behavior as well. "Phasing gate" and other articles with "moonlight" in the title also show up, which makes sense since they are keyword matches, but I'd still expect a perfect phrase match to score higher than individual keyword matches.

[Screenshot 2024-09-10 at 12 15 25] [Screenshot 2024-09-10 at 12 19 08]

To get around it, I'm doing a two-phase search: one query for an exact match in the title field and one general query, and then distinct-merging the two result lists. This works, but I feel like Orama should be able to handle this scenario. Is there anything else I can do to favor exact matches in higher-boosted fields? Do I need to improve my understanding of BM25 to make sense of these results?
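For readers who want to try the same workaround, here is a minimal sketch of the distinct-merge step only. It assumes the two hit lists have already been produced by two separate searches (one restricted to the title field, one general). The `Hit` interface and the `mergeDistinct` helper are hypothetical names introduced for illustration, not part of Orama, and the ids and scores are made up.

```typescript
// Simplified stand-in for a search hit: only the fields the merge needs.
// This is an assumed shape for illustration, not the library's own type.
interface Hit {
  id: string;
  score: number;
}

// Hypothetical helper: put the exact-title hits first, then the general hits,
// keeping only the first occurrence of each document id.
function mergeDistinct<T extends Hit>(exactHits: T[], generalHits: T[]): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const hit of [...exactHits, ...generalHits]) {
    if (!seen.has(hit.id)) {
      seen.add(hit.id);
      merged.push(hit);
    }
  }
  return merged;
}

// Made-up example data: the exact title match ranks second in the general list.
const exactHits: Hit[] = [{ id: "port", score: 4.2 }];
const generalHits: Hit[] = [
  { id: "article-1", score: 9.1 },
  { id: "port", score: 4.2 },
  { id: "phasing-gate", score: 3.3 },
];

const mergedIds = mergeDistinct(exactHits, generalHits).map((h) => h.id);
console.log(mergedIds);
```

Because the exact-title list is concatenated first, its documents always lead the merged list regardless of their scores, which is the point of the workaround; the cost is a second query per search.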

To Reproduce

  1. Add schema with two fields: title and content.
  2. Boost the title by 100x.
  3. Add documents with titles and contents with overlapping text.
  4. Search for a phrase that exactly matches the title of one document and is also present in the content field of another document.
  5. Observe that the unboosted field outscores the exact title match.

Expected behavior

I expected that an exact phrase match in a boosted field would outscore phrase matches in non-boosted fields.

Environment Info

Orama: 2.0.23

Affected areas

Search

Additional context

No response

micheleriva commented 1 month ago

Hi @AdventureBeard, thanks for spotting this. Can I kindly ask you to upgrade to Orama v2.1.0 (just released) and see if the problem persists?