manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

IDF boosters in HTTP requests with match #2419

Open donhardman opened 1 month ago

donhardman commented 1 month ago

Proposal:

Currently, phrase boosting is not supported in the match query of the HTTP JSON API; it is only available in query_string.

Because of how Buddy builds the fuzzy search query, we need this functionality in match as well. The suggested implementation is to add an extra field to the match struct, such as "boost" or "idf".

This new field should implement the same IDF (Inverse Document Frequency) boosting logic for the given match phrase.
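To make the idea concrete, here is a minimal sketch of what an IDF booster does to a term's weight. It assumes the boost is a plain multiplicative factor on the term's IDF (modeled on the query_string "^" modifier) and uses the textbook formula log(N / df); Manticore's own IDF computation differs, and the function names are hypothetical.

```python
import math


def idf(total_docs: int, docs_with_term: int) -> float:
    # Textbook IDF: rarer terms get larger weights.
    # Manticore's internal IDF formula differs; this is only an illustration.
    return math.log(total_docs / docs_with_term)


def boosted_idf(total_docs: int, docs_with_term: int, boost: float = 1.0) -> float:
    # Assumption: the booster simply multiplies the term's IDF, so the
    # boosted term contributes more to IDF-based ranking factors.
    return idf(total_docs, docs_with_term) * boost


plain = boosted_idf(1000, 10)               # log(100)
boosted = boosted_idf(1000, 10, boost=1.5)  # 1.5 * log(100)
```

With a boost of 1.5, every matched occurrence of the term counts as if the term were rarer, which is what pushes the boosted phrase up in the ranking.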

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] Changelog updated
- [ ] OpenAPI YAML updated and issue created to rebuild clients
sanikolaev commented 1 month ago

As discussed, please prepare a spec describing what this should look like in the interface.

donhardman commented 1 month ago

As previously discussed, the match clause currently takes one of these two forms:

```json
"match": {
  "field1,field2": "keyword"
}
```

or

```json
"query": {
  "match": {
    "content,title": {
      "query": "keyword",
      "operator": "or"
    }
  }
}
```
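The two forms above carry the same information, and the shorthand is just the object form with default settings. The sketch below normalizes either form into (fields, query, operator) tuples; it assumes "or" is the default operator when none is given, and the function name is hypothetical, not part of Manticore or Buddy.

```python
from typing import Any, Dict, List, Tuple


def normalize_match(match: Dict[str, Any]) -> List[Tuple[List[str], str, str]]:
    """Turn a JSON match clause into (fields, query, operator) tuples.

    Accepts the shorthand {"f1,f2": "keyword"} and the object form
    {"f1,f2": {"query": "keyword", "operator": "and"}}.
    Assumption: "or" is used when no operator is given.
    """
    result = []
    for field_spec, value in match.items():
        # The key is a comma-separated list of fields to search in.
        fields = [f.strip() for f in field_spec.split(",")]
        if isinstance(value, str):
            # Shorthand form: the value is the query text itself.
            query, operator = value, "or"
        else:
            # Object form: query text plus optional settings.
            query = value["query"]
            operator = value.get("operator", "or")
        result.append((fields, query, operator))
    return result


short = normalize_match({"field1,field2": "keyword"})
full = normalize_match({"content,title": {"query": "keyword", "operator": "or"}})
```

Because the object form is the only one with room for extra keys, any new per-query setting naturally lives there, next to "operator".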

The suggestion is to add a "booster" key (or something similar) at the same level as "operator". This means that to add a booster to the shorthand "content,title": "value", you switch to the object form:

```json
"content,title": {
  "query": "value",
  "booster": 1.5
}
```
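One way to see how the proposed key maps onto existing functionality is a hypothetical translation of a match entry into Manticore's extended query syntax, using the "@(field,...)" field-list operator and the "word^N" IDF boost modifier from query_string. This is a sketch under assumptions, not how the server actually parses match: the boost is applied to every word, "or" is joined with "|", "and" with a space, and the function name is invented. It reads either "booster" (as proposed here) or "boost" (the name the thread settles on later).

```python
from typing import Any, Dict


def match_to_extended(field_spec: str, clause: Dict[str, Any]) -> str:
    """Hypothetical translation of one object-form match entry into
    extended query syntax, applying an IDF boost to each word."""
    fields = ",".join(f.strip() for f in field_spec.split(","))
    words = clause["query"].split()
    # Accept either key name discussed in the thread.
    boost = clause.get("boost", clause.get("booster"))
    if boost is not None:
        # "word^N" is the query_string IDF boost modifier.
        words = [f"{w}^{boost:g}" for w in words]
    # Assumption: "or" (the default) becomes "|", "and" becomes implicit AND.
    joiner = " " if clause.get("operator", "or") == "and" else " | "
    return f"@({fields}) ({joiner.join(words)})"


expr = match_to_extended("content,title", {"query": "value", "booster": 1.5})
# expr == "@(content,title) (value^1.5)"
```

Keeping the boost alongside "operator" means it applies to the whole query phrase of that entry, which mirrors how a boosted phrase is written in query_string.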

This approach lets you raise the IDF weight of the query keywords matched in these fields, so such matches carry more weight in the ranking. 😊

sanikolaev commented 1 month ago

Let's call it "boost" rather than "booster".

sanikolaev commented 4 weeks ago

@donhardman does this issue block anything?

donhardman commented 4 weeks ago

Technically, it's not blocking anything. However, the fuzzy search logic uses boosting to promote the most relevant phrases when it builds SQL requests. Without boost support in the JSON match query, fuzzy search combined with a field-scoped match cannot apply that boosting.
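As an illustration of why the fuzzy logic needs this, here is a hypothetical sketch of the kind of expression such logic could build: the phrase the user actually typed gets an IDF boost, and fuzzy alternatives are OR-ed in without one, so exact matches rank first. This is not Buddy's actual code; the function name, the default boost of 1.5, and the way variants are produced are all assumptions.

```python
from typing import List


def fuzzy_expression(original: str, variants: List[str], boost: float = 1.5) -> str:
    # Hypothetical: boost every word of the original phrase so it outranks
    # the fuzzy variants, which are added unboosted as OR alternatives.
    boosted = " ".join(f"{w}^{boost:g}" for w in original.split())
    alternatives = [f"({boosted})"] + [f"({v})" for v in variants]
    return " | ".join(alternatives)


expr = fuzzy_expression("quick fox", ["quack fox", "quick box"])
# expr == "(quick^1.5 fox^1.5) | (quack fox) | (quick box)"
```

With query_string this expression can be sent as-is; with a JSON match query there is currently no place to put the "^1.5", which is the gap this issue describes.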