typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0
18.68k stars 575 forks

exclude search keywords in override rules #1775

Open saidagit77 opened 1 month ago

saidagit77 commented 1 month ago

Hi @jasonbosco ,

I created the genre-based override rule below in curation, and I need to exclude a list of keywords from it. For example, when we search for 'action', all movies in the action genre are returned, but there is also a movie named 'action' in the same collection. In this case, I need to exclude the action-genre movies for that keyword; the movie named 'action' should appear in the results when searching for 'action'. Can you please suggest what needs to be added to the rule below?

{
  "excludes": [],
  "filter_by": "genre:={genre}",
  "filter_curated_hits": false,
  "id": "i-_We4mvgpCkdewKEAYbp",
  "includes": [],
  "remove_matched_tokens": true,
  "rule": {
    "match": "contains",
    "query": "{genre}"
  },
  "stop_processing": false
}
jasonbosco commented 4 weeks ago

Could you give me a set of step-by-step curl commands like this that replicate the issue?

saidagit77 commented 3 weeks ago

This is the Typesense version:

curl "http://localhost:8108/debug" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"

{
    "state": 1,
    "version": "0.24.1"
}

This is the schema:

curl "http://localhost:8108/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
       -d '{
  "name": "contents",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "facet": false,
      "optional": false,
      "index": true
    },
    {
      "name": "genre",
      "type": "string[]",
      "facet": true,
      "optional": true,
      "index": true
    }
  ]
}'

These two documents were added to that collection:

  1. The first document's genre is Action.
  2. The second document's name is Action.

curl "http://localhost:8108/collections/vcontents/documents/import?action=create" \

  {

    "dataType": "movie",
    "genre": [
      "action"
    ],
    "name": "Idhayathil Nee"
  }

  {
    "genre": [
      "drama"
    ]
  }

Here are the search results when we search with the "action" keyword. The document with genre = "action" comes out on top. I need the rule to exclude this genre = "action" document, so that the other document, with name = "action", is at the top of the results.

  curl -L -X GET 'http://localhost:8108/collections/vcontents/documents/search/?q=action&query_by=name,dataType&sort_by=_text_match:desc,releaseDate:desc&query_by_weights=2,1&page=1&per_page=200' -H 'X-TYPESENSE-API-KEY: {API-KEY}'

{
    "facet_counts": [],
    "found": 1,
    "hits": [
        {
            "document": {
                "genre": [
                    "action"
                ]
            },
            "highlight": {},
            "highlights": [],
            "text_match": 100,
            "text_match_info": {
                "best_field_score": "0",
                "best_field_weight": 12,
                "fields_matched": 4,
                "num_tokens_dropped": 1,
                "score": "100",
                "tokens_matched": 0,
                "typo_prefix_score": 255
            }
        }
    ],
    "out_of": 46,
    "page": 1,
    "request_params": {
        "collection_name": "contents",
        "first_q": "action",
        "per_page": 200,
        "q": "action"
    },
    "search_cutoff": false,
    "search_time_ms": 0
}
jasonbosco commented 3 weeks ago

I need to exclude this document genre = "action" in rule and Other document has name=['action'] should be top in results

The override rule you shared earlier does exactly the opposite. It applies a filter of genre:=action, when action exists anywhere in the search keywords, which then filters out any records that don't have that genre.

You can try inverting the filter condition to != like this (you might need to upgrade to v26.0):

{
  "excludes": [],
  "filter_by": "genre:!={genre}",
  "filter_curated_hits": false,
  "id": "i-_We4mvgpCkdewKEAYbp",
  "includes": [],
  "remove_matched_tokens": true,
  "rule": {
    "match": "exact",
    "query": "{genre}"
  },
  "stop_processing": false
}

But at that point, I'm wondering if you need to add the genre field in query_by at all in the search query. You could just remove that from query_by right?
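
For anyone reproducing this, here is a minimal Python sketch of how the inverted rule could be registered. It assumes the override upsert endpoint documented for the Typesense versions discussed here (PUT /collections/{collection}/overrides/{id}) and writes the negation with the documented :!= not-equals operator. The build_override_request helper, the "contents" collection name, and the "xyz" API key are placeholders for illustration. The request is only built, not sent.

```python
import json
import urllib.request


def build_override_request(base_url, collection, override, api_key):
    """Build (but do not send) a PUT request that upserts one override.

    Hypothetical helper: the override id goes in the URL, the rest in the body.
    """
    body = {key: value for key, value in override.items() if key != "id"}
    url = f"{base_url}/collections/{collection}/overrides/{override['id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-TYPESENSE-API-KEY": api_key,
        },
        method="PUT",
    )


# The inverted rule from this thread, using the :!= operator.
override = {
    "id": "i-_We4mvgpCkdewKEAYbp",
    "rule": {"match": "exact", "query": "{genre}"},
    "filter_by": "genre:!={genre}",
    "remove_matched_tokens": True,
    "stop_processing": False,
}

req = build_override_request("http://localhost:8108", "contents", override, "xyz")
print(req.get_method(), req.full_url)
```

To actually apply it, pass req to urllib.request.urlopen against a running server with a real API key.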

saidagit77 commented 3 weeks ago

Sorry, I did not include enough details. Let me share some examples here.

Let's say we have two documents, one with genre=action and one with genre=live.

When someone searches for "action", all content with genre=action gets displayed. This is working OK.

When someone searches for "live", I don't want the results to be filtered by genre=live. Basically, this rule should be disabled for specific genres. Is there any way to ignore the rule for specific genres?

jasonbosco commented 3 weeks ago

Ah I see. Could you try adding another rule like this:

{
  "excludes": ["non-existent-id"],
  "id": "0_i-_We4mvgpCkdewKEAYbp",
  "rule": {
    "match": "contains",
    "query": "live"
  },
  "stop_processing": true
}

Key things to note:

So essentially we're creating this rule to intercept the rule processing and prevent the other rule(s) from triggering.
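
The interplay between the two rules can be sketched as a toy model in Python. This is not Typesense's implementation. It relies only on what the documentation states: overrides are evaluated in lexical order of their id, and a matching rule with stop_processing: true ends evaluation. The Rule class, the applied_filters helper, and the KNOWN_GENRES set are hypothetical names for illustration, and placeholder matching is simplified to a lookup against a fixed set of genre values.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: genre values that the {genre} placeholder can resolve to.
KNOWN_GENRES = {"action", "live", "love"}


@dataclass
class Rule:
    id: str
    query: str                       # a literal keyword or the "{genre}" placeholder
    filter_by: Optional[str] = None  # None models a rule with no filter action
    stop_processing: bool = False


def applied_filters(rules: List[Rule], q: str) -> List[str]:
    """Return the filters a query would pick up under this simplified model."""
    tokens = q.lower().split()
    filters = []
    # Rules are evaluated in lexical order of their id.
    for rule in sorted(rules, key=lambda r: r.id):
        if rule.query == "{genre}":
            matched = [t for t in tokens if t in KNOWN_GENRES]
        else:
            matched = [t for t in tokens if t == rule.query]
        if not matched:
            continue
        if rule.filter_by:
            filters.append(rule.filter_by.replace("{genre}", matched[0]))
        if rule.stop_processing:
            break  # later rules never get a chance to trigger
    return filters


genre_rule = Rule(id="i-_We4mvgpCkdewKEAYbp", query="{genre}",
                  filter_by="genre:={genre}")
intercept_rule = Rule(id="0_i-_We4mvgpCkdewKEAYbp", query="live",
                      stop_processing=True)
rules = [genre_rule, intercept_rule]

print(applied_filters(rules, "action"))
print(applied_filters(rules, "live"))
```

In this model the "0_" prefix makes the intercepting rule sort first, so a query containing "live" stops there with no filter applied, while "action" falls through to the genre rule.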

saidagit77 commented 3 weeks ago

Applied the above rule. It's working as expected.

@jasonbosco thank you for your support

saidagit77 commented 2 weeks ago

We have one issue with the rules. When the content title is "Love You" with the genres "drama, romance", it does not come up in the search results even if the search query is "love you", because the content is filtered by genre (love). I need the exact match to come first.

@jasonbosco can you please suggest on this point?

saidagit77 commented 2 weeks ago

@jasonbosco can you please look at it once?

saidagit77 commented 1 week ago

Hi @jasonbosco ,

can you please help me?

saidagit77 commented 1 week ago

@kishorenc can you please look at it once?

jasonbosco commented 1 week ago

You want to add a rule for the love genre, just like the live genre we discussed before, and this time set remove_matched_tokens: false.

That way just for that genre, the full search term "love you" is used for keyword search and then the exact match will be ranked higher.
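
The effect of remove_matched_tokens on the keyword part of the search can be illustrated with a small Python sketch. It is a simplified model that assumes whitespace tokenization and ignores edge cases such as a query left with no tokens. The keyword_query helper is a hypothetical name, not a Typesense API.

```python
def keyword_query(q: str, matched_token: str, remove_matched_tokens: bool) -> str:
    """Return the text left for keyword search after a rule matched one token."""
    tokens = q.split()
    if remove_matched_tokens:
        # Drop the token that the rule matched (case-insensitive comparison).
        tokens = [t for t in tokens if t.lower() != matched_token.lower()]
    return " ".join(tokens)


# With removal, only "you" is left to match against the title "Love You".
print(keyword_query("love you", "love", remove_matched_tokens=True))

# Without removal, the full phrase is searched, so the exact title can rank higher.
print(keyword_query("love you", "love", remove_matched_tokens=False))
```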

saidagit77 commented 1 week ago

We already added the same kind of rule as for the live genre, and it's working fine.

But we have many such keywords, which would mean creating many rules. Is there any option to avoid creating a rule for each one?

jasonbosco commented 1 week ago

Setting remove_matched_tokens: false on the rules would be the way to achieve this, but then that might have other unintended consequences since you're using dynamic filtering.

These two requirements conflict with each other when applied generically... I can't think of any other way to avoid this besides creating one-off exceptions as needed.