typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0

A Bit confused about Query suggestions and Aggregation key #1801

Open saidagit77 opened 2 months ago

saidagit77 commented 2 months ago

Hi @jasonbosco ,

I am exploring Query suggestions & Aggregation key.

These are the steps I followed:

  1. When Self-Hosting

    --enable-search-analytics=true \
    --analytics-dir=/path/to/analytics-data \
    --analytics-flush-interval=60
  2. Create a collection for queries

    curl -k "http://localhost:8108/collections" \
      -X POST \
      -H "Content-Type: application/json" \
      -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
      -d '{
        "name": "product_queries",
        "fields": [
          {"name": "q", "type": "string" },
          {"name": "count", "type": "int32" }
        ]
      }'
  3. Create an analytics rule

    curl -k "http://localhost:8108/analytics/rules" \
      -X POST \
      -H "Content-Type: application/json" \
      -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
      -d '{
        "name": "product_queries_aggregation",
        "type": "popular_queries",
        "params": {
            "source": {
                "collections": ["contents"]
            },
            "destination": {
                "collection": "product_queries"
            },
            "limit": 1000
        }
      }'

When I search for the "Comedy" keyword, a document is registered in the destination collection with "Comedy". I understand this part.

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=comedy&query_by=title&x-typesense-user-id=10"

image

As per the docs, when a user types a -> ac -> act -> acti -> actio -> action, only "action" should be registered, once the 4-second window and the analytics-flush-interval (60 seconds) have passed.

However, when I ran the curl commands below within 60 seconds, six documents were registered in the destination collection, when only one document, for the "action" keyword, should have been registered. Why were six documents registered? Did I do something wrong or miss a step? Could you please guide me?

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=a&query_by=title&x-typesense-user-id=10"

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=ac&query_by=title&x-typesense-user-id=10"

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=act&query_by=title&x-typesense-user-id=10"

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=acti&query_by=title&x-typesense-user-id=10"

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=actio&query_by=title&x-typesense-user-id=10"

curl -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" -H "X-TYPESENSE-USER-ID: 10" "http://localhost:8108/collections/contents/documents/search?q=action&query_by=title&x-typesense-user-id=10"

image

jasonbosco commented 2 months ago

You'd need to send ALL the curl search requests within a 4-second window for them to register as a single search.

The 60s window is when Typesense looks at past queries and writes the data into the collection. But it will only aggregate search requests that occur within 4s windows.
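
To make the 4-second window concrete, here is a rough sketch of a script that sends every prefix of a term back to back, so that all six requests land well inside one window. The helper names (`prefixes`, `search_url`, `send_prefix_searches`) and the `DRY_RUN` switch are made up for illustration; the host, collection name, and headers are taken from the commands earlier in this thread. By default it only prints the requests it would send.

```shell
#!/bin/sh
# Sketch only: fire all prefix searches in quick succession so they fall
# inside a single 4-second aggregation window.
# Assumptions: server on localhost:8108, a "contents" collection with a
# "title" field, TYPESENSE_API_KEY exported, and a plain ASCII query term
# (no URL-encoding is done here).

# Print each prefix of the given word on its own line: a, ac, act, ...
prefixes() {
  word="$1"
  i=1
  while [ "$i" -le "${#word}" ]; do
    printf '%s\n' "$word" | cut -c "1-$i"
    i=$((i + 1))
  done
}

# Build the search URL for one query string (hypothetical helper).
search_url() {
  printf 'http://localhost:8108/collections/contents/documents/search?q=%s&query_by=title' "$1"
}

# Send one search per prefix. With DRY_RUN=1 (the default) the requests
# are only printed; set DRY_RUN=0 to actually send them.
send_prefix_searches() {
  prefixes "$1" | while read -r q; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf 'curl "%s"\n' "$(search_url "$q")"
    else
      curl -s -o /dev/null \
        -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
        -H "X-TYPESENSE-USER-ID: 10" \
        "$(search_url "$q")"
    fi
  done
}

send_prefix_searches action
```

Running it with `DRY_RUN=0` against a live server sends the six requests without any pause between them, which is the situation described above where they should count as a single search. Pasting the six curl commands by hand can easily take longer than 4 seconds.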

saidagit77 commented 2 months ago

Oh, OK. Can we change or increase the 4-second window, the same way we can set --analytics-flush-interval=60?

One more thing, regarding no-hits queries:

Step 1 -> Created a no-hits queries collection.
Step 2 -> Created a no-hits queries rule.
Step 3 -> Searched for the "Saidaroy" keyword, which returned no results. But this query was not registered in the no-hits queries collection.

Please see the screenshots below.

image

image

jasonbosco commented 2 months ago

No, the 4s window is not configurable at the moment. Could you open a new issue with the feature request, so we can track interest?

Could you make sure you've set all the parameters specified here: https://typesense.org/docs/26.0/api/analytics-query-suggestions.html#when-self-hosting

saidagit77 commented 2 months ago

Yes, the parameters below are specified.

./typesense-server --data-dir=/path/to/data --api-key=abcd \
  --enable-search-analytics=true \
  --analytics-dir=/path/to/analytics-data \
  --analytics-flush-interval=60

But queries with no search results are still not registered in the no-hits queries collection.

jasonbosco commented 2 months ago

Could you give me a standalone end-to-end set of commands that starts the Typesense server, creates a collection, creates a document, creates the analytics rule and then does a search query for a non-existent term that replicates the issue?

Here's a template that you can adapt to produce a standalone script that reproduces the issue you're seeing.
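
For illustration only (this is not the template referred to above), a standalone reproduction of the no-hits case might look roughly like the sketch below. The collection name `no_hits_queries`, the rule name `contents_no_hits`, the sample document, the `RUN_REPRO` switch, and the helper function names are all assumptions made for this example. The `nohits_queries` rule type is taken from the Typesense analytics docs as I understand them, so please double-check it against the version you are running. The script expects a server already started with the analytics flags shown earlier in this thread.

```shell
#!/bin/sh
# Sketch of an end-to-end reproduction for no-hits queries.
# Assumes a Typesense server is already running on localhost:8108 with
# --enable-search-analytics=true and --analytics-flush-interval=60,
# and that TYPESENSE_API_KEY is exported.

BASE_URL="${BASE_URL:-http://localhost:8108}"

# Schema of the collection that gets searched.
contents_schema() {
  printf '%s' '{"name": "contents", "fields": [{"name": "title", "type": "string"}]}'
}

# Schema of the destination collection for no-hits queries (assumed name).
nohits_schema() {
  printf '%s' '{"name": "no_hits_queries", "fields": [{"name": "q", "type": "string"}, {"name": "count", "type": "int32"}]}'
}

# Analytics rule that aggregates searches with zero results.
nohits_rule() {
  printf '%s' '{"name": "contents_no_hits", "type": "nohits_queries", "params": {"source": {"collections": ["contents"]}, "destination": {"collection": "no_hits_queries"}, "limit": 1000}}'
}

# POST a JSON body ($2) to an API path ($1).
post_json() {
  curl -s "${BASE_URL}$1" -X POST \
    -H "Content-Type: application/json" \
    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -d "$2"
}

reproduce() {
  post_json /collections "$(contents_schema)"
  post_json /collections "$(nohits_schema)"
  post_json /collections/contents/documents '{"title": "Action movie"}'
  post_json /analytics/rules "$(nohits_rule)"

  # Search for a term that should match nothing.
  curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    -H "X-TYPESENSE-USER-ID: 10" \
    "${BASE_URL}/collections/contents/documents/search?q=saidaroy&query_by=title"

  # Wait past the 60s flush interval, then dump the destination collection.
  sleep 70
  curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
    "${BASE_URL}/collections/no_hits_queries/documents/export"
}

# Nothing is sent unless explicitly requested.
if [ "${RUN_REPRO:-0}" = "1" ]; then
  reproduce
fi
```

Running it as `RUN_REPRO=1 sh repro.sh` (the file name is arbitrary) against a fresh data directory keeps the reproduction self-contained. If the final export prints nothing, that output can be pasted into the issue as is.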