typesense / typesense-instantsearch-adapter

A JS adapter library to build rich search interfaces with Typesense and InstantSearch.js
MIT License

Wrong number of group_by documents returned #213

Closed coatezy closed 4 months ago

coatezy commented 4 months ago

Description

I have approx 985 lessons across approx 41 courses and would like to group lessons by the course.title.

  collectionSpecificSearchParameters: {
    lessons_development_1721316941: {
      query_by: "course.title,title,keywords",
      include_fields: "course.title",
      group_by: "course.title",
      group_limit: 1
    }
  }

If I provide a search term, all expected courses are returned, presumably because the number of documents has been reduced. However, if there is no search term and all results are returned, I only get 15 courses. When I set the same parameters in the cloud search interface, I get back all the courses (41 results found from 985 docs in 3ms). Would this be an InstantSearch.js adaptor-related issue?

I have experienced this using the InstantSearch React library, I will test using the InstantSearch.js library too and report back.

Steps to reproduce

  1. Create a new collection:
curl "https://{HOSTNAME}/collections" \
       -X POST \
       -H "Content-Type: application/json" \
       -H "X-TYPESENSE-API-KEY: {API_KEY}" \
       -d '{
         "name": "lessons_development_1721316941",
         "fields": [
          {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": ".*",
            "optional": true,
            "sort": false,
            "stem": false,
            "type": "auto"
          },
          {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "course.title",
            "optional": false,
            "sort": false,
            "stem": false,
            "type": "string"
          },
          {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "course",
            "optional": true,
            "sort": false,
            "stem": false,
            "type": "object"
          },
          {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "keywords",
            "optional": true,
            "sort": false,
            "stem": false,
            "type": "string[]"
          },
          {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "position",
            "optional": true,
            "sort": true,
            "stem": false,
            "type": "int64"
          },
          {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "title",
            "optional": true,
            "sort": false,
            "stem": false,
            "type": "string"
          }
         ],
         "enable_nested_fields": true
       }'
  2. Import records

  3. Query that works via a direct curl request but exhibits the unexpected behaviour via the InstantSearch adaptor:

    curl 'https://{HOSTNAME}/multi_search' \
    -H 'accept: application/json, text/plain, */*' \
    -H 'content-type: text/plain' \
    -H 'x-typesense-api-key: {API_KEY}' \
    --data-raw '{"searches":[{"query_by":"course.title","group_by":"course.title","sort_by":"_text_match:desc","group_limit":"1","highlight_full_fields":"course.title","collection":"lesson_development_1721316940","q":"*","page":1,"per_page":10}]}'

Expected Behavior

I'd expect 41 results found from 985 documents.

Actual Behavior

With the default per_page value of 10, 4 results are returned. If I increase this to 250 then 15 results are returned.

Metadata

Typesense Version: v26.0

OS: Typesense Cloud

tharropoulos commented 4 months ago

After digging around, I created a games collection

{
    "created_at": 1721377414,
    "default_sorting_field": "release_date",
    "enable_nested_fields": false,
    "fields": [
        {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "name",
            "optional": false,
            "sort": false,
            "stem": false,
            "type": "string"
        },
        {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "price",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "float"
        },
        {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "hltb_single",
            "optional": true,
            "sort": true,
            "stem": false,
            "type": "int32"
        },
        {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "positive",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "int32"
        },
        {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "negative",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "int32"
        },
        {
            "facet": true,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "app.id",
            "optional": false,
            "sort": false,
            "stem": false,
            "type": "string"
        },
        {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "min_owners",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "int32"
        },
        {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "max_owners",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "int32"
        },
        {
            "facet": false,
            "index": true,
            "infix": false,
            "locale": "",
            "name": "release_date",
            "optional": false,
            "sort": true,
            "stem": false,
            "type": "int64"
        }
    ],
    "name": "games",
    "num_documents": 76048,
    "symbols_to_index": [],
    "token_separators": []
}

Because I thought it must have been an issue with the dot (.) notation.

So I then added these params to my instantsearch adapter config

const additionalSearchParameters: BaseSearchParameters = {
    query_by: "app.id",
    num_typos: 0,
    group_by: "app.id",
    group_limit: 1,
};

and upon loading the page, the POST request to http://localhost:8108/multi_search had the following request body

{
  "searches": [
    {
      "collection": "games",
      "facet_by": "price",
      "group_by": "app.id",
      "group_limit": 1,
      "highlight_full_fields": "app.id",
      "num_typos": 0,
      "page": 1,
      "per_page": 12,
      "q": "*",
      "query_by": "app.id"
    }
  ]
}

And this was the response:

{
  "results": [
    {
      "found": 26100,
      "found_docs": 76048,
      "grouped_hits": [...], // The length of this was 12, as the per_page parameter and the group key was the app.id
      "out_of": 76048,
      "page": 1,
      "request_params": {
        "collection_name": "games",
        "first_q": "*",
        "per_page": 12,
        "q": "*"
      },
      "search_cutoff": false,
      "search_time_ms": 123
    }
  ]
}

This indicates that it worked as expected. Even on this page you can see the grouping done in a similar way, and it behaved as expected. Could you post both the requests and responses from each side (Typesense Cloud / your frontend) so we can see if there are any differences between them? Thanks in advance.

coatezy commented 4 months ago

So I checked the request and response generated by my React Native app.

Request

curl '{HOSTNAME}:443/multi_search?x-typesense-api-key={API_KEY}' \
  -H 'Content-Type: text/plain' \
  -H 'Accept: application/json, text/plain, */*' \
  --data-raw '{"searches":[{"query_by":"course.title","include_fields":"course.title,title,keywords","group_by":"course.title","group_limit":1,"per_page":12,"highlight_full_fields":"course.title","collection":"lessons_development_1721316941","q":"*","page":1}]}' \
  --compressed ;

Response

{
  "results": [
    {
      "facet_counts": [],
      "found": 4,
      "found_docs": 985,
      "grouped_hits": [...], // The length of this was 4
      "out_of": 985,
      "page": 1,
      "request_params": {
        "collection_name": "lesson_development_1721316940",
        "first_q": "*",
        "per_page": 12,
        "q": "*"
      },
      "search_cutoff": false,
      "search_time_ms": 0
    }
  ]
}

I then replaced the request body with the body from my original example and all expected groups were returned, so I knew it had to be something in the request. I spotted that the request body posted in the original issue message (generated from the Typesense Cloud console) contained "sort_by":"_text_match:desc" whereas the request body generated by the app did not, so I added "sort_by":"_text_match:desc" to the curl request and boom, 41 found. I then explicitly added the sort_by to the adaptor config and all worked as expected. 🤯
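
For anyone comparing request bodies the same way, here is a rough sketch of the check. `diffSearchParams` is a hypothetical helper written for this illustration, not part of Typesense or the adapter; the two objects are the search entries from the console request and the app request quoted in this thread.

```typescript
// Hypothetical helper: lists parameters that differ between two search objects.
type SearchParams = Record<string, string | number | boolean>;

interface ParamDiff {
  onlyInFirst: string[];
  onlyInSecond: string[];
  changed: string[];
}

function diffSearchParams(first: SearchParams, second: SearchParams): ParamDiff {
  const onlyInFirst = Object.keys(first).filter((k) => !(k in second)).sort();
  const onlyInSecond = Object.keys(second).filter((k) => !(k in first)).sort();
  // Compare as strings so that "1" and 1 count as the same value.
  const changed = Object.keys(first)
    .filter((k) => k in second && String(first[k]) !== String(second[k]))
    .sort();
  return { onlyInFirst, onlyInSecond, changed };
}

// Search entry from the Typesense Cloud console request (original issue message).
const consoleSearch: SearchParams = {
  query_by: "course.title",
  group_by: "course.title",
  sort_by: "_text_match:desc",
  group_limit: "1",
  highlight_full_fields: "course.title",
  collection: "lesson_development_1721316940",
  q: "*",
  page: 1,
  per_page: 10,
};

// Search entry from the request generated by the app.
const appSearch: SearchParams = {
  query_by: "course.title",
  include_fields: "course.title,title,keywords",
  group_by: "course.title",
  group_limit: 1,
  per_page: 12,
  highlight_full_fields: "course.title",
  collection: "lessons_development_1721316941",
  q: "*",
  page: 1,
};

const diff = diffSearchParams(consoleSearch, appSearch);
console.log(diff);
```

The only parameter present in the console request but missing from the app request is sort_by, which is what pointed to the fix.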

I've not explored this, but I wonder if it is related to not having a default_sorting_field set.
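
A minimal sketch of the workaround, assuming the same collection-specific parameters as in the issue description. `withExplicitSort` is a hypothetical helper made up for this example; the resulting object is what would be passed as `collectionSpecificSearchParameters` in the adapter config.

```typescript
// Shape of the per-collection parameters used in this thread (illustrative, not exhaustive).
interface CollectionSearchParameters {
  query_by: string;
  include_fields?: string;
  group_by?: string;
  group_limit?: number;
  sort_by?: string;
}

// Hypothetical helper: adds an explicit sort_by unless one is already set.
function withExplicitSort(
  params: CollectionSearchParameters,
  sortBy: string = "_text_match:desc"
): CollectionSearchParameters {
  return params.sort_by ? params : { ...params, sort_by: sortBy };
}

// Same parameters as in the issue description, plus the explicit sort_by.
const collectionSpecificSearchParameters: Record<string, CollectionSearchParameters> = {
  lessons_development_1721316941: withExplicitSort({
    query_by: "course.title,title,keywords",
    include_fields: "course.title",
    group_by: "course.title",
    group_limit: 1,
  }),
};

console.log(collectionSpecificSearchParameters);
```

Writing `sort_by: "_text_match:desc"` directly in the config object is equivalent; the helper only makes it harder to forget when several collections are configured.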

tharropoulos commented 4 months ago

Happy you found a working solution! Is your issue resolved after this?

coatezy commented 4 months ago

I think this is resolved in the context of this package. I'd like to understand why there is the need to set sort_by to ensure the results are returned, but I think that is a Typesense specific question. Thanks for checking this out, @tharropoulos!