typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0

facet options in results from typesense api vs examples using instantsearch.js #1378

Open leonh opened 1 year ago

leonh commented 1 year ago

Description

First off, this is not a bug; it's a request for some guidance. I am stuck trying to get facet count values that behave like those in the e-commerce demo:
https://ecommerce-store.typesense.org/

Steps to reproduce

Consider a simple example with a boolean field called "is_available_for_self_drive".

Initial query:

{'q': '',
 'query_by': 'name',
 'facet_by': 'build_year,is_available_for_self_drive',
 'per_page': 20,
 'max_facet_values': 5,
 'page': 1}

This returns results; the facet counts section is extracted here:

[{'counts': [{'count': 1589, 'highlighted': '2019', 'value': '2019'},
             {'count': 1488, 'highlighted': '2020', 'value': '2020'},
             {'count': 1246, 'highlighted': '2018', 'value': '2018'},
             {'count': 1232, 'highlighted': '2022', 'value': '2022'},
             {'count': 1173, 'highlighted': '2023', 'value': '2023'}],
  'field_name': 'build_year',
  'sampled': False,
  'stats': {'avg': 2016.6811767044326,
            'max': 2024.0,
            'min': 0.0,
            'sum': 111605153.0,
            'total_values': 101}},
 {'counts': [{'count': 11221, 'highlighted': 'true', 'value': 'true'},
             {'count': 3146, 'highlighted': 'false', 'value': 'false'}],
  'field_name': 'is_available_for_self_drive',
  'sampled': False,
  'stats': {'total_values': 2}}]

Selecting "true" then results in a new query with "is_available_for_self_drive:=[true]":

{'q': '',
 'query_by': 'name',
 'facet_by': 'build_year,is_available_for_self_drive',
 'max_facet_values': 5,
 'per_page': 20,
 'filter_by': 'is_available_for_self_drive:=[true]',
 'page': 1}

with these facet counts (note that only the count for is_available_for_self_drive:=[true] is in the results):

[{'counts': [{'count': 1380, 'highlighted': '2019', 'value': '2019'},
             {'count': 1272, 'highlighted': '2020', 'value': '2020'},
             {'count': 1088, 'highlighted': '2018', 'value': '2018'},
             {'count': 1021, 'highlighted': '2022', 'value': '2022'},
             {'count': 1012, 'highlighted': '2023', 'value': '2023'}],
  'field_name': 'build_year',
  'sampled': False,
  'stats': {'avg': 2017.425378144623,
            'max': 2024.0,
            'min': 0.0,
            'sum': 102166456.0,
            'total_values': 69}},
 {'counts': [{'count': 11221, 'highlighted': 'true', 'value': 'true'}],
  'field_name': 'is_available_for_self_drive',
  'sampled': False,
  'stats': {'total_values': 1}}]

Desired Behavior

Is there a way to construct a query that returns the facet count for is_available_for_self_drive:=[false] when the filter is on is_available_for_self_drive:=[true]?

When I look at the e-commerce example at https://ecommerce-store.typesense.org/, it appears that this is possible: under "Filter by Brands", choosing "AT&T" does not remove the facet counts for the other brands.

Is this achieved by instantsearch.js maintaining state on the frontend in some way? Can this be achieved through the Typesense API?

Metadata

Typesense Version: 0.25

OS: linux ubuntu

igor-iki commented 1 year ago

Just look carefully :) multi_search is the key.

leonh commented 1 year ago

I can see it can be solved by not faceting on the most recently selected field (which would only return a single count for the selected option) and reusing the previous facet counts for that field (which lets you show the other options for that field). However, this requires you to maintain state on the client side for the previous facet response. I guess multi_search with different facet_by values could solve it in some way?
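The client-side approach described above could be sketched roughly as follows. `merge_facet_counts` is a hypothetical helper, not part of Typesense or instantsearch.js, and it assumes both arguments are facet counts lists shaped like the responses earlier in this issue (trimmed here to the fields that matter).

```python
def merge_facet_counts(previous, current, selected_field):
    """Return `current` facet counts, but keep the `previous` entry for
    `selected_field` so that its unselected options stay visible."""
    previous_by_field = {facet["field_name"]: facet for facet in previous}
    return [
        previous_by_field.get(facet["field_name"], facet)
        if facet["field_name"] == selected_field
        else facet
        for facet in current
    ]


# Facet counts before and after filtering on is_available_for_self_drive,
# trimmed from the responses shown in this issue.
before = [
    {"field_name": "build_year",
     "counts": [{"count": 1589, "value": "2019"}]},
    {"field_name": "is_available_for_self_drive",
     "counts": [{"count": 11221, "value": "true"},
                {"count": 3146, "value": "false"}]},
]
after = [
    {"field_name": "build_year",
     "counts": [{"count": 1380, "value": "2019"}]},
    {"field_name": "is_available_for_self_drive",
     "counts": [{"count": 11221, "value": "true"}]},
]

merged = merge_facet_counts(before, after, "is_available_for_self_drive")
print([c["value"] for c in merged[1]["counts"]])  # prints ['true', 'false']
```

The reused counts come from the response before the filter was applied, so they are only accurate while no other filter has changed in between; that limitation is what the separate per-facet searches avoid.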

jasonbosco commented 1 year ago

Is this achieved by instantsearch.js maintaining state on the frontend in some way?

This is indeed achieved by maintaining state inside of instantsearch.js.

If you look at the multi_search requests sent to Typesense when you click on any of the facets in the e-commerce demo you shared above, you'll see the exact types of requests that need to be made.

leonh commented 1 year ago

Thanks for the guidance, @igor-iki and @jasonbosco. I've had a look at the search requests and responses made by the e-commerce demo. I can see that one search is required for each of the facets (the search with that facet excluded), plus one search with all the facets applied (used to update the search results).

Each search returns its own result set of both facet counts and hits, so I'm wondering how this pattern scales when you have a larger number of facetable fields and more facet values within them, as well as a larger number of results per page. Is it possible to make a multi_search call that returns only the facet counts and omits the hits, which are not needed for all but one of the search requests?

Even better would be the ability to mark a facet so that its counts are returned regardless of its appearance in filter_by (then you would not need multi_search at all).
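The request pattern described above (one search with every filter applied, plus one search per filtered field with that field's own filter left out) might be assembled along these lines. This is a sketch only: `build_disjunctive_searches`, the `vehicles` collection name, and the `build_year:=[2019]` filter are made up for illustration, and the code only builds the request body rather than sending it.

```python
def build_disjunctive_searches(collection, base_params, filters):
    """Build a multi_search body for disjunctive faceting.

    `filters` maps a facet field name to its filter expression, e.g.
    {"is_available_for_self_drive": "is_available_for_self_drive:=[true]"}.
    """
    # Ignore any filter_by in base_params; filters are supplied separately.
    base = {k: v for k, v in base_params.items() if k != "filter_by"}

    # Search 0: every filter applied. Its hits drive the results list.
    main = dict(base, collection=collection)
    if filters:
        main["filter_by"] = " && ".join(filters.values())
    searches = [main]

    # One extra search per filtered field, with that field's filter removed,
    # so the counts for its unselected options are still returned.
    for field in filters:
        others = [expr for name, expr in filters.items() if name != field]
        search = dict(base, collection=collection, facet_by=field)
        if others:
            search["filter_by"] = " && ".join(others)
        searches.append(search)

    return {"searches": searches}


payload = build_disjunctive_searches(
    "vehicles",  # hypothetical collection name
    {"q": "", "query_by": "name", "per_page": 20,
     "facet_by": "build_year,is_available_for_self_drive"},
    {"is_available_for_self_drive": "is_available_for_self_drive:=[true]",
     "build_year": "build_year:=[2019]"},
)
for search in payload["searches"]:
    print(search["facet_by"], "|", search.get("filter_by"))
```

The resulting dictionary is meant to be sent as the body of a multi_search request; the facet counts for each filtered field would then be read from its own search result, and everything else from the first one.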

igor-iki commented 1 year ago

multi_search is needed for more than just that. You are free to use any approach on your end, e.g. async callbacks. If you only need the facets from a query, use 'per_page': 0. Don't forget about caching: you can cache queries natively in Typesense or with your own cache layer.
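As a rough illustration of the two suggestions above, a search could be turned into a facet-only, cached variant like this. `facet_only` is a hypothetical helper; `use_cache` and `cache_ttl` are Typesense search parameters for server-side caching, but check the documentation for your version before relying on their exact behavior, and the 60-second TTL is just an example value.

```python
def facet_only(search, cache_ttl=60):
    """Return a copy of `search` that requests facet counts but no hits,
    with Typesense's server-side query cache enabled."""
    return dict(search, per_page=0, use_cache=True, cache_ttl=cache_ttl)


# Query parameters taken from the example earlier in this issue.
query = {
    "q": "",
    "query_by": "name",
    "facet_by": "is_available_for_self_drive",
    "per_page": 20,
}
print(facet_only(query))
```

The original dictionary is left untouched, so the same base query can still be used for the one search whose hits are actually displayed.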

It's better to test on your own data than to predict.