typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0

Should synonyms work with filtering? #268

Open ctorgalson opened 3 years ago

ctorgalson commented 3 years ago

Description

We were looking at synonyms (specifically one-way synonyms) as a way to improve a client's search. The search uses filter_by so that it only returns exact matches. But there is one special case where synonyms would be convenient.

Testing from the CLI, I find that a one-way synonym works exactly as expected in an ordinary search, but has no effect when the search uses filter_by.
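For reference, a one-way synonym like the one described here is registered through Typesense's synonyms endpoint, with a "root" term and a list of "synonyms" (a search for the root also matches the synonyms, not the other way round). The sketch below only builds that request with Python's standard library and does not send it. The host comes from this issue; the API key, the synonym id golf-variants, and the exact terms are hypothetical.

```python
import json
import urllib.request

# Host taken from the issue; the API key is a hypothetical placeholder.
TYPESENSE_HOST = "http://typesense"
API_KEY = "xyz"


def one_way_synonym_request(collection: str, synonym_id: str, root: str,
                            synonyms: list[str]) -> urllib.request.Request:
    """Build (but do not send) a PUT request that upserts a one-way synonym."""
    body = json.dumps({"root": root, "synonyms": synonyms}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TYPESENSE_HOST}/collections/{collection}/synonyms/{synonym_id}",
        data=body,
        headers={"Content-Type": "application/json",
                 "X-TYPESENSE-API-KEY": API_KEY},
        method="PUT",
    )


# Hypothetical rule: searching "volkswagen golf" should also match the variants.
req = one_way_synonym_request(
    "vehicles", "golf-variants", "volkswagen golf",
    ["volkswagen golf sv", "volkswagen golf gti"],
)
# urllib.request.urlopen(req) would send it to a running server.
```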

Am I seeing:

Steps to reproduce

This query returns synonyms in the results:

http://typesense/collections/vehicles/documents/search
  ?q=volkswagen golf
  &query_by=vehicle
  &include_fields=make,model,year,price

This query doesn't return any synonyms in the results:

http://typesense/collections/vehicles/documents/search
  ?q=*
  &query_by=vehicle
  &filter_by=vehicle:=[volkswagen golf]
  &include_fields=make,model,year,price
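The two URLs above are shown unencoded for readability. A minimal sketch of how the same two requests could be built with properly encoded query parameters, using only Python's standard library (the parameter values are copied from the issue; the variable names are illustrative):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://typesense/collections/vehicles/documents/search"
FIELDS = "make,model,year,price"

# Query 1: full-text search; the one-way synonym is applied to q.
text_search = BASE + "?" + urlencode({
    "q": "volkswagen golf",
    "query_by": "vehicle",
    "include_fields": FIELDS,
})

# Query 2: wildcard search narrowed by an exact-match filter; per this issue,
# the filter value is matched as written, without synonym expansion.
filtered_search = BASE + "?" + urlencode({
    "q": "*",
    "query_by": "vehicle",
    "filter_by": "vehicle:=[volkswagen golf]",
    "include_fields": FIELDS,
})

# Decoding the query string recovers the original filter expression.
print(parse_qs(urlsplit(filtered_search).query)["filter_by"])
```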

Expected Behavior

I don't know. I can't tell from the docs if what I'm trying is expected/unexpected/desired/undesired 😀

Actual Behavior

No synonyms in results when using a filter_by query.

Metadata

Typesense Version: 0.20.0

OS: any

kishorenc commented 3 years ago

@ctorgalson Synonyms are not used when resolving the filter query. I'm open to adding support for this if there is a strong business case. Synonyms currently work only with the query string because the query string is user input, while filter_by queries are constructed by the application, so there is little scope for ambiguity.
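Because the filter_by string is built by the application, one possible workaround (an application-side pattern, not a Typesense feature) is to keep the synonym table in the application and expand the filter values before building the query. A minimal sketch, assuming a hypothetical SYNONYMS table, that the values are spelled exactly as they are indexed, and that no value contains a comma or bracket (such values would need escaping):

```python
# Hypothetical application-side table mirroring a one-way synonym:
# filtering on the root should also match the listed variants.
SYNONYMS: dict[str, list[str]] = {
    "volkswagen golf": ["volkswagen golf sv", "volkswagen golf gti"],
}


def expand_filter_values(values: list[str]) -> list[str]:
    """Return each value followed by its synonyms, deduplicated, order kept."""
    expanded: list[str] = []
    for value in values:
        for candidate in [value, *SYNONYMS.get(value, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def filter_with_synonyms(field: str, values: list[str]) -> str:
    """Build an exact-match filter of the form field:=[v1,v2,...]."""
    return f"{field}:=[{','.join(expand_filter_values(values))}]"


print(filter_with_synonyms("vehicle", ["volkswagen golf"]))
# vehicle:=[volkswagen golf,volkswagen golf sv,volkswagen golf gti]
```

The trade-off is the one discussed below: the synonym list lives in one small table, but every code path that builds a filter has to remember to expand it.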

ctorgalson commented 3 years ago

Thanks @kishorenc. I guess we may not have a strong business case, especially since we probably have another approach to this, but here's why we looked at this as an option:


The alternate approach is to index make + model as a string[] field, one array per document, like this:

['Volkswagen Golf', 'Volkswagen Golf SV',],
['Volkswagen Golf', 'Volkswagen Golf GTI',],
['Volkswagen Golf',],

That way, all three results exactly match volkswagen golf, but only one matches volkswagen golf sv and only one matches volkswagen golf gti. This approach works well (we're already using this strategy in a related case), but it shifts the load onto the indexing routine, since we need to know whether an item has synonyms at the time it is indexed, rather than maintaining a small list of synonyms separately.
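The index-time approach above can be sketched as follows: when building each document, add any root label that the document's make + model is a variant of, then store the result in the string[] field. This is a toy model under stated assumptions: the SYNONYMS table, the helper names, and the in-memory matching function are hypothetical and only imitate an exact-match filter on a string[] field.

```python
# Hypothetical synonym table: each root label is implied by its variants.
SYNONYMS: dict[str, list[str]] = {
    "Volkswagen Golf": ["Volkswagen Golf SV", "Volkswagen Golf GTI"],
}

DOCS = [("Volkswagen", "Golf SV"), ("Volkswagen", "Golf GTI"), ("Volkswagen", "Golf")]


def vehicle_labels(make: str, model: str) -> list[str]:
    """Values for the string[] field: implied roots first, then make + model."""
    label = f"{make} {model}"
    roots = [root for root, variants in SYNONYMS.items() if label in variants]
    return list(dict.fromkeys(roots + [label]))  # dedupe, keep order


def matching(value: str) -> list[tuple[str, str]]:
    """Toy exact-match filter: a document matches if value is one of its labels."""
    return [doc for doc in DOCS if value in vehicle_labels(*doc)]


for make, model in DOCS:
    print(vehicle_labels(make, model))
# ['Volkswagen Golf', 'Volkswagen Golf SV']
# ['Volkswagen Golf', 'Volkswagen Golf GTI']
# ['Volkswagen Golf']
```

Here matching("Volkswagen Golf") returns all three documents, while matching("Volkswagen Golf GTI") returns only one, which is the behavior described above.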

But again, this isn't insurmountable.

jsutaria commented 1 year ago

+1