typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
https://typesense.org
GNU General Public License v3.0
20.44k stars, 629 forks

Synonyms aren't applied on typo-corrected variations of the query #792

527bd5 closed 2 weeks ago

527bd5 commented 1 year ago

Description

If you have a document with "trousers" and a synonym trousers <-> pants, searching for "patns" doesn't yield the expected result because synonyms aren't applied on the typo-corrected variation of the query.

Steps to reproduce

See the following Python script to reproduce.
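
The script itself is not reproduced on this page. A minimal sketch of what such a reproduction might look like follows, assuming the typesense Python client and a hypothetical collection named products with a single title field. The schema, IDs, and function names are illustrative and are not the reporter's original code.

```python
# Hypothetical reproduction sketch. The collection name, field name, and
# synonym ID are assumptions made for illustration only.
SCHEMA = {
    "name": "products",
    "fields": [{"name": "title", "type": "string"}],
}
DOCUMENT = {"id": "1", "title": "trousers"}
SYNONYM_ID = "trousers-pants"
SYNONYM = {"synonyms": ["trousers", "pants"]}  # multi-way synonym
QUERIES = ["trousers", "trouesrs", "pants", "patns"]


def search_params(query):
    """Build the search request for one query against the title field."""
    return {"q": query, "query_by": "title"}


def run(client):
    """Index one document, register the synonym, and count hits per query.

    `client` is expected to behave like typesense.Client from the
    typesense package (not imported here so the sketch stays standalone).
    """
    client.collections.create(SCHEMA)
    collection = client.collections["products"]
    collection.documents.create(DOCUMENT)
    collection.synonyms.upsert(SYNONYM_ID, SYNONYM)

    counts = {}
    for query in QUERIES:
        result = collection.documents.search(search_params(query))
        counts[query] = result["found"]
        print(f'Got {result["found"]} results for "{query}"')
    return counts
```

To run it against a local server, one would construct a client with something like `typesense.Client({"nodes": [{"host": "localhost", "port": "8108", "protocol": "http"}], "api_key": "xyz", "connection_timeout_seconds": 2})` and call `run(client)`; the host, port, and API key shown are placeholders.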

Expected Behavior

Got 1 results for “trousers” (Should match immediately)
Got 1 results for “trouesrs” (Should match by allowing misspelled words <trouser>)
Got 1 results for “pants” (Should match with synonym trousers<->pants)
Got 1 results for “patns” (Should match misspelled <pants> synonym to <trousers>)

Actual Behavior

Got 1 results for “trousers” (Should match immediately)
Got 1 results for “trouesrs” (Should match by allowing misspelled words <trouser>)
Got 1 results for “pants” (Should match with synonym trousers<->pants)
Got 0 results for “patns” (Should match misspelled <pants> synonym to <trousers>)

Metadata

Typesense Version: 0.23.1, 0.24.0rc20, and 0.24.0rcn32

OS: Running Typesense Docker image

527bd5 commented 1 year ago

@kishorenc replied on Slack that

Regarding patns -- that's expected because we don't apply synonyms on typo corrected variations of the query. This is to prevent false positives from happening.

which makes sense, but I think it depends a lot on the context, and exposing this as a flag would make sense IMO!

527bd5 commented 11 months ago

Bumping because I think it would make a lot of sense to support this! Synonyms are widely used and I think users expect typo tolerance there as well. I think false positives should be covered by typo_tokens_threshold anyway?

alexeymaksakov-tomtom commented 6 months ago

I have a similar issue, and applying a synonym together with spelling correction is a valid case in the application I'm building.

DmitryKan-TomTom commented 6 months ago

Hi @kishorenc, this is quite a critical issue for the geo-search use case -- do you see a way of supporting it?

kishorenc commented 6 months ago

We will look into this in a few days to identify the scope of work. I will update this thread.

alexeymaksakov-tomtom commented 6 months ago

Just to clarify - it is not necessary that spelling correction and synonyms are applied on the same token, but they should be working together on different tokens at least. A specific example to illustrate:

  1. Record with a field value of 'Kelly Bridge Road' is indexed
  2. A synonym is registered {'synonym_road', {"synonyms": ["road", "rd"]}}

When the user searches for 'Kelly Bridge Rd', the 'Road' token is matched and highlighted, and the system behaves as expected.

When the user searches for 'Kally Bridge Rd', the 'Road' token is neither matched nor highlighted (which causes the additional issue of the result list being overwhelmed with records like 'Someother bridge rd').
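
The expectation in this example can be illustrated with a toy, per-token matcher. This is only a sketch of the requested behaviour, not Typesense's actual matching logic: the function names, the single-typo limit, and the whitespace tokenisation are all assumptions.

```python
# Toy model of the requested behaviour: each query token may match a document
# token exactly, via a registered synonym, or within a small edit distance.
# This is NOT how Typesense is implemented; it only illustrates the expectation.
SYNONYMS = {"road": {"rd"}, "rd": {"road"}}  # mirrors ["road", "rd"]


def edit_distance(a, b):
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_matches(query_token, doc_token, max_typos=1):
    """True if the tokens are equal, synonyms, or within max_typos edits."""
    if query_token == doc_token:
        return True
    if doc_token in SYNONYMS.get(query_token, set()):
        return True
    return edit_distance(query_token, doc_token) <= max_typos


def matched_tokens(query, document, max_typos=1):
    """Return the document tokens matched by at least one query token."""
    query_tokens = query.lower().split()
    return [
        doc_token
        for doc_token in document.lower().split()
        if any(token_matches(q, doc_token, max_typos) for q in query_tokens)
    ]


print(matched_tokens("Kally Bridge Rd", "Kelly Bridge Road"))
print(matched_tokens("Kally Bridge Rd", "Someother bridge rd"))
```

Under this model, the misspelled query matches all three tokens of 'Kelly Bridge Road' (typo correction on one token, a synonym on another), while 'Someother bridge rd' matches only two, so it would rank lower.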

DmitryKan-TomTom commented 5 months ago

Hey @kishorenc, just wanted to see if you had a chance to evaluate this request? Thanks.

DmitryKan-TomTom commented 4 months ago

Hey Team, any update on this ticket?

kishorenc commented 4 months ago

We recently added support for this in an RC build. Try adding the following flags to 27.0.rc8

synonym_num_typos: 2
synonym_prefix: true

These enable typo tolerance and prefix searching on synonyms, respectively. Please try it out and let us know how it works for your use case.
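
As a rough sketch of how these two settings might be supplied from Python, the helper below builds a search request that includes them. It assumes they are passed as per-search parameters alongside q and query_by; the helper name and the title field are hypothetical.

```python
def synonym_search_params(query, query_by="title"):
    """Build search parameters that opt in to typos and prefixes on synonyms.

    Assumption: the two synonym_* settings are sent with each search request.
    The default field name "title" is illustrative only.
    """
    return {
        "q": query,
        "query_by": query_by,
        "synonym_num_typos": 2,   # allow up to two typos when matching synonyms
        "synonym_prefix": True,   # allow prefix matches against synonyms
    }


print(synonym_search_params("patns"))
```

With the typesense Python client, the resulting dictionary could be passed to `client.collections["products"].documents.search(...)`, where products is a placeholder collection name.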