zincsearch / zincsearch

ZincSearch. A lightweight alternative to Elasticsearch that requires minimal resources, written in Go.
https://zincsearch-docs.zinc.dev

Fuzzy + match not working as expected #256

Closed ayush71994 closed 2 years ago

ayush71994 commented 2 years ago

Tell us about your request What do you want to see in Zinc?

A match query with fuzziness set to 'AUTO', or to a very high value, should return the same results when the query term is misspelled by an edit distance of 1 or 2.

Which service(s) is this related to? This could be GUI or API

API - es//_search

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard or not doable? What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Expectation: A match query with fuzziness set to 'AUTO', or to a very high value, returns the same results, possibly with different scores, when the query term is misspelled by an edit distance of 1 or 2.

Current Behaviour: Misspelled query terms return different results from the correctly spelled term.

Steps to Repro

Run the following queries.

Query 1, with no misspellings. This returns a non-zero number of results on my dataset, which contains the word 'Mirae' in the name.

{
    "query":
    {
        "match":
        {
            "nameData.shortName":
            {
                "query": "Mirae",
                "fuzziness": "AUTO"
            }
        }
    },
    "from": 0,
    "size": 20
}

Query 2, with a misspelling at an edit distance of 1 (Mirae -> Mirea). This returns zero results. Mirae -> Misae does not work either.

{
    "query":
    {
        "match":
        {
            "nameData.shortName":
            {
                "query": "Mirea",
                "fuzziness": "AUTO"
            }
        }
    },
    "from": 0,
    "size": 20
}

I also tried a fuzziness of 100 and still got zero results for the misspelled terms. I was expecting the same number of results, possibly with different values for 'score'.
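To make the edit distances in these queries concrete, here is a minimal Python sketch that computes them by hand. It is not ZincSearch code: `levenshtein` and `osa_distance` are illustrative helper names. The point it shows is that 'Mirae' -> 'Mirea' swaps two adjacent letters, so it is 1 edit only when transpositions count as a single edit (Elasticsearch's `fuzzy_transpositions` option defaults to true) and 2 edits under plain Levenshtein. 'Mirae' -> 'Misae' is 1 edit either way. Whether ZincSearch counts transpositions is not stated in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    for typo in ("Mirea", "Misae"):
        print(typo, levenshtein("Mirae", typo), osa_distance("Mirae", typo))
```

Either way, both misspellings are within 2 edits of 'Mirae', so a fuzziness of 2 would be expected to match them under both definitions.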

Are you currently working around this issue? How are you currently solving this problem?

No workaround for the time being

hengfeiyang commented 2 years ago

Yes, it's not supported for now; "fuzziness": "AUTO" is marked TODO.

hengfeiyang commented 2 years ago

@ayush71994

For AUTO:
it's basically an edit distance of 1 for strings of length 3 to 5 and 2 for strings longer than 5 (strings shorter than 3 must match exactly)
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness

Thanks, I will create a bug fix so that it works in the next release.
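The AUTO rule referenced above can be sketched as a small function. This is an illustration of the thresholds described in the Elasticsearch documentation (AUTO behaves like `AUTO:3,6`), not the code that went into ZincSearch; the function name `auto_fuzziness` and its `low`/`high` parameters are assumptions made for this example.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Return the maximum edit distance AUTO would allow for a term.

    Assumes the documented Elasticsearch defaults (low=3, high=6):
      length < low          -> 0 edits (must match exactly)
      low <= length < high  -> 1 edit
      length >= high        -> 2 edits
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


if __name__ == "__main__":
    for term in ("ab", "Mirae", "Mirae1"):
        print(term, auto_fuzziness(term))
```

Under this rule 'Mirae' (5 characters) is allowed 1 edit, so a single substitution such as 'Misae' would be expected to match once AUTO is implemented.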

ayush71994 commented 2 years ago

The workaround for the time being, as suggested by @hengfeiyang, is to use a numeric value for 'fuzziness' instead of 'AUTO'.
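As a rough sketch of that workaround, the query body from the repro steps can be built with an explicit integer in place of 'AUTO'. The helper name `build_fuzzy_match` is hypothetical, and the thread does not say which numeric values ZincSearch accepts or whether it expects an integer or a numeric string, so treat the value 1 below as an assumption. The sketch only builds the JSON body and does not send it.

```python
import json


def build_fuzzy_match(field: str, text: str, fuzziness: int) -> dict:
    """Build a match query body with a numeric fuzziness (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(fuzziness, bool) or not isinstance(fuzziness, int) or fuzziness < 0:
        raise ValueError("fuzziness must be a non-negative integer, not 'AUTO'")
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                }
            }
        },
        "from": 0,
        "size": 20,
    }


if __name__ == "__main__":
    body = build_fuzzy_match("nameData.shortName", "Mirea", 1)
    print(json.dumps(body, indent=4))
```

The printed body has the same shape as Query 2, with only the 'fuzziness' value changed.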