opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Can synonym search and fuzzy search be combined into a single query? If not, how can both be achieved in a single query? #15378

Open JudithReshmaR opened 3 weeks ago

JudithReshmaR commented 3 weeks ago

Describe the bug

For my use case, I want to combine fuzzy search and synonym search in a single query.

I noticed that synonym search tolerates typos when the fuzziness parameter is set to AUTO, but the synonyms are not matched for the term that fuzziness resolves to.

For example, if I search for "ice cream", the results include documents about dessert, gelato, and the other synonym terms I have configured. But if I search for "iec cream", the results include only ice cream; the configured synonyms are not considered.

If fuzziness resolves the misspelled term to "ice", I expected its synonyms to be matched as well, perhaps with a lower score, but that does not happen currently.

If this is something that can be achieved in a different way, please let me know. I would like to explore that.

Related component

Search

To Reproduce

  1. Synonym terms defined in a CSV file and associated with the AWS OpenSearch cluster: ice cream, gelato, frozen custard, dessert

  2. Created a custom index that references the associated synonyms path:
     {
       "settings": {
         "index": {
           "analysis": {
             "analyzer": {
               "my_analyzer": {
                 "type": "custom",
                 "tokenizer": "standard",
                 "filter": ["my_synonym_filter"]
               }
             },
             "filter": {
               "my_synonym_filter": {
                 "type": "synonym",
                 "synonyms_path": "analyzers/F**********",
                 "updateable": true
               }
             }
           }
         }
       },
       "mappings": {
         "properties": {
           "description": {
             "type": "text",
             "analyzer": "standard",
             "search_analyzer": "my_analyzer"
           }
         }
       }
     }

  3. Searching for "gelato" returns results that include the synonyms, as expected.

  4. With a typo such as "gealto", however, results are retrieved only for "gelato" and not for the other defined synonyms. Note: fuzziness is set to AUTO in the search query:
     {
       "query": {
         "bool": {
           "should": [
             {
               "match": {
                 "description": {
                   "query": "gealto",
                   "fuzziness": "AUTO"
                 }
               }
             }
           ]
         }
       }
     }
     Can someone help resolve this issue?
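A simplified way to see why the two features do not compose in this setup: a match query first runs the query text through the search analyzer (here my_analyzer), and the synonym filter expands only tokens that exactly match a synonym entry; fuzziness is then applied to whatever tokens the analyzer emitted. The toy model below is a sketch of that ordering under stated assumptions, not OpenSearch code. The single-word synonym group, the sample documents, and the helper names are hypothetical, and fuzzy matching is approximated with an edit distance that counts adjacent transpositions as one edit, using the AUTO thresholds (no edits below 3 characters, one edit for 3 to 5, two edits above 5).

```python
# Toy model of search-time synonyms + fuzziness; NOT OpenSearch's implementation.

# Hypothetical single-word synonym group (the issue's real group is multi-word).
SYNONYM_GROUPS = [{"gelato", "dessert", "custard"}]


def osa_distance(a: str, b: str) -> int:
    """Edit distance that counts an adjacent transposition as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def auto_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO with the default 3/6 length thresholds."""
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2


def search_analyzer(text: str) -> list[str]:
    """Lowercase, split, then expand synonyms by EXACT token lookup only."""
    tokens = []
    for tok in text.lower().split():
        tokens.append(tok)
        for group in SYNONYM_GROUPS:
            if tok in group:
                tokens.extend(sorted(group - {tok}))
    return tokens


def fuzzy_search(query: str, docs: dict[int, str]) -> list[int]:
    # Index-time analyzer is "standard" (modeled as lowercase + split): no synonyms.
    index = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    query_tokens = search_analyzer(query)  # synonyms are expanded first...
    return sorted(
        doc_id
        for doc_id, terms in index.items()
        # ...and fuzziness is applied afterwards, to each emitted token.
        if any(osa_distance(q, t) <= auto_edits(q) for q in query_tokens for t in terms)
    )


DOCS = {1: "Homemade gelato recipe", 2: "A rich chocolate dessert", 3: "Vanilla custard tart"}
print(fuzzy_search("gelato", DOCS))  # [1, 2, 3]
print(fuzzy_search("gealto", DOCS))  # [1]
```

With the correctly spelled term, the expanded tokens reach all three documents; with the typo, no synonym entry matches "gealto", so only the fuzzy match against "gelato" in document 1 remains.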

Expected behavior

The expected behavior is to return results for ice cream and dessert even when the input query contains a typo, since fuzziness is set. If there is a better way to achieve this, please let me know and I will explore it.


JudithReshmaR commented 3 weeks ago

The index definition below achieves the behavior requested above, with the synonyms hardcoded when the index is created:

PUT /my_index_1
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "quick,fast",
            "jumps,leaps"
          ]
        },
        "phonetic_filter": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonym_filter",
            "phonetic_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}

Adding a few documents to the index:

POST /my_index_1/_doc/1
{
  "content": "The quick brown fox jumps over the lazy dog"
}
POST /my_index_1/_doc/2
{
  "content": "A fast brown fox leaps over a lazy dog"
}

Search query:

GET /my_index_1/_search
{
  "query": {
    "match": {
      "content": {
        "query": "quikc",
        "fuzziness": "AUTO",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}

This returns results that include the synonyms as well, i.e., both documents are returned.
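One plausible reading of why this version works: the field's analyzer includes the synonym filter, so synonyms are expanded at index time and both documents end up containing both "quick" and "fast". The misspelled query term "quikc" is then a single adjacent transposition away from "quick", which fits the one edit AUTO allows for a five-letter term (the match query counts transpositions as one edit by default via fuzzy_transpositions). The toy model below sketches that reasoning; it is not OpenSearch code, it omits the phonetic filter, and the helper names are hypothetical.

```python
# Toy model of INDEX-time synonym expansion + fuzziness; NOT OpenSearch code.
# Synonym groups taken from the my_index_1 settings; phonetic filter omitted.
SYNONYM_GROUPS = [{"quick", "fast"}, {"jumps", "leaps"}]


def osa_distance(a: str, b: str) -> int:
    """Edit distance that counts an adjacent transposition as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def auto_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO with the default 3/6 length thresholds."""
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2


def analyze(text: str) -> list[str]:
    """Lowercase, split, and expand synonyms; used at index AND search time."""
    tokens = []
    for tok in text.lower().split():
        tokens.append(tok)
        for group in SYNONYM_GROUPS:
            if tok in group:
                tokens.extend(sorted(group - {tok}))
    return tokens


DOCS = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "A fast brown fox leaps over a lazy dog",
}
# Index time: synonyms are expanded into the stored terms of every document.
INDEX = {doc_id: set(analyze(text)) for doc_id, text in DOCS.items()}


def fuzzy_search(query: str) -> list[int]:
    query_tokens = analyze(query)  # "quikc" has no synonym entry, so it stays as-is
    return sorted(
        doc_id
        for doc_id, terms in INDEX.items()
        if any(osa_distance(q, t) <= auto_edits(q) for q in query_tokens for t in terms)
    )


print(sorted(INDEX[2] & {"quick", "fast"}))  # ['fast', 'quick']
print(fuzzy_search("quikc"))  # [1, 2]
```

In this model, dropping the synonym filter from index-time analysis (as in the original setup, where synonyms ran only through search_analyzer) would leave "quick" out of document 2's terms, so only document 1 would match.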

However, when the synonyms are provided as a text file through synonyms_path, index creation fails with the following error:

PUT /my_index_3
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analyzers/F265650900",
          "updateable": true
        },
        "phonetic_filter": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "cus_syn_srch_analyzer": {
          "tokenizer": "standard",
          "type": "custom",
          "filter": [
            "lowercase",
            "synonym_filter",
            "phonetic_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "DOCUMENTTEXT": {
        "type": "text",
        "analyzer": "cus_syn_srch_analyzer"
      }
    }
  }
}

Error received: analyzer [cus_syn_srch_analyzer] contains filters [synonym_filter] that are not allowed to run in index time mode.
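The error text suggests that a synonym filter marked "updateable": true may only run in a search-time analyzer. In my_index_3 the filter is reached through the field's analyzer, which is applied at index time as well as search time; in the original setup it was referenced only through search_analyzer, which would be consistent with that index being created. The hypothetical helper below sketches that rule under this assumption, so a settings body can be checked before it is sent. It is not OpenSearch code and only inspects field-level analyzer and search_analyzer mappings.

```python
# Hypothetical pre-flight check; ASSUMES "updateable": true filters are allowed
# only in analyzers that run at search time. Not OpenSearch code.


def index_time_violations(body: dict) -> list[tuple[str, list[str]]]:
    """Return (analyzer, updateable filters) pairs used by a field's index-time analyzer."""
    settings = body.get("settings", {})
    # Analysis settings may sit directly under "settings" or under "settings.index".
    analysis = settings.get("analysis") or settings.get("index", {}).get("analysis", {})
    updateable = {
        name for name, spec in analysis.get("filter", {}).items() if spec.get("updateable")
    }
    # A field's "analyzer" is used at index time; "search_analyzer" only at query time.
    index_time = {
        spec["analyzer"]
        for spec in body.get("mappings", {}).get("properties", {}).values()
        if "analyzer" in spec
    }
    violations = []
    for name in sorted(index_time):
        filters = analysis.get("analyzer", {}).get(name, {}).get("filter", [])
        bad = [f for f in filters if f in updateable]
        if bad:
            violations.append((name, bad))
    return violations


def synonym_filter(path: str) -> dict:
    return {"type": "synonym", "synonyms_path": path, "updateable": True}


# Condensed versions of the two index bodies from this issue.
ORIGINAL = {
    "settings": {"index": {"analysis": {
        "analyzer": {"my_analyzer": {"type": "custom", "tokenizer": "standard",
                                     "filter": ["my_synonym_filter"]}},
        "filter": {"my_synonym_filter": synonym_filter("analyzers/F**********")},
    }}},
    "mappings": {"properties": {"description": {
        "type": "text", "analyzer": "standard", "search_analyzer": "my_analyzer"}}},
}
MY_INDEX_3 = {
    "settings": {"analysis": {
        "filter": {
            "synonym_filter": synonym_filter("analyzers/F265650900"),
            "phonetic_filter": {"type": "phonetic", "encoder": "double_metaphone",
                                "replace": False},
        },
        "analyzer": {"cus_syn_srch_analyzer": {
            "tokenizer": "standard", "type": "custom",
            "filter": ["lowercase", "synonym_filter", "phonetic_filter"]}},
    }},
    "mappings": {"properties": {"DOCUMENTTEXT": {
        "type": "text", "analyzer": "cus_syn_srch_analyzer"}}},
}

print(index_time_violations(ORIGINAL))    # []
print(index_time_violations(MY_INDEX_3))  # [('cus_syn_srch_analyzer', ['synonym_filter'])]
```

The first body passes because its updateable filter is referenced only through search_analyzer, while my_index_3 is flagged for the same analyzer and filter named in the error.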