opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Wildcard query breaking change in 2.x branch. #5515

Closed penghuo closed 1 year ago

penghuo commented 1 year ago

**Describe the bug**
The wildcard query has a breaking change in the latest 2.x and main branches.

**To Reproduce**

PUT {{baseUrl}}/wildcard_00001
Content-Type: application/json

{
  "mappings": {
    "properties": {
      "firstname": {
        "type": "text"
      }
    }
  }
}

POST {{baseUrl}}/wildcard_00001/_doc
Content-Type: application/json

{
  "firstname": "Amber JOHnny"
}

{ "from": 0, "size": 200, "timeout": "1m", "query": { "wildcard": { "firstname": { "wildcard": "Ambe?", "boost": 1 } } }, "_source": { "includes": [ "firstname" ], "excludes": [] }, "sort": [ { "_doc": { "order": "asc" } } ] }

But the search returns no hits:

{ "took": 88, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 0, "relation": "eq" }, "max_score": null, "hits": [] } }


**Expected behavior**
* Tested on 2.4, where the same query returns one hit:

{ "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": null, "hits": [ { "_index": "wildcard_00001", "_id": "5smC-YQBBVP7erYv_lLT", "_score": null, "_source": { "firstname": "Amber JOHnny" }, "sort": [ 0 ] } ] } }



**Host/Environment (please complete the following information):**
 - OS: macOS
 - Version: 2.x/main branch

penghuo commented 1 year ago

May be related to https://github.com/opensearch-project/OpenSearch/pull/5462/.

penghuo commented 1 year ago

@nknize I went through https://github.com/opensearch-project/OpenSearch/pull/5462 and I agree it is a bug fix. Correct me if I am wrong; my understanding is:

If my understanding is correct, the query I posted relied on a bug, so this is not a breaking change. However, the query looks natural, and I am not sure whether any users depend on it.

nknize commented 1 year ago

  • Using standard analyzer and case_insensitive set to false, the expectation is wildcard pattern should match normalized term.

The standard analyzer applies the lowercase token filter, so all indexed text is lowercased. With case_insensitive set to false on the wildcard query (the default), the pattern "Ambe?" should not match. So the behavior here is correct; the previous behavior was a bug. Set case_insensitive to true and the pattern will match.
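
The reasoning above can be illustrated with a small Python sketch. It is a toy model, not OpenSearch code: `analyze` and `wildcard_matches` are hypothetical helpers, the standard analyzer is approximated by whitespace splitting plus lowercasing, and case-insensitive matching is approximated by lowercasing both sides (the real engine does not necessarily work this way internally). Note that `fnmatchcase` also gives `[` a special meaning, which the wildcard query does not.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption):
    split on whitespace and lowercase every token."""
    return [token.lower() for token in text.split()]


def wildcard_matches(pattern, terms, case_insensitive=False):
    """Return True if any indexed term matches the wildcard pattern.

    '?' matches exactly one character and '*' matches any run of characters.
    The pattern is compared against whole terms, not the original text.
    """
    if case_insensitive:
        # Simplification: fold case on both sides before comparing.
        pattern = pattern.lower()
        terms = [term.lower() for term in terms]
    return any(fnmatchcase(term, pattern) for term in terms)


terms = analyze("Amber JOHnny")  # indexed terms are lowercased

print(terms)
print(wildcard_matches("Ambe?", terms))                         # default: case-sensitive
print(wildcard_matches("Ambe?", terms, case_insensitive=True))  # case folded
print(wildcard_matches("ambe?", terms))                         # lowercase pattern
```

In this model the capitalized pattern fails against the lowercased term `amber` unless case folding is enabled, while a lowercase pattern matches either way, which mirrors the two workarounds discussed in this thread.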

penghuo commented 1 year ago

Thanks @nknize.

The SQL plugin will change the code accordingly.