infinilabs / analysis-ik

🚌 The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and OpenSearch, with support for custom dictionaries.
Apache License 2.0

Problem matching the English phrase "hello world" with match_phrase #882

Open oldunclez opened 3 years ago

oldunclez commented 3 years ago

1. Create the index

curl -XPUT "localhost:9200/btcs-all-in-one"

2. Set the mapping

curl -XPOST "localhost:9200/btcs-all-in-one/_mapping"   -H "Content-Type: application/json"   -d'
{
        "properties": {
            "message": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
}'

3. Index the docs

curl -XPOST http://localhost:9200/btcs-all-in-one/_create/1 -H 'Content-Type:application/json' -d'
{"message":"this is hello your world"}'

curl -XPOST http://localhost:9200/btcs-all-in-one/_create/2    -H "Content-Type: application/json"   -d'
{"message":"this is hello world"}'

curl -XPOST http://localhost:9200/btcs-all-in-one/_create/3    -H "Content-Type: application/json"   -d'
{"message":"this is hello and that is world"}'

curl -XPOST http://localhost:9200/btcs-all-in-one/_create/4    -H "Content-Type: application/json"   -d'
{"message":" this is hello only"}'

curl -XPOST http://localhost:9200/btcs-all-in-one/_create/5    -H "Content-Type: application/json"   -d'
{"message":" this is world only"}'

4. Query for docs containing the phrase hello world

curl -XGET http://localhost:9200/btcs-all-in-one/_search   -H "Content-Type: application/json"   -d'
{
  "query": {
    "match_phrase": {
      "message": "hello world"
    }
  }
}'

The response is shown below. Why does "this is hello and that is world" match?

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.58052695,
    "hits": [
      {
        "_index": "btcs-all-in-one",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.58052695,
        "_source": {
          "message": "this is hello world"
        }
      },
      {
        "_index": "btcs-all-in-one",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.58052695,
        "_source": {
          "message": "this is hello and that is world"
        }
      }
    ]
  }
}
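JustCodeZZL commented 11 months ago

When "this is hello and that is world" is tokenized, words such as this, is, and, that are filtered out as stop words, so the only tokens left are hello and world. It is therefore expected that the phrase query matches this document.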
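
One way to check the stop-word explanation is to run the text of doc 3 through the Elasticsearch `_analyze` API and look at the tokens and `position` values it returns. The sketch below assumes the same local cluster and index as the steps above; the `ES_URL` variable and the `analyze_doc3` helper are hypothetical names introduced only for this example. It uses `ik_max_word` because that is the index-time analyzer in the mapping, and positions stored at index time are what `match_phrase` compares.

```shell
# Assumption: Elasticsearch with the IK plugin is reachable here, as in the issue.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body for the _analyze API: analyze the text of doc 3 with the
# index-time analyzer from the mapping.
ANALYZE_BODY='{
  "analyzer": "ik_max_word",
  "text": "this is hello and that is world"
}'

# Hypothetical helper: send the request and print the raw JSON response,
# which lists each token together with its offsets and position.
analyze_doc3() {
  curl -s -XGET "$ES_URL/btcs-all-in-one/_analyze" \
    -H "Content-Type: application/json" \
    -d "$ANALYZE_BODY"
}
```

Calling `analyze_doc3` against the running cluster prints the token list. If only hello and world come back, and their `position` values are consecutive, that would be consistent with the explanation above: the stop words are dropped without leaving position gaps, so the phrase query sees the two terms as adjacent. Running the same request on the text of doc 1 should show whether "your" survives as a token between them.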