First-letter search: mec does not find 木耳草

vancefantasy commented 3 years ago

Index configuration

"analyzer": {
        "pinyin_analyzer": {
             "tokenizer": "my_pinyin"
        }
      }

 "tokenizer": {
       "my_pinyin": {
          "lowercase": "true",
          "keep_original": "false",
          "keep_first_letter": "true",
          "keep_separate_first_letter": "true",
          "type": "pinyin",
          "limit_first_letter_length": "64",
          "keep_full_pinyin": "true"
        }
 }

 "properties": {
      "name": {
          "type": "keyword",
          "fields": {
            "py": {
              "type": "text",
              "analyzer": "pinyin_analyzer",
              "search_analyzer": "pinyin_analyzer"
            }
          }
        }
  }

index time (木耳草)

{
"tokens": [
    {
        "token": "m",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 0
    },
    {
        "token": "mu",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 0
    },
    {
        "token": "e",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 1
    },
    {
        "token": "er",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 1
    },
    {
        "token": "c",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 2
    },
    {
        "token": "cao",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 2
    },
    {
        "token": "mec",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 2
    }
]}

search time (mec)

{
"tokens": [
    {
        "token": "me",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 0
    },
    {
        "token": "c",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 1
    },
    {
        "token": "mec",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 1
    }
] }
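For anyone who wants to reproduce the two token lists above, here is a minimal sketch that builds the corresponding `_analyze` request in Python. The host `http://localhost:9200` and the index name `my_index` are placeholders (the thread does not give them), and the request is only constructed, not sent.

```python
import json
from urllib.request import Request


def build_analyze_request(base_url: str, index: str, analyzer: str, text: str) -> Request:
    """Build (but do not send) a POST to the index-level _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return Request(
        url=f"{base_url}/{index}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index name; the analyzer name comes from the config above.
index_time = build_analyze_request("http://localhost:9200", "my_index", "pinyin_analyzer", "木耳草")
search_time = build_analyze_request("http://localhost:9200", "my_index", "pinyin_analyzer", "mec")
print(index_time.get_method(), index_time.full_url)
```

Passing either request to `urllib.request.urlopen` against a running cluster should return a token list in the same shape as the ones quoted above.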

The search-time analysis of mec includes the token me, so a phrase query finds no match. Is there a workaround?

medcl commented 3 years ago

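When pinyin analysis produces multiple overlapping terms at the same position, it is inherently unsuited to phrase queries. Switching to a regular (non-phrase) query should work: both the query and the index produce the term mec, so the document should be found.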
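To illustrate the point, here is a rough Python model of the two query styles, using the token lists posted earlier in this thread. It is a simplification, not Lucene's actual implementation: it assumes that terms stacked at the same position are treated as alternatives and that a phrase needs every query position to line up in the document.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, int]  # (term, position)

# Copied from the _analyze output quoted in this thread.
INDEX_TOKENS: List[Token] = [("m", 0), ("mu", 0), ("e", 1), ("er", 1),
                             ("c", 2), ("cao", 2), ("mec", 2)]
QUERY_TOKENS: List[Token] = [("me", 0), ("c", 1), ("mec", 1)]


def by_position(tokens: List[Token]) -> Dict[int, Set[str]]:
    """Group terms by position; terms sharing a position are alternatives."""
    grouped: Dict[int, Set[str]] = {}
    for term, pos in tokens:
        grouped.setdefault(pos, set()).add(term)
    return grouped


def phrase_matches(index_tokens: List[Token], query_tokens: List[Token]) -> bool:
    """Simplified phrase check: each query position must find one of its
    terms at the same relative offset in the document."""
    doc = by_position(index_tokens)
    query = by_position(query_tokens)
    first = min(query)
    return any(
        all(doc.get(start + pos - first, set()) & terms for pos, terms in query.items())
        for start in doc
    )


def any_term_matches(index_tokens: List[Token], query_tokens: List[Token]) -> bool:
    """Simplified non-phrase (OR) check: one shared term is enough."""
    return bool({t for t, _ in index_tokens} & {t for t, _ in query_tokens})


print(phrase_matches(INDEX_TOKENS, QUERY_TOKENS))    # False: "me" was never indexed
print(any_term_matches(INDEX_TOKENS, QUERY_TOKENS))  # True: "c" and "mec" are shared
```

In this model the phrase fails because position 0 of the query holds only me, which is not an indexed term, while the plain match succeeds on the shared terms c and mec.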

vancefantasy commented 3 years ago

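@medcl Thanks for the reply. After replacing phrase with best_fields, the match scope is too broad and unrelated results show up. If I set the search analyzer to keyword_analyzer, the search works. That solves the current case but introduces other problems; for example, searching muer no longer works. It's a bit tricky.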
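Both symptoms can be reproduced with a small Python model of term matching. It is a sketch under assumptions: the terms for 木耳草 come from this thread, the second document 草莓 and its terms are hypothetical (what this tokenizer config would plausibly emit), and a non-phrase match is modelled as "any query term is indexed", which is how a default OR match behaves.

```python
from typing import Dict, List, Set

# Indexed terms per document name.
INDEX_TERMS: Dict[str, Set[str]] = {
    "木耳草": {"m", "mu", "e", "er", "c", "cao", "mec"},  # from the thread
    "草莓": {"c", "cao", "m", "mei", "cm"},                # hypothetical document, assumed terms
}


def match_any(query_terms: Set[str]) -> List[str]:
    """Return names of documents that share at least one term with the query."""
    return [name for name, terms in INDEX_TERMS.items() if terms & query_terms]


# Search analyzer = pinyin_analyzer: "mec" becomes me / c / mec (from the thread).
pinyin_hits = match_any({"me", "c", "mec"})

# Search analyzer = keyword: the query string stays a single term.
keyword_mec_hits = match_any({"mec"})
keyword_muer_hits = match_any({"muer"})

print(pinyin_hits)        # both documents: the lone "c" also hits 草莓
print(keyword_mec_hits)   # only 木耳草
print(keyword_muer_hits)  # nothing: "muer" is not an indexed term
```

The broad results come from the separate first-letter term c, which any name containing a syllable starting with c also carries. The keyword analyzer removes that noise but then requires the whole query string to exist as one indexed term, which muer does not.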

yanjiali2020 commented 1 year ago

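Using medcl3 from the examples, I indexed a document:
POST /medcl3/_doc/lucy {"name":"敏感的心"}
When I search for mingan, the query is split into ming/an, so min/gan is never matched, even though the indexed tokens do include min and gan. Searching for mg works. How can this be solved?
GET /medcl3/_validate/query?explain { "query": {"match": { "name.pinyin": "mingan" }} }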
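The ambiguity described here can be shown with a short Python sketch. It uses a tiny, illustrative syllable set rather than a full pinyin table, and the longest-match-first strategy is only an assumption that happens to reproduce the observed ming/an split; the thread does not confirm how the plugin actually segments.

```python
from typing import List

# Tiny illustrative syllable set; a real pinyin table has several hundred entries.
SYLLABLES = {"min", "ming", "gan", "an"}


def segmentations(text: str) -> List[List[str]]:
    """Return every way to split text into syllables from SYLLABLES."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in SYLLABLES:
            results.extend([head] + rest for rest in segmentations(text[end:]))
    return results


def greedy_longest(text: str) -> List[str]:
    """Longest-match-first split; stops early if no syllable fits."""
    out: List[str] = []
    while text:
        for end in range(len(text), 0, -1):
            if text[:end] in SYLLABLES:
                out.append(text[:end])
                text = text[end:]
                break
        else:
            break  # nothing matched; return what was consumed so far
    return out


print(segmentations("mingan"))   # [['min', 'gan'], ['ming', 'an']]
print(greedy_longest("mingan"))  # ['ming', 'an']
```

Both splits are valid pinyin, so any analyzer that commits to a single split at search time will miss documents indexed under the other one.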
