infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0
2.94k stars 547 forks

How to solve the homophone problem #288

Open vvip2u opened 1 year ago

vvip2u commented 1 year ago

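Problem description

The index contains a record for 刘德华 (Liu Dehua). When I search for 【柳】, which is also pronounced "liu", I do not want 刘德华 to match, but it does.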

Action

vvip2u commented 1 year ago

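### Temporary workaround 1

Premise

I need a search feature aimed at names. If the name is 刘德华, any one of the following should find it: 刘德华, liu, de, hua, 刘, 德, 华. P.S. This only supports searching; highlighting is not supported for now.

Detailed steps

Step 1: create the index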

PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "chinese_analyzer": {
          "tokenizer": "chinese_chars_tokenizer"
        },
        "pinyin_analyzer": {
          "tokenizer": "pinyin_tokenizer"
        }
      },
      "tokenizer": {
        "chinese_chars_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 50,
          "keep_separate_chinese": true,
          "lowercase": true
        },
        "pinyin_tokenizer": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": false,
          "limit_first_letter_length": 50,
          "keep_separate_chinese": true,
          "lowercase": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "search_analyzer": "chinese_analyzer",
        "analyzer": "pinyin_analyzer"
      }
    }
  }
}
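To see what each analyzer actually emits, the standard `_analyze` API can be run against the new index. This is only a sketch for inspection, assuming the index from step 1 exists; the thread does not show the resulting tokens, so none are listed here.

```
GET /test_index/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "刘德华"
}

GET /test_index/_analyze
{
  "analyzer": "chinese_analyzer",
  "text": "柳"
}
```

If the workaround behaves as intended, the first request should show the tokens stored at index time, and the second should show that the query text 柳 is not turned into the pinyin `liu` at search time.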

Step 2: index a document

PUT /test_index/_doc/1
{
  "name": "刘德华"
}
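A newly indexed document only becomes searchable after the next refresh, so a search sent immediately after the write can come back empty. A small optional addition, not part of the original steps, is to force a refresh before verifying:

```
POST /test_index/_refresh
```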

Step 3: verify

Check 1: a single character
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "刘"
    }
  }
}
Check 2: a pinyin syllable
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "liu"
    }
  }
}
Check 3: the full name
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "刘德华"
    }
  }
}
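
The checks above confirm that the intended queries match, but they do not show that the homophone from the problem description is excluded. A minimal sketch of that negative check, assuming the index and document from the earlier steps. The expectation of an empty result comes from the intent of the workaround, not from output reported in this thread:

```
GET /test_index/_search
{
  "query": {
    "match_phrase": {
      "name": "柳"
    }
  }
}
```

If 刘德华 still comes back for this query, the search analyzer is probably emitting pinyin for the query text.
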
laozhuzz commented 1 year ago

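Hey, did you write this article? "es 修改拼音分词器源码实现汉字/拼音/简拼混合搜索时同音字不匹配" (ES: modifying the pinyin analyzer source code so that homophones do not match in mixed Chinese character / pinyin / pinyin-initials search)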