infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

Abnormal pinyin segmentation at search time (retroflex vs. flat-tongue initials) #210

Open buptcjj opened 5 years ago

buptcjj commented 5 years ago

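When searching by first letters, I found that the retroflex initials (z/c/s + h) are kept together as one unit, which breaks the search. For example, with "中华人民共和国" in the index, these two queries both return the correct result:

curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zhonghua"}}}'
curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"rm"}}}'

but these two do not:

curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zh"}}}'
curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zhrm"}}}'

This is probably because z+h is treated as a single character, so the query cannot be matched.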

My settings and mapping are as follows.

setting:
"index" : {
  "analysis" : {
    "analyzer" : {
      "pinyin_analyzer" : {
        "tokenizer" : "my_pinyin"
      }
    },
    "tokenizer" : {
      "my_pinyin" : {
        "type" : "pinyin",
        "keep_separate_first_letter" : True,
        "keep_first_letter" : True,
        "keep_full_pinyin" : True,
        "keep_original" : True,
        "limit_first_letter_length" : 30,
        "lowercase" : True,
        "remove_duplicated_term" : False,
        "keep_none_chinese_in_joined_full_pinyin": True
      }
    }
  }
}

mapping:
"properties": {
  "name": {
    "type": "text",
    "store": "no",
    "term_vector": "with_positions_offsets",
    "analyzer": "pinyin_analyzer",
    "boost": 20,
    "fields": {
      "primitive": {
        "type": "string",
        "store": "yes",
        "analyzer": "keyword"
      }
    }
  }
}
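One way to check how the analyzer splits a query string such as "zh" is the Elasticsearch `_analyze` API. The sketch below is only an illustration, not part of the original report: it assumes the index is named `news`, the analyzer is `pinyin_analyzer`, and the node listens on `localhost:9200` as in the commands above. The helper names `build_analyze_request` and `analyze` are made up for this example.

```python
import json
from urllib import request


def build_analyze_request(host, index, analyzer, text):
    """Build (but do not send) a POST request for the _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return request.Request(
        url=f"http://{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(host, index, analyzer, text):
    """Send the request and return (token, position) pairs from the response."""
    req = build_analyze_request(host, index, analyzer, text)
    with request.urlopen(req) as resp:
        tokens = json.load(resp)["tokens"]
    return [(t["token"], t["position"]) for t in tokens]


if __name__ == "__main__":
    # Requires a running node; host, index and analyzer names are assumptions.
    for text in ("zh", "rm", "zhrm", "中华人民共和国"):
        print(text, analyze("localhost:9200", "news", "pinyin_analyzer", text))
```

Comparing the tokens produced for "zh" with those produced for "中华人民共和国" should show whether the query side emits a single "zh" token while the indexed side emits separate "z" and "h" tokens.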

Alternatively, could you tell me where in the code this would need to be changed?

buptcjj commented 5 years ago

Modifying the dictionary file in the source so that "zh" can no longer be treated as a first letter fixes it.
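The effect described in this thread can be reproduced with a toy model. The sketch below is not the plugin's code and does not use its dictionary format. It assumes that a Latin query string is split by greedy longest match against a set of known units, and that the indexed first letters of "中华人民共和国" are the separate tokens z, h, r, m, g, h, g. The names `segment`, `phrase_matches`, `WITH_ZH` and `WITHOUT_ZH` are hypothetical.

```python
def segment(text, units):
    """Greedy longest-match split; unknown characters become single tokens."""
    max_len = max((len(u) for u in units), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in units:
                tokens.append(piece)
                i += size
                break
    return tokens


def phrase_matches(indexed, query):
    """True if the query tokens occur as a contiguous run in the indexed tokens."""
    n = len(query)
    return any(indexed[k:k + n] == query for k in range(len(indexed) - n + 1))


# Assumed first-letter tokens for "中华人民共和国" (zhong hua ren min gong he guo).
INDEXED = ["z", "h", "r", "m", "g", "h", "g"]

WITH_ZH = {"zh", "ch", "sh"}   # "zh" accepted as one unit (assumed original behaviour)
WITHOUT_ZH = {"ch", "sh"}      # "zh" removed, as in the fix described above

if __name__ == "__main__":
    for query in ("rm", "zh", "zhrm"):
        before = phrase_matches(INDEXED, segment(query, WITH_ZH))
        after = phrase_matches(INDEXED, segment(query, WITHOUT_ZH))
        print(query, before, after)
```

In this model "rm" matches in both cases, while "zh" and "zhrm" match only once "zh" is removed from the unit set, which is consistent with the behaviour reported above. Full-pinyin queries such as "zhonghua" are outside the scope of the toy model.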