buptcjj opened this issue 5 years ago (status: Open)
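When searching by first letters, I found that the retroflex initials (z/c/s followed by h) are grouped together, which breaks the search. For example, with "中华人民共和国" in the index, both of these return the expected result:

    curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zhonghua"}}}'
    curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"rm"}}}'

but these two do not:

    curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zh"}}}'
    curl -XGET 'localhost:9200/news/_search' -d '{"query":{"match_phrase":{"name":"zhrm"}}}'

My guess is that z+h is being treated as the initial of a single character, so the query cannot be matched.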
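One way to check the guess above is to ask Elasticsearch which tokens the analyzer actually emits, both for the indexed text and for the query strings. The following is a minimal sketch using the index-level `_analyze` API. The host, index name `news`, and analyzer name `pinyin_analyzer` are taken from this issue. The helper names `build_analyze_request` and `analyze` are hypothetical and are not part of the plugin or of any client library.

```python
import json
import urllib.request


def build_analyze_request(text, index="news", analyzer="pinyin_analyzer",
                          host="http://localhost:9200"):
    """Build (but do not send) a request to the index-level _analyze API."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return the token strings in emission order."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Example usage against a running cluster (assumed to be reachable):
#   for text in ("中华人民共和国", "zh", "zhrm"):
#       print(text, "->", analyze(text))
```

Comparing the tokens produced for "zh" with those produced for "中华人民共和国" should show whether the query side emits a term that the index side never contains.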
My settings and mapping are as follows.

Settings:

    "index" : {
      "analysis" : {
        "analyzer" : {
          "pinyin_analyzer" : { "tokenizer" : "my_pinyin" }
        },
        "tokenizer" : {
          "my_pinyin" : {
            "type" : "pinyin",
            "keep_separate_first_letter" : true,
            "keep_first_letter" : true,
            "keep_full_pinyin" : true,
            "keep_original" : true,
            "limit_first_letter_length" : 30,
            "lowercase" : true,
            "remove_duplicated_term" : false,
            "keep_none_chinese_in_joined_full_pinyin" : true
          }
        }
      }
    }
Mapping:

    "properties" : {
      "name" : {
        "type" : "text",
        "store" : "no",
        "term_vector" : "with_positions_offsets",
        "analyzer" : "pinyin_analyzer",
        "boost" : 20,
        "fields" : {
          "primitive" : {
            "type" : "string",
            "store" : "yes",
            "analyzer" : "keyword"
          }
        }
      }
    }
Alternatively, could you tell me which part of the code I would need to change?
Modify the dictionary file in the source code so that zh can no longer be treated as a first letter.
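To illustrate why removing zh as an initial might help, here is a toy model of the hypothesis raised in this issue. It is not the plugin's tokenizer and does not read its dictionary. All names (`DIGRAPH_INITIALS`, `split_query`, `is_phrase_match`) are hypothetical. It assumes the index holds one first letter per character, and that the query is split greedily into initials.

```python
# Two-letter retroflex initials that the query splitter may treat as one unit.
DIGRAPH_INITIALS = ("zh", "ch", "sh")


def first_letters(syllables):
    """One first letter per character, e.g. 'zhong' contributes 'z'."""
    return [s[0] for s in syllables]


def split_query(query, digraphs=()):
    """Greedily split a letter query, keeping any listed digraph as one unit."""
    units, i = [], 0
    while i < len(query):
        if query[i:i + 2] in digraphs:
            units.append(query[i:i + 2])
            i += 2
        else:
            units.append(query[i])
            i += 1
    return units


def is_phrase_match(query_units, indexed):
    """True if query_units occurs as a contiguous run inside indexed."""
    n = len(query_units)
    return any(indexed[i:i + n] == query_units
               for i in range(len(indexed) - n + 1))


# Pinyin of 中华人民共和国, one syllable per character.
syllables = ["zhong", "hua", "ren", "min", "gong", "he", "guo"]
indexed = first_letters(syllables)

for q in ("rm", "zh", "zhrm"):
    with_digraphs = is_phrase_match(split_query(q, DIGRAPH_INITIALS), indexed)
    without = is_phrase_match(split_query(q), indexed)
    print(q, "digraph units:", with_digraphs, "| single letters:", without)
```

In this model "rm" matches either way, while "zh" and "zhrm" match only when z and h are split into separate letters. That is consistent with the behaviour reported above, but whether the plugin really works this way would need to be confirmed against its source.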