Open Jiangtao976 opened 11 months ago
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "default_pipeline": "biz_timestamp_pipeline",
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": { "tokenizer": "my_pinyin" }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": false,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true,
          "ignore_pinyin_offset": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "vendorName": {
        "type": "text",
        "analyzer": "pinyin_analyzer",
        "search_analyzer": "pinyin_analyzer",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
Example 1:
Chinese text: 刘德华阿里巴巴
Analysis result:
{
  "tokens": [
    { "token": "l", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "liu", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "刘德华阿里巴巴", "start_offset": 0, "end_offset": 7, "type": "word", "position": 0 },
    { "token": "ldhalbb", "start_offset": 0, "end_offset": 7, "type": "word", "position": 0 },
    { "token": "d", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "de", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "h", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "hua", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "a", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "li", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "b", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "ba", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 }
  ]
}
Query:
{ "query": { "match_phrase": { "vendorName": { "query": "ldha" } } } }
As the analysis output shows, the first letters ldha are present, yet the query returns no results. My guess is that the first letter a of "阿" is affected by the a in "华" (hua), and that this is what breaks the match.
Example 2:
Chinese text: 深圳健安医药有限公司
Analysis result:
{
  "tokens": [
    { "token": "s", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "shen", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "深圳健安医药有限公司", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
    { "token": "szjayyyxgs", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
    { "token": "z", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "zhen", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "j", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "jian", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "a", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "an", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "y", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "yi", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "yao", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "you", "start_offset": 6, "end_offset": 7, "type": "word", "position": 6 },
    { "token": "x", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7 },
    { "token": "xian", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7 },
    { "token": "g", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8 },
    { "token": "gong", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8 },
    { "token": "si", "start_offset": 9, "end_offset": 10, "type": "word", "position": 9 }
  ]
}
Query:
{ "query": { "match_phrase": { "vendorName": { "query": "szja" } } } }
As the analysis output shows, the first letters szja are present, yet the query returns no results. My guess is that the first letter a of "安" is affected by the a in "健" (jian), and that this is what breaks the match.
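Other Chinese text behaves the same way. For example, 深圳恩 cannot be found with sze either: the first letter e of 恩 seems to be affected by the e in 深 (shen).
I have tried many parameter combinations and none of them fixes this. Can anyone help?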
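Query: { "query": { "match_phrase": { "vendorName": { "query": "ldha" } } } } As the analysis output shows, the first letters ldha are present, yet the query returns no results. My guess is that the first letter a of "阿" is affected by the a in "华" (hua), and that this is what breaks the match.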
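The analysis output does not contain ldha as a single token, so the query has nothing to match against. If you search for liudehua instead, it will match.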
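To make the reply above concrete, here is a minimal Python sketch of a phrase check against the tokens listed in Example 1. It is a simplified model, not Elasticsearch's actual implementation: it assumes a phrase with no slop matches only when each query term sits at consecutive positions in the indexed text. The helper name `phrase_matches` is hypothetical, and the query term lists are assumptions for illustration; the real terms depend on how `pinyin_analyzer` analyzes the query string, which can be inspected with the `_analyze` API.

```python
# (term, position) pairs copied from the Example 1 analysis output for 刘德华阿里巴巴.
INDEXED = [
    ("l", 0), ("liu", 0), ("刘德华阿里巴巴", 0), ("ldhalbb", 0),
    ("d", 1), ("de", 1), ("h", 2), ("hua", 2),
    ("a", 3), ("li", 4), ("b", 5), ("ba", 5),
]


def phrase_matches(indexed, query_terms):
    """Simplified slop-0 phrase check (hypothetical helper, not an ES API).

    Returns True when some start position exists such that query term i
    is indexed at position start + i for every i.
    """
    if not query_terms:
        return False
    # Build a term -> set of positions lookup.
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    first_positions = positions.get(query_terms[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(query_terms))
        for start in first_positions
    )


# Assumed query term lists, for illustration only.
for terms in (["ldha"], ["ldhalbb"], ["liu", "de", "hua"], ["l", "d", "ha"]):
    print(terms, phrase_matches(INDEXED, terms))
```

In this model, `["ldha"]` fails because no such term was indexed, while `["liu", "de", "hua"]` succeeds at positions 0, 1, 2. The last case, `["l", "d", "ha"]`, shows what would happen if the query were split so that h and a ended up in one term: the phrase would fail because `ha` is not an indexed term. Whether the tokenizer actually splits `ldha` that way is not confirmed in this thread.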