infinilabs / analysis-ik

🚌 The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and OpenSearch, with support for customized dictionaries.
Apache License 2.0

The ik_max_word analyzer produces different tokenization results on different versions #1025

Open Doodlera opened 11 months ago

Doodlera commented 11 months ago

Version A: 7.14.2. Version B: 7.6.2.

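Problem description: The index configuration is identical on both versions and the request is identical, yet the tokenization results differ. Any help is appreciated! A filter with a minimum token length of 2 has been added. The request:

{ "analyzer": "ik_max_word_analyzer", "text": "试驾体验阿维塔11,舒适与智能奠定阿维塔销量!" }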
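For anyone trying to reproduce this, the setup described above might look roughly like the sketch below. The issue does not show the actual index settings, so everything except the analyzer name, the request text, and the minimum length of 2 is an assumption: in particular the filter name `min_length_2` is hypothetical, and the use of the built-in `length` token filter on top of the `ik_max_word` tokenizer is a guess at what "added a filter" means.

```python
import json

# Hypothetical index settings: the issue does not include the real ones.
# Assumes a custom analyzer built from the ik_max_word tokenizer plus
# Elasticsearch's built-in "length" token filter with min = 2.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # "min_length_2" is an illustrative name, not from the issue.
                "min_length_2": {"type": "length", "min": 2}
            },
            "analyzer": {
                "ik_max_word_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["min_length_2"],
                }
            },
        }
    }
}

# Body of the _analyze request, copied from the issue.
analyze_request = {
    "analyzer": "ik_max_word_analyzer",
    "text": "试驾体验阿维塔11,舒适与智能奠定阿维塔销量!",
}

# Print both bodies as JSON so they can be pasted into a REST client.
print(json.dumps(index_settings, ensure_ascii=False, indent=2))
print(json.dumps(analyze_request, ensure_ascii=False, indent=2))
```

Creating the same index from these settings on both versions and sending the same request to each would rule out configuration drift as the cause.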

Result on version A:

{ "tokens" : [
  { "token" : "体验", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
  { "token" : "11", "start_offset" : 7, "end_offset" : 9, "type" : "LETTER", "position" : 6 },
  { "token" : "舒适", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 7 },
  { "token" : "智能", "start_offset" : 13, "end_offset" : 15, "type" : "CN_WORD", "position" : 9 },
  { "token" : "奠定", "start_offset" : 15, "end_offset" : 17, "type" : "CN_WORD", "position" : 10 },
  { "token" : "销量", "start_offset" : 20, "end_offset" : 22, "type" : "CN_WORD", "position" : 14 }
] }

Result on version B:

{ "tokens" : [
  { "token" : "试驾体验", "start_offset" : 0, "end_offset" : 4, "type" : "CN_WORD", "position" : 0 },
  { "token" : "试驾", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 },
  { "token" : "体验", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
  { "token" : "阿维", "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 5 },
  { "token" : "维塔", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 7 },
  { "token" : "11", "start_offset" : 7, "end_offset" : 9, "type" : "LETTER", "position" : 9 },
  { "token" : "舒适", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 10 },
  { "token" : "智能", "start_offset" : 13, "end_offset" : 15, "type" : "CN_WORD", "position" : 11 },
  { "token" : "奠定", "start_offset" : 15, "end_offset" : 17, "type" : "CN_WORD", "position" : 14 },
  { "token" : "阿维", "start_offset" : 17, "end_offset" : 19, "type" : "CN_WORD", "position" : 16 },
  { "token" : "维塔", "start_offset" : 18, "end_offset" : 20, "type" : "CN_WORD", "position" : 18 },
  { "token" : "销量", "start_offset" : 20, "end_offset" : 22, "type" : "CN_WORD", "position" : 20 }
] }
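To make the discrepancy easier to see, here is a small sketch that diffs the two responses by (token, start_offset, end_offset). The triples are copied from the results above; the helper name `missing_tokens` is made up for this example and is not part of the plugin or of Elasticsearch.

```python
# (token, start_offset, end_offset) triples copied from the two responses.
tokens_a = [  # version A (7.14.2)
    ("体验", 2, 4), ("11", 7, 9), ("舒适", 10, 12),
    ("智能", 13, 15), ("奠定", 15, 17), ("销量", 20, 22),
]
tokens_b = [  # version B (7.6.2)
    ("试驾体验", 0, 4), ("试驾", 0, 2), ("体验", 2, 4), ("阿维", 4, 6),
    ("维塔", 5, 7), ("11", 7, 9), ("舒适", 10, 12), ("智能", 13, 15),
    ("奠定", 15, 17), ("阿维", 17, 19), ("维塔", 18, 20), ("销量", 20, 22),
]


def missing_tokens(reference, candidate):
    """Return the tokens of `reference` that are absent from `candidate`, in order."""
    seen = set(candidate)
    return [tok for tok in reference if tok not in seen]


only_in_b = missing_tokens(tokens_b, tokens_a)
only_in_a = missing_tokens(tokens_a, tokens_b)

print("Only in version B:")
for token, start, end in only_in_b:
    print(f"  {token}  offsets {start}-{end}")
print("Only in version A:", only_in_a)
```

Every token returned by version A also appears in version B with the same offsets. The six tokens version B returns in addition are 试驾体验, 试驾, and both occurrences of 阿维 and 维塔. All of them are at least two characters long, so a minimum-length-2 filter by itself would not be expected to drop them.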