infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin converts between Chinese characters and Pinyin.
Apache License 2.0

Severe bug: when the text being analyzed contains a standalone letter A, the tokenizer drops that A #287

Open Dustone-JavaWeb opened 1 year ago

Dustone-JavaWeb commented 1 year ago

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "我们A A制"
}

{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "制",
      "start_offset": 5,
      "end_offset": 6,
      "type": "CN_CHAR",
      "position": 1
    }
  ]
}

wangming31 commented 1 year ago

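By default, ik loads a stop-word dictionary, stopword.dic, which contains the letter 'a' (treated as a stop word in English), so that token is filtered out. Emptying /config/stopword.dic under the ik directory fixes this.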
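To illustrate why the standalone A disappears, here is a minimal Python sketch of stop-word filtering. It is a simplified model, not ik's actual implementation: the function names, the pre-filter token list, and the three-entry dictionary excerpt are all hypothetical, and it assumes tokens are matched against the dictionary case-insensitively (which would be consistent with an uppercase "A" being dropped by a lowercase 'a' entry).

```python
# Simplified model of stop-word filtering; NOT ik's actual code.
# Assumption: the dictionary holds one word per line and matching ignores case.

def filter_stopwords(tokens, stopwords):
    """Drop every token whose lowercase form is in the stop-word set."""
    return [t for t in tokens if t.lower() not in stopwords]

def remove_entry(dic_lines, entry):
    """Return the dictionary lines without the given entry."""
    return [line for line in dic_lines if line.strip() != entry]

# Hypothetical tokens before filtering, for the text "我们A A制".
raw_tokens = ["我们", "A", "A", "制"]

# Hypothetical excerpt of stopword.dic.
dic_lines = ["a", "an", "the"]

# With 'a' in the dictionary, both A tokens are dropped.
print(filter_stopwords(raw_tokens, set(dic_lines)))  # ['我们', '制']

# With 'a' removed from the dictionary, the A tokens survive.
trimmed = set(remove_entry(dic_lines, "a"))
print(filter_stopwords(raw_tokens, trimmed))  # ['我们', 'A', 'A', '制']
```

The sketch also shows a narrower alternative to emptying the whole file: removing only the 'a' entry keeps the other stop words in effect. Whether that suits a given index depends on how the other English stop words should be handled.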