infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

keep_none_chinese_together and keep_none_chinese are both at their default value of true, but the "DJ" in "DJ音乐家" is still split into "D" and "J" #251

Open chiminh-lee opened 3 years ago

chiminh-lee commented 3 years ago

ES version: 7.3.2, plugin version: 7.3.2

Analyzer configuration:

"analyzer": {
  "pinyin_res": { "tokenizer": "name_pinyin" },
  "pinyin_sug": { "tokenizer": "my_pinyin" }
},
"tokenizer": {
  "name_pinyin": {
    "keep_joined_full_pinyin": "true",
    "type": "pinyin",
    "keep_none_chinese": "true",
    "keep_none_chinese_in_joined_full_pinyin": "true",
    "keep_original": "true",
    "keep_none_chinese_together": "true"
  },

Analyze request:

{
  "text": "DJ音",
  "analyzer": "pinyin_res"
}

Result:

{
  "tokens": [
    { "token": "d", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "dj音", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "djyin", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "djy", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0 },
    { "token": "j", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1 },
    { "token": "yin", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2 }
  ]
}
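
For anyone triaging this, here is a minimal Python sketch that makes the symptom easier to see. It only inspects the analyze response quoted in this report (trimmed to the `token` and `position` fields) and does not call Elasticsearch. The helper names `tokens_by_position` and `single_latin_letters` are hypothetical, introduced for illustration only.

```python
import json

# Analyze response from this report, trimmed to the token and position fields.
RESPONSE = json.loads("""
{
  "tokens": [
    {"token": "d",     "position": 0},
    {"token": "dj音",  "position": 0},
    {"token": "djyin", "position": 0},
    {"token": "djy",   "position": 0},
    {"token": "j",     "position": 1},
    {"token": "yin",   "position": 2}
  ]
}
""")


def tokens_by_position(response):
    """Group token strings by position, preserving the response order."""
    grouped = {}
    for tok in response["tokens"]:
        grouped.setdefault(tok["position"], []).append(tok["token"])
    return grouped


def single_latin_letters(response):
    """Return tokens that are a single ASCII letter (pieces of a split Latin run)."""
    return [
        tok["token"]
        for tok in response["tokens"]
        if len(tok["token"]) == 1 and tok["token"].isascii() and tok["token"].isalpha()
    ]


if __name__ == "__main__":
    print(tokens_by_position(RESPONSE))
    print(single_latin_letters(RESPONSE))
    # Check whether the Latin run was ever emitted as one token.
    print("dj" in [tok["token"] for tok in RESPONSE["tokens"]])
```

Grouping by position shows "d" and "j" emitted as separate tokens at positions 0 and 1, and no standalone "dj" token anywhere in the response, which is the behavior being reported.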