infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

Preventing Pinyin from being split into multiple tokens does not work when converting Chinese characters to Pinyin #301

Open idawwei opened 1 month ago

idawwei commented 1 month ago

Description

Input: 测试123EDF. I want to keep the Pinyin from being split into multiple tokens. The expected result is a single token, "ceshi123EDF".

Actual behavior: the digits are split off, "EDF" is split apart, and the Chinese characters are split into "ce" and "shi".

Steps to reproduce

Index settings:

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "pinyin_analyzer": {
              "tokenizer": "my_pinyin_tokenizer"
            }
          },
          "tokenizer": {
            "my_pinyin_tokenizer": {
              "type": "pinyin",
              "keep_first_letter": false,
              "keep_separate_first_letter": false,
              "keep_full_pinyin": true,
              "limit_first_letter_length": 16,
              "lowercase": true,
              "none_chinese_pinyin_tokenize": true
            }
          }
        }
      }
    }

Analysis test:

    GET /my_index/_analyze
    {
      "analyzer": "pinyin_analyzer",
      "text": "理财123EDF"
    }
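For anyone trying to reproduce this outside Kibana Dev Tools, the two requests above can be scripted roughly as follows. This is only a sketch and does not fix the issue. It assumes an Elasticsearch node with the analysis-pinyin plugin installed, reachable at `http://localhost:9200` without authentication. The helper names (`es_request`, `token_strings`, `reproduce`) are hypothetical. It sends the `_analyze` request as POST, which Elasticsearch accepts alongside GET.

```python
import json
import urllib.request

# Assumption: local node, no auth, analysis-pinyin plugin installed. Adjust as needed.
ES_URL = "http://localhost:9200"

# Same index settings as in the issue.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "pinyin_analyzer": {"tokenizer": "my_pinyin_tokenizer"}
            },
            "tokenizer": {
                "my_pinyin_tokenizer": {
                    "type": "pinyin",
                    "keep_first_letter": False,
                    "keep_separate_first_letter": False,
                    "keep_full_pinyin": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "none_chinese_pinyin_tokenize": True,
                }
            },
        }
    }
}


def es_request(method: str, path: str, body: dict) -> dict:
    """Send a JSON body to Elasticsearch and return the decoded JSON response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def token_strings(analyze_response: dict) -> list:
    """Pull the token text out of an _analyze response: {"tokens": [{"token": ...}, ...]}."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def reproduce(text: str = "理财123EDF") -> list:
    """Create the index with the issue's settings, then analyze `text` with pinyin_analyzer."""
    es_request("PUT", "/my_index", INDEX_SETTINGS)
    result = es_request(
        "POST",  # _analyze accepts POST as well as GET
        "/my_index/_analyze",
        {"analyzer": "pinyin_analyzer", "text": text},
    )
    return token_strings(result)
```

Calling `print(reproduce())` against a running node prints the tokens the tokenizer actually emits. That makes it easy to compare them with the single joined token the issue expects.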

Environment