infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

Could first-letter extraction also be supported for English words in mixed Chinese/English text? #292

Open hulizhen opened 1 year ago

hulizhen commented 1 year ago

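Currently, when the text mixes Chinese and English, first letters are extracted only for the Chinese characters; English words are kept whole. For example: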

GET /tests/_analyze
{
  "text": "我是谁 where am i",
  "tokenizer": {
    "type": "pinyin",
    "limit_first_letter_length": 64,
    "keep_full_pinyin": false,
    "keep_first_letter": true,
    "keep_none_chinese": false,
    "keep_none_chinese_together": true,
    "keep_none_chinese_in_first_letter": true,
    "none_chinese_pinyin_tokenize": true,
    "lowercase": false,
    "keep_original": false
  }
}

This returns the token `wsswhereami`. Could the plugin support returning `wsswai` instead?
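Until something like this is supported, one possible client-side workaround is to shorten each English word to its first letter before the text reaches the analyzer. The sketch below is only an illustration: the helper name `abbreviate_english_words` is hypothetical and not part of the plugin, and it assumes that "English word" means a run of ASCII letters. Whether the pinyin tokenizer then yields `wsswai` for the preprocessed text has not been verified here.

```python
import re

# Hypothetical helper, not part of analysis-pinyin.
# Assumption: an "English word" is a run of ASCII letters.
_ASCII_WORD = re.compile(r"[A-Za-z]+")


def abbreviate_english_words(text: str) -> str:
    """Replace every run of ASCII letters with its first letter.

    Chinese characters, digits, spaces, and punctuation are left untouched.
    """
    return _ASCII_WORD.sub(lambda m: m.group(0)[0], text)


if __name__ == "__main__":
    # The preprocessed string would be sent as "text" in the _analyze request.
    print(abbreviate_english_words("我是谁 where am i"))
```

The same preprocessing would have to be applied at both index time and query time so that the first-letter tokens match.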