infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin converts between Chinese characters and Pinyin.
Apache License 2.0
2.96k stars, 548 forks

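How can I keep the "+" signs of "c++软件工程师" ("C++ software engineer") in the tokenized result? Why does the pinyin tokenizer filter out symbols? #291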

Open · Maskvvv opened this issue 1 year ago

Maskvvv commented 1 year ago
GET /_analyze
{
  "tokenizer": "keyword", 
  "filter": [
    {
      "type": "pinyin",
      "keep_original": false,
      "keep_first_letter": false,
      "keep_full_pinyin": true,
      "none_chinese_pinyin_tokenize": true,
      "ignore_pinyin_offset": false
    }
  ],
  "text": [
    "c++软件工程师"
  ]
}

Result:

{
  "tokens" : [
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "ruan",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "jian",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "gong",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "cheng",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "shi",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 6
    }
  ]
}