infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin converts between Chinese characters and Pinyin.
Apache License 2.0

Single-letter tokens in the output: e.g. "我" is tokenized into both 'w' and 'wo', but only 'wo' is wanted #242

Open · Molerni opened this issue 4 years ago

Molerni commented 4 years ago

GET /pmall_goods_v2/_analyze
{
  "analyzer": "pinyin",
  "text": ["我"]
}

Result:

{
  "tokens" : [
    {
      "token" : "wo",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "w",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    }
  ]
}

I actually only want the single 'wo' token. Is there any way to do that?

teaGod-s commented 4 years ago

+1

jayqian commented 3 years ago

Add the official length token filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html) to drop terms of length 1 from the token stream.
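A minimal sketch of that approach (the index name pinyin_demo and the filter/analyzer names below are illustrative; length and its min parameter are the standard Elasticsearch token filter options):

PUT /pinyin_demo
{
  "settings": {
    "analysis": {
      "filter": {
        "drop_single_char": {
          "type": "length",
          "min": 2
        }
      },
      "analyzer": {
        "pinyin_min2": {
          "tokenizer": "pinyin",
          "filter": ["drop_single_char"]
        }
      }
    }
  }
}

GET /pinyin_demo/_analyze
{
  "analyzer": "pinyin_min2",
  "text": ["我"]
}

With this analyzer the single-letter token 'w' should be dropped, leaving only 'wo'.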

ljj6218 commented 2 years ago

keep_first_letter: when enabled, e.g. 刘德华 > ldh; default: true
keep_separate_first_letter: when enabled, keeps the first letters as separate tokens, e.g. 刘德华 > l, d, h; default: false. Note: query results may become too fuzzy, since these terms are too frequent
limit_first_letter_length: sets the maximum length of the first_letter result; default: 16
keep_full_pinyin: when enabled, e.g. 刘德华 > [liu, de, hua]; default: true
keep_joined_full_pinyin: when enabled, e.g. 刘德华 > [liudehua]; default: false
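Based on those options, here is a sketch of a custom pinyin tokenizer that suppresses the first-letter tokens entirely (the index, tokenizer, and analyzer names are illustrative; the parameters are the plugin options listed above):

PUT /pinyin_opts_demo
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "full_pinyin_only": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": false
        }
      },
      "analyzer": {
        "full_pinyin_analyzer": {
          "tokenizer": "full_pinyin_only"
        }
      }
    }
  }
}

With these settings, 刘德华 should analyze to liu, de, hua only, and 我 to wo.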

shaunhurryup commented 9 months ago

keep_separate_first_letter doesn't seem to take effect: I set it to false (which is also its documented default), yet tokens like l, d, h are still produced.

Request

POST /_analyze
{
  "tokenizer": "pinyin",
  "text": "刘德华",
  "filter": [
    {
      "type": "pinyin",
      "keep_separate_first_letter": false
    }
  ]
}

Response

{
  "tokens": [
    {
      "token": "liu",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "l",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 1
    },
    {
      "token": "d",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    },
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 3
    },
    {
      "token": "ldh",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 3
    },
    {
      "token": "de",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 4
    },
    {
      "token": "hua",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 5
    }
  ]
}
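One thing worth noting about the request above: it runs pinyin twice, once as the tokenizer and once as a token filter, so the stray l, d, h tokens may come from the filter re-processing tokens such as ldh, rather than from the option being ignored. A quick way to narrow this down is to test the bare tokenizer with no filter:

POST /_analyze
{
  "tokenizer": "pinyin",
  "text": "刘德华"
}

If the single letters are absent here, the likely fix is to set the keep_* options on a custom tokenizer or filter definition in the index settings (as in the earlier sketch) instead of stacking a second pinyin filter on top of the pinyin tokenizer.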