infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0
2.96k stars 549 forks

youxiangongsi is tokenized incorrectly #201

Open · zhmfan opened 5 years ago

zhmfan commented 5 years ago
```json
{
    "token": "you",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 0
},
{
    "token": "xiang",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 1
},
{
    "token": "o",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 2
},
{
    "token": "n",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 3
},
{
    "token": "g",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 4
},
{
    "token": "si",
    "start_offset": 0,
    "end_offset": 0,
    "type": "word",
    "position": 5
}
```
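The token list above is consistent with a forward longest-match splitter, though whether the plugin actually works this way is an assumption. After `you`, the longest syllable starting at `xiangongsi` is `xiang`, which strands `ongsi`. `o` still matches, but in the toy table below nothing matches at `n` or `g`, so they fall out as single letters. A segmentation that only accepts splits covering the whole string, preferring the fewest syllables, recovers `you xian gong si` (有限公司). `SYLLABLES`, `greedy_split` and `dp_split` are hypothetical names for this sketch, and the syllable set is a deliberately tiny subset, not the plugin's dictionary.

```python
# Hypothetical toy syllable table; a real one would list every valid pinyin syllable.
SYLLABLES = {"you", "xi", "xian", "xiang", "gong", "si", "o", "an"}
MAX_LEN = 6  # the longest pinyin syllables, e.g. "zhuang", have 6 letters


def greedy_split(text):
    """Forward longest-match; letters that start no syllable become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in SYLLABLES:
                tokens.append(piece)
                i += size
                break
        else:
            tokens.append(text[i])  # no syllable starts here: emit one letter
            i += 1
    return tokens


def dp_split(text):
    """Fewest-syllable segmentation covering all of text, or None if none exists."""
    n = len(text)
    best = [None] * (n + 1)  # best[i]: best segmentation of text[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = text[start:end]
            if best[start] is not None and piece in SYLLABLES:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


print(greedy_split("youxiangongsi"))  # ['you', 'xiang', 'o', 'n', 'g', 'si']
print(dp_split("youxiangongsi"))      # ['you', 'xian', 'gong', 'si']
```

Preferring the fewest syllables is only one tie-breaking heuristic; it is what rejects `you xi an gong si` here, and a real implementation might weigh syllable frequencies instead.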
renpengben commented 5 years ago

I ran into this problem too: zxc, the initials-only pinyin (jianpin) of 周星驰, gets split into a bunch of single letters.

shiwl0329 commented 4 years ago

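Same here. Suppose the pinyin is deliberately separated by spaces, e.g. ying lun mi an. Pinyin tokenization should then produce ying lun mi an, not the current ying lun mian, which glues mi and an together.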
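Both later reports could be addressed, under the assumption that this is the desired behavior, by treating whitespace in the input as a hard syllable boundary and by keeping a run that cannot be fully segmented (such as zxc) as one token instead of scattering its letters. The sketch below is not the plugin's code or configuration; `SYLLABLES`, `segment` and `tokenize` are hypothetical, and the syllable set is a tiny illustrative subset.

```python
# Hypothetical toy syllable table, for illustration only.
SYLLABLES = {"ying", "lun", "mi", "an", "mian", "you", "xian", "gong", "si"}
MAX_LEN = 6  # longest pinyin syllable length, e.g. "zhuang"


def segment(chunk):
    """Fewest-syllable segmentation of one chunk, or None if it cannot be covered."""
    n = len(chunk)
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            piece = chunk[start:end]
            if best[start] is not None and piece in SYLLABLES:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def tokenize(text):
    """Split on whitespace first, so user-supplied spaces are never merged across."""
    tokens = []
    for chunk in text.lower().split():
        pieces = segment(chunk)
        # Hypothetical fallback: keep an unsegmentable run (e.g. "zxc") whole.
        tokens.extend(pieces if pieces is not None else [chunk])
    return tokens


print(tokenize("ying lun mi an"))  # ['ying', 'lun', 'mi', 'an']
print(tokenize("yinglunmian"))     # ['ying', 'lun', 'mian']
print(tokenize("zxc"))             # ['zxc']
```

Without the spaces, yinglunmian still comes out as ying lun mian under the fewest-syllables rule, so honoring whitespace is what lets the user force mi an.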