yanyiwu / gojieba

Go version of the "Jieba" Chinese word segmenter
MIT License

English words and digits are split into single characters when HMM is disabled #19

Closed DeanThompson closed 1 month ago

DeanThompson commented 7 years ago

The problem is as described in the title. Here is an example.

Using the default dictionary in exact mode, the input and output are:

s := "最近一直用这款,这次涨价了!不过看了下生产日期是2015年的!原来是2014所以便宜!good."

// hmm = false
最近/一直/用/这/款/,/这次/涨价/了/!/不过/看/了/下/生产日期/是/2/0/1/5/年/的/!/原来/是/2/0/1/4/所以/便宜/!/g/o/o/d/

// hmm = true
最近/一直/用/这款/,/这次/涨价/了/!/不过/看/了/下/生产日期/是/2015/年的/!/原来/是/2014/所以/便宜/!/good/.

The Python version does not have this problem:

# hmm = True
最近/一直/用/这款/,/这次/涨价/了/!/不过/看/了/下/生产日期/是/2015/年/的/!/原来/是/2014/所以/便宜/!/good/.

# hmm = False
最近/一直/用/这/款/,/这次/涨价/了/!/不过/看/了/下/生产日期/是/2015/年/的/!/原来/是/2014/所以/便宜/!/good/.

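I have a use case where I do not need HMM, but I still want English words and digits to stay whole rather than being split into single characters. How should I handle this?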

github-actions[bot] commented 2 months ago

This issue has not been updated for over 5 years and will be marked as stale. If the issue still exists, please comment or update the issue, otherwise it will be closed after 7 days.

github-actions[bot] commented 1 month ago

This issue has been automatically closed due to inactivity. If the issue still exists, please reopen it.