hankcs / HanLP

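Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin conversion, Simplified/Traditional Chinese conversion, natural language processing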
https://hanlp.hankcs.com/
Apache License 2.0
33.97k stars 10.18k forks

Chinese word segmentation (coarse-grained) error: New in version 3.3. #1876

Closed wencan closed 9 months ago

wencan commented 9 months ago

Describe the bug Text:

New in version 3.3.

https://hanlp.hankcs.com/demos/tok.html?text=New+in+version+3.3.%0A%0A&coarse=true

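The trailing "." is a sentence-ending period, but coarse-grained segmentation keeps "3.3." together as a single token. Fine-grained segmentation does not have this problem.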


hankcs commented 9 months ago

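This falls under English tokenization; it is not a bug in Chinese word segmentation. I suggest using an English model or a custom dictionary.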
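For readers who only need the trailing period separated, a rule-based post-processing step is one possible workaround. The sketch below is plain Python and does not use HanLP's dictionary API. `split_trailing_period` is a hypothetical helper, and the coarse token list is assumed from the report above, not captured from the demo.

```python
import re

# Hypothetical helper, not part of HanLP: detach a sentence-final period
# from a number-like last token such as "3.3.".
_NUMBER_WITH_TRAILING_DOT = re.compile(r"^(\d+(?:\.\d+)*)\.$")


def split_trailing_period(tokens):
    """Return a new token list in which a final '.' is split off the last
    token when that token looks like a number followed by a period."""
    if not tokens:
        return []
    *head, last = tokens
    match = _NUMBER_WITH_TRAILING_DOT.match(last)
    if match:
        # Keep the number itself and emit the period as its own token.
        return head + [match.group(1), "."]
    return list(tokens)


# Assumed coarse-grained output for "New in version 3.3." (illustrative only).
coarse = ["New", "in", "version", "3.3."]
print(split_trailing_period(coarse))
```

Only the last token is inspected, so decimals in the middle of a sentence are left untouched. Text with several sentences would need to be split into sentences first.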