yanyiwu / cppjieba

C++ version of the "Jieba" Chinese word segmenter
MIT License
2.58k stars 691 forks

How can I customize the segmentation method (regular expressions)? #160

Closed mikami-yua closed 3 weeks ago

mikami-yua commented 2 years ago

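How can I use regular expressions to define a set of segmentation rules? I want to extract URLs from a text file as whole tokens, but by default the domain name and ".com" are split apart. For example, is there any way to get the token "baidu.com" instead of "baidu" and "com"?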
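One possible workaround, sketched here under assumptions rather than taken from cppjieba's documentation, is to pre-tokenize with `std::regex`: pull out every URL-like match as a single token and hand only the text in between to the segmenter. The names `CutKeepingMatches`, `kUrlLike`, and `SplitOnSpaces` are hypothetical, the pattern is deliberately simplistic (it would also match something like "3.14"), and `SplitOnSpaces` is only a stand-in for the real segmenter so the example compiles without cppjieba.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Callback type: segments one piece of text and writes the words to the output vector.
using Segmenter =
    std::function<void(const std::string&, std::vector<std::string>&)>;

// Hypothetical, simplistic pattern: optional scheme, then dot-separated labels.
const std::regex kUrlLike(R"((https?://)?[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+)");

// Stand-in segmenter for this sketch only: splits on whitespace.
void SplitOnSpaces(const std::string& piece, std::vector<std::string>& words) {
  std::istringstream in(piece);
  std::string word;
  while (in >> word) words.push_back(word);
}

// Keeps every match of `pattern` as one token and delegates the text
// between matches to `segment`.
std::vector<std::string> CutKeepingMatches(const std::string& text,
                                           const std::regex& pattern,
                                           const Segmenter& segment) {
  std::vector<std::string> tokens;
  std::size_t last = 0;  // end of the previous match

  // Appends the segmented form of text[from, to) to `tokens`.
  auto flush = [&](std::size_t from, std::size_t to) {
    if (to <= from) return;
    std::vector<std::string> words;
    segment(text.substr(from, to - from), words);
    tokens.insert(tokens.end(), words.begin(), words.end());
  };

  for (auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
       it != std::sregex_iterator(); ++it) {
    const std::size_t pos = static_cast<std::size_t>(it->position());
    flush(last, pos);            // text before the match goes to the segmenter
    tokens.push_back(it->str()); // the match itself stays whole
    last = pos + static_cast<std::size_t>(it->length());
  }
  flush(last, text.size());      // trailing text after the final match
  return tokens;
}
```

With cppjieba, the callback would presumably wrap the segmenter instead, for example a lambda that calls `jieba.Cut(piece, words, true)` on a `cppjieba::Jieba` instance; that signature is assumed from the project's demo code and should be checked against the version in use.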

github-actions[bot] commented 1 month ago

This issue has not been updated for over 1 year and will be marked as stale. If the issue still exists, please comment or update the issue, otherwise it will be closed after 7 days.

github-actions[bot] commented 3 weeks ago

This issue has been automatically closed due to inactivity. If the issue still exists, please reopen it.