shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5, and other models to text correction tasks, and works out of the box.
https://www.mulanai.com/product/corrector/
Apache License 2.0
5.61k stars 1.1k forks

Any plans to improve detection efficiency? #104

Closed liyang-lomo closed 4 years ago

liyang-lomo commented 5 years ago

Detection is too slow. An article of just a few hundred characters hangs for a long time.

shibing624 commented 5 years ago

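The rule-based method shouldn't be too slow. Which method is slow for you? Could you describe your local environment in more detail: which CPU, and how long does it take for how many characters?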

liyang-lomo commented 5 years ago

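Yes, with the rule-based method plain text is much faster, roughly 70 s for 5,000 characters. Feeding in a raw HTML page directly makes it hang.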
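One possible workaround for the HTML case, sketched here under assumptions rather than taken from pycorrector itself, is to strip the markup first and pass only the visible text to the corrector, one sentence at a time. The names `VisibleTextExtractor`, `html_to_sentences`, and `timed_correct` are hypothetical helpers built on the standard library; `correct_fn` stands in for whatever correction call you use.

```python
import re
import time
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_sentences(html):
    """Return the visible text of an HTML page as a list of sentences."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    sentences = []
    for chunk in parser.chunks:
        # Split right after Chinese or ASCII sentence-ending punctuation.
        for part in re.split(r"(?<=[。！？；!?;])", chunk):
            part = part.strip()
            if part:
                sentences.append(part)
    return sentences


def timed_correct(sentences, correct_fn):
    """Run correct_fn (a placeholder for the real corrector) and time it."""
    start = time.perf_counter()
    results = [correct_fn(s) for s in sentences]
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Reporting the elapsed time together with the character count and CPU model would also answer the environment question asked above.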

liyang-lomo commented 5 years ago

Another problem is that the false positive rate is fairly high. Is there any way to reduce it?

liyang-lomo commented 5 years ago

url: https://jianshu.com corrected_sent: 我不要你花钱我要我出钱渠内流行一句话土豪公司才敢说 detail: [['圈', '渠', 11, 12]]

corrected_sent: 沪公网安倍好 detail: [['安备', '安倍', 3, 5], ['号', '好', 5, 6]]

There are a great many false positives like these.
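Until the detector itself improves, one stopgap is to post-filter the reported corrections against a list of terms known to be correct. The sketch below assumes each `detail` item has the layout `[wrong, right, begin, end]` with an exclusive `end` offset into the original text, which is inferred from the output quoted above and not confirmed by the library's documentation. `filter_corrections` and `protected_terms` are hypothetical names, not part of pycorrector.

```python
def filter_corrections(original, details, protected_terms):
    """Re-apply only the corrections that do not overlap a protected term.

    details: list of [wrong, right, begin, end] items (assumed layout,
    with `end` exclusive and offsets relative to `original`).
    """
    # Character spans in the original text covered by protected terms.
    protected_spans = []
    for term in protected_terms:
        start = original.find(term)
        while start != -1:
            protected_spans.append((start, start + len(term)))
            start = original.find(term, start + 1)

    def is_protected(begin, end):
        # True when [begin, end) overlaps any protected span.
        return any(begin < p_end and p_begin < end
                   for p_begin, p_end in protected_spans)

    kept = [d for d in details if not is_protected(d[2], d[3])]

    # Apply from right to left so earlier offsets stay valid.
    text = original
    for wrong, right, begin, end in sorted(kept, key=lambda d: d[2], reverse=True):
        if text[begin:end] == wrong:
            text = text[:begin] + right + text[end:]
    return text, kept


# Usage with the second example from this thread: both reported
# corrections fall inside the protected term, so nothing is changed.
text, kept = filter_corrections(
    "沪公网安备号",
    [["安备", "安倍", 3, 5], ["号", "好", 5, 6]],
    protected_terms=["安备号"],
)
```

A filter like this only hides false positives for terms you already know about; it does not reduce the detector's underlying error rate.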

shibing624 commented 5 years ago

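This will be fixed; it is on the schedule. We have limited manpower at the moment, so it will take some time. PRs are welcome.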

shibing624 commented 4 years ago

fixed with https://github.com/shibing624/pycorrector/commit/34a5316a41eeea886f3bb2e5a2ce0e0c882f522d

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (The bot closed this issue automatically after a long period of inactivity; feel free to ask again if needed.)