shibing624 / pycorrector

pycorrector is a toolkit for text error correction. It applies Kenlm, T5, MacBERT, ChatGLM3, Qwen2.5 and other models to error-correction scenarios, ready to use out of the box.
https://www.mulanai.com/product/corrector/
Apache License 2.0
5.61k stars 1.1k forks

Correcting OCR misrecognition results #419

Closed. lwppwl closed this issue 8 months ago.

lwppwl commented 1 year ago

Hello, can the pycorrector library correct characters that OCR has misrecognized, such as 0 vs. O or B vs. 8? Which model would work best for this? Thanks.

shibing624 commented 1 year ago

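This library approaches text correction from a language-model perspective. In principle it can handle digits and letters, but such errors are usually long-tail, so digit and letter correction is disabled by default.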
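To make the 0/O and B/8 case concrete, here is a minimal rule-based sketch. It is not how pycorrector works (the library relies on language models); it is a swapped-in heuristic that resolves look-alike characters according to the dominant character class of the surrounding token. The look-alike table and the function `fix_ocr_confusions` are hypothetical, and only uppercase look-alikes are covered.

```python
# Hypothetical look-alike table (digit -> uppercase letter); illustrative only, not from pycorrector.
DIGIT_TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B"}
LETTER_TO_DIGIT = {letter: digit for digit, letter in DIGIT_TO_LETTER.items()}
AMBIGUOUS = set(DIGIT_TO_LETTER) | set(LETTER_TO_DIGIT)


def fix_ocr_confusions(token: str) -> str:
    """Rewrite look-alike characters to match the token's dominant character class.

    Only unambiguous characters count as context. If digits dominate,
    look-alike letters become digits; if letters dominate, look-alike digits
    become letters; on a tie (including no context at all) the token is unchanged.
    """
    digits = sum(ch.isdigit() for ch in token if ch not in AMBIGUOUS)
    letters = sum(ch.isalpha() for ch in token if ch not in AMBIGUOUS)
    if digits > letters:
        return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in token)
    if letters > digits:
        return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in token)
    return token  # no clear context: leave it alone


for raw in ["2O23", "8OOK", "O8"]:
    print(raw, "->", fix_ocr_confusions(raw))
```

Here `2O23` becomes `2023` and `8OOK` becomes `BOOK`, while `O8` is left unchanged because every character is a look-alike and the heuristic has no context to decide. Cases like that would need wider context, such as a language-model score.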

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (The bot closes issues automatically after prolonged inactivity; feel free to ask again if you need help.)

xunhang1007 commented 8 months ago

Hello, how do I enable digit and letter correction?

shibing624 commented 8 months ago

1. Digit correction is not supported. 2. For letter correction, you can call enspell to enable English spelling correction. See:

https://github.com/shibing624/pycorrector?tab=readme-ov-file#%E8%8B%B1%E6%96%87%E6%8B%BC%E5%86%99%E7%BA%A0%E9%94%99
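The actual enspell usage is documented in the README section linked above. Purely to illustrate the general idea of dictionary-based English spelling correction, here is a minimal sketch of Peter Norvig's classic edit-distance corrector. It is not pycorrector's code; the toy frequency table `WORD_FREQ` is a hypothetical stand-in for a real English word list, and `edits1`/`correct` are illustrative names.

```python
import string
from collections import Counter

# Hypothetical toy word-frequency table; a real corrector would load a large English corpus.
WORD_FREQ = Counter({"book": 50, "boot": 10, "look": 40, "hello": 30, "world": 25})
LETTERS = string.ascii_lowercase  # candidates only ever use a-z, never digits


def edits1(word: str) -> set[str]:
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Return the most frequent known word, preferring candidates with fewer edits."""
    word = word.lower()
    if word in WORD_FREQ:
        return word
    one = {w for w in edits1(word) if w in WORD_FREQ}
    if one:
        return max(one, key=WORD_FREQ.__getitem__)
    two = {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in WORD_FREQ}
    if two:
        return max(two, key=WORD_FREQ.__getitem__)
    return word  # no known candidate within two edits: leave unchanged


print(correct("helo"), correct("wrold"), correct("bo0k"))  # prints: hello world book
```

Because `LETTERS` contains only a-z, this kind of corrector can turn `bo0k` into `book` (replacing the `0` with a letter is one edit), but it can never repair a corrupted number.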