hiroi-sora / Umi-OCR

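OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual libraries.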
MIT License
26.05k stars 2.64k forks

How can I improve the recognition rate of the Rapid HTTP interface? #551

Closed peng-poso2o closed 3 months ago

peng-poso2o commented 3 months ago

Issues

Umi-OCR version

Umi-OCR_Paddle_v2.1.1

Windows version

Windows Server 2022

OCR plugins used

RapidOCR

Reproduction steps

30

{"code": 100, "data": [{"box": [[358, 34], [406, 34], [406, 66], [358, 66]], "score": 0.9989183247089386, "text": "均码", "end": ""}, {"box": [[112, 68], [161, 68], [161, 99], [112, 99]], "score": 0.9993809163570404, "text": "均色", "end": "\n"}], "score": 0.9991496205329895, "time": 0.06163215637207031, "timestamp": 1717726911.2983482}

The digit 1 in the image is missing from the result.

Problem screenshots or related files (optional)

No response

peng-poso2o commented 3 months ago

After converting the image to PNG format, it works correctly!

{"code": 100, "data": [{"box": [[359, 36], [404, 36], [404, 63], [359, 63]], "score": 0.999853640794754, "text": "均码", "end": ""}, {"box": [[113, 69], [160, 69], [160, 96], [113, 96]], "score": 0.9998188614845276, "text": "均色", "end": " "}, {"box": [[372, 70], [392, 70], [392, 97], [372, 97]], "score": 0.9969452023506165, "text": "1", "end": "\n"}], "score": 0.9988725682099661, "time": 0.08299851417541504, "timestamp": 1717727592.686934}

hiroi-sora commented 3 months ago

After converting the image to PNG format, it works correctly!

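Right, I have also observed that the same image is recognized more accurately when encoded as PNG than as JPG. Possibly, once OpenCV decodes the file, converting the JPG to a bitmap loses some information, which interferes with the network's feature extraction.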

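Umi now encodes all of its internal image byte streams as PNG. If you use an external interface such as HTTP, passing in PNG images will also give somewhat better results.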
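As a rough illustration of that advice, the sketch below re-encodes an image as PNG before sending it over HTTP. It rests on several assumptions that are not stated in this thread: that the HTTP service listens at `http://127.0.0.1:1224/api/ocr`, that it accepts a JSON body with a `base64` field, and that the third-party Pillow library is installed for the conversion. The function names are hypothetical. Check the address and request format against your own Umi-OCR settings and documentation.

```python
import base64
import io
import json
import urllib.request

# Assumption: address of the local Umi-OCR HTTP service; adjust to your setup.
UMI_OCR_URL = "http://127.0.0.1:1224/api/ocr"


def to_png_bytes(image_path: str) -> bytes:
    """Re-encode any image Pillow can open (for example a JPG) as PNG."""
    # Pillow is third-party (pip install pillow); imported here so the
    # request-building code below can be used without it.
    from PIL import Image

    with Image.open(image_path) as img:
        buffer = io.BytesIO()
        img.save(buffer, format="PNG")
        return buffer.getvalue()


def build_request(png_bytes: bytes, url: str = UMI_OCR_URL) -> urllib.request.Request:
    """Wrap PNG bytes in a JSON body (assumed field name: "base64")."""
    body = json.dumps({"base64": base64.b64encode(png_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(image_path: str) -> dict:
    """Convert the image to PNG, send it, and return the parsed JSON reply."""
    request = build_request(to_png_bytes(image_path))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the service running, a call such as `recognize("label.jpg")` (a hypothetical file name) would return a dictionary shaped like the responses earlier in this thread.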

peng-poso2o commented 3 months ago

Thank you!