jingsongliujing / OnnxOCR

A lightweight OCR system rebuilt from PaddleOCR that runs without the PaddlePaddle deep learning framework, with very fast inference.
Apache License 2.0

Inference speed Fast #9

Closed · guanfuchen closed this issue 1 month ago

guanfuchen commented 1 month ago

Is the inference speed mainly due to converting the PaddlePaddle model to the ONNX format, or are there other modifications to the model itself?

jingsongliujing commented 1 month ago

It is just a matter of converting the Paddle model to an ONNX model and aligning the preprocessing and post-processing with the official inference pipeline. There is a memory leak when running inference directly with the Paddle framework.
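
For reference, the ONNX-side inference is roughly this (a minimal sketch; the model path and input shape are placeholders, not the repo's actual files; the real pipeline feeds in the same normalized tensor the official Paddle preprocessing produces):

```python
# Minimal sketch: run a converted model with onnxruntime.
# "model.onnx" and the 1x3x960x960 shape are assumptions for illustration.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
x = np.random.rand(1, 3, 960, 960).astype(np.float32)  # stand-in for a preprocessed image
outputs = sess.run(None, {inp.name: x})
print([o.shape for o in outputs])
```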

DapperZhengLong commented 1 month ago

test_ocr.py runs slower on the GPU than on the CPU: the CPU takes 0.361 s, but the GPU takes 9.466 s. Has anyone else tested this? What could be the cause?

jingsongliujing commented 1 month ago

This is related to the CUDA version. Check the compatibility table between onnxruntime-gpu and CUDA versions.
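
One quick way to check the pairing (a sketch; if CUDAExecutionProvider is missing from the list, the onnxruntime-gpu build cannot see your CUDA/cuDNN install):

```python
# Sketch: verify that onnxruntime-gpu matches the local CUDA/cuDNN versions.
import onnxruntime as ort

print(ort.__version__)                # compare against the official ORT/CUDA compatibility table
print(ort.get_available_providers())  # 'CUDAExecutionProvider' should appear here
```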


DapperZhengLong commented 1 month ago

I tested cudatoolkit 11.3 / cudnn 8.4.0 / onnxruntime-gpu 1.15.0, cudatoolkit 11.3 / cudnn 8.2.1 / onnxruntime-gpu 1.14.1, and cudatoolkit 11.6.2 / cudnn 8.8.0.121 / onnxruntime-gpu 1.17.0; none of them worked well. Which versions are you using?

jingsongliujing commented 1 month ago

Check your local CUDA driver version; everything has to match one to one: https://blog.csdn.net/qq_38308388/article/details/137679214

jingsongliujing commented 1 month ago

> I tested cudatoolkit 11.3 / cudnn 8.4.0 / onnxruntime-gpu 1.15.0, cudatoolkit 11.3 / cudnn 8.2.1 / onnxruntime-gpu 1.14.1, and cudatoolkit 11.6.2 / cudnn 8.8.0.121 / onnxruntime-gpu 1.17.0; none of them worked well. Which versions are you using?

Also, when you use onnxruntime-gpu you must first run pip uninstall onnxruntime; the two packages cannot be installed at the same time.
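
To see which wheels are actually installed (a standard-library sketch; having both present is exactly the broken state described above):

```python
# Sketch: list the installed onnxruntime wheels; only onnxruntime-gpu should remain.
from importlib import metadata

for pkg in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```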

RobertLiu0905 commented 1 month ago

> It is just a matter of converting the Paddle model to an ONNX model and aligning the preprocessing and post-processing with the official inference pipeline. There is a memory leak when running inference directly with the Paddle framework.

Hello, I have wrapped the logic of test_ocr.py into a FastAPI endpoint. After each request I clear the cache, but GPU memory still keeps climbing. Is there anything else I can try?

```python
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```
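
Note that torch.cuda.empty_cache() only releases memory held by PyTorch's caching allocator; ONNX Runtime allocates GPU memory through its own arena, so that call cannot reclaim it. A common pattern worth trying (a sketch, not OnnxOCR's actual serving code) is to create the InferenceSession once at startup and reuse it across requests, optionally capping the CUDA arena through CUDAExecutionProvider options:

```python
# Sketch: one shared session per process, with the CUDA memory arena capped.
# "model.onnx" and the 2 GB budget are assumptions for illustration.
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()

providers = [
    ("CUDAExecutionProvider", {
        "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # hard cap on the arena, in bytes
        "arena_extend_strategy": "kSameAsRequested",  # grow only as much as each request needs
    }),
    "CPUExecutionProvider",
]
# Load once; creating a new session per request is a common cause of climbing GPU memory.
session = ort.InferenceSession("model.onnx", providers=providers)

@app.post("/ocr")
def ocr():
    # Preprocess the uploaded image here, then run it through the shared session.
    return {"status": "ok"}
```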