jingsongliujing / OnnxOCR

A lightweight OCR system rebuilt from PaddleOCR that runs without the PaddlePaddle deep learning framework, with very fast inference.
Apache License 2.0

Inference speed Fast #9

Closed · guanfuchen closed this issue 1 month ago

guanfuchen commented 1 month ago

Is the inference speed mainly due to converting the PaddlePaddle model to the ONNX format, or are there other modifications to the model itself?

jingsongliujing commented 1 month ago

It is just a matter of converting the Paddle model to an ONNX model and aligning the preprocessing and post-processing with the official inference pipeline. There is a memory leak when running inference directly with the Paddle framework.
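
For reference, the ONNX-side inference is roughly this (a minimal sketch; the model path and input shape are placeholders, not the repo's actual files; the real pipeline feeds in the same normalized tensor the official Paddle preprocessing produces):

```python
# Minimal sketch: run a converted model with onnxruntime.
# "model.onnx" and the 1x3x960x960 shape are assumptions for illustration.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
x = np.random.rand(1, 3, 960, 960).astype(np.float32)  # stand-in for a preprocessed image
outputs = sess.run(None, {inp.name: x})
print([o.shape for o in outputs])
```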

DapperZhengLong commented 1 month ago

test_ocr.py runs slower on the GPU than on the CPU: the CPU takes 0.361 s, but the GPU takes 9.466 s. Has anyone else tested this? What could be the cause?

jingsongliujing commented 1 month ago

This is related to the CUDA version. Check the compatibility table between onnxruntime-gpu and CUDA versions.
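
One quick way to check the pairing (a sketch; if CUDAExecutionProvider is missing from the list, the onnxruntime-gpu build cannot see your CUDA/cuDNN install):

```python
# Sketch: verify that onnxruntime-gpu matches the local CUDA/cuDNN versions.
import onnxruntime as ort

print(ort.__version__)                # compare against the official ORT/CUDA compatibility table
print(ort.get_available_providers())  # 'CUDAExecutionProvider' should appear here
```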


DapperZhengLong commented 1 month ago

I tested cudatoolkit 11.3 / cudnn 8.4.0 / onnxruntime-gpu 1.15.0, cudatoolkit 11.3 / cudnn 8.2.1 / onnxruntime-gpu 1.14.1, and cudatoolkit 11.6.2 / cudnn 8.8.0.121 / onnxruntime-gpu 1.17.0; none of them worked well. Which versions are you using?

jingsongliujing commented 1 month ago

Check your local CUDA driver version; everything has to match one to one: https://blog.csdn.net/qq_38308388/article/details/137679214

jingsongliujing commented 1 month ago

> I tested cudatoolkit 11.3 / cudnn 8.4.0 / onnxruntime-gpu 1.15.0, cudatoolkit 11.3 / cudnn 8.2.1 / onnxruntime-gpu 1.14.1, and cudatoolkit 11.6.2 / cudnn 8.8.0.121 / onnxruntime-gpu 1.17.0; none of them worked well. Which versions are you using?

Also, when you use onnxruntime-gpu you must first run pip uninstall onnxruntime; the two packages cannot be installed at the same time.
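
To see which wheels are actually installed (a standard-library sketch; having both present is exactly the broken state described above):

```python
# Sketch: list the installed onnxruntime wheels; only onnxruntime-gpu should remain.
from importlib import metadata

for pkg in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")
```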

RobertLiu0905 commented 1 month ago

> It is just a matter of converting the Paddle model to an ONNX model and aligning the preprocessing and post-processing with the official inference pipeline. There is a memory leak when running inference directly with the Paddle framework.

Hello, I have wrapped the logic of test_ocr.py into a FastAPI endpoint. After each request I clear the cache, but GPU memory still keeps climbing. Is there anything else I can try?

```python
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```
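
Note that torch.cuda.empty_cache() only releases memory held by PyTorch's caching allocator; ONNX Runtime allocates GPU memory through its own arena, so that call cannot reclaim it. A common pattern worth trying (a sketch, not OnnxOCR's actual serving code) is to create the InferenceSession once at startup and reuse it across requests, optionally capping the CUDA arena through CUDAExecutionProvider options:

```python
# Sketch: one shared session per process, with the CUDA memory arena capped.
# "model.onnx" and the 2 GB budget are assumptions for illustration.
import onnxruntime as ort
from fastapi import FastAPI

app = FastAPI()

providers = [
    ("CUDAExecutionProvider", {
        "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # hard cap on the arena, in bytes
        "arena_extend_strategy": "kSameAsRequested",  # grow only as much as each request needs
    }),
    "CPUExecutionProvider",
]
# Load once; creating a new session per request is a common cause of climbing GPU memory.
session = ort.InferenceSession("model.onnx", providers=providers)

@app.post("/ocr")
def ocr():
    # Preprocess the uploaded image here, then run it through the shared session.
    return {"status": "ok"}
```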