microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Inference speed problem even when using high-end hardware. #19865

Open Deckard-2049 opened 7 months ago

Deckard-2049 commented 7 months ago

Describe the issue

We trained an Ultralytics YOLOv8 model on 1024×1024, 3-channel images, converted it to ONNX, and ran the ONNX model from Visual Studio 2022 (C#, .NET Framework 4.8) with onnxruntime-gpu v1.16.3. Inference takes around 90 ms on an A5000 GPU. We also tried various session options: graph optimization level, inter_op_num_threads, intra_op_num_threads, execution mode (ORT_PARALLEL and ORT_SEQUENTIAL), and memory-pattern optimization (enable_mem_pattern), but none of them made any difference to the inference time. Can anyone suggest what we might be missing, or how we can reduce the time further, even a little?
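
For reference, these knobs map onto the C# API roughly as follows (a minimal sketch; the model path and thread counts are placeholders, not recommendations):

```csharp
using Microsoft.ML.OnnxRuntime;

// Session options corresponding to the settings mentioned above.
var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0);                                // CUDA EP on GPU 0
options.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL; // full graph optimization
options.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;                   // or ExecutionMode.ORT_PARALLEL
options.IntraOpNumThreads = 4;  // illustrative; mainly affects ops that fall back to CPU
options.InterOpNumThreads = 1;
options.EnableMemoryPattern = true;

var session = new InferenceSession("yolov8_1024.onnx", options);
```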

To reproduce

Nothing to mention.

Urgency

Yes, it's urgent. Please do help.

Platform

Windows

OS Version

10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

yuslepukhin commented 7 months ago

Can you share the converted model?

Also, anyone who uses the C# API would benefit from reading this.

Deckard-2049 commented 7 months ago

Actually, we ran this converted model on an RTX 4090 and the inference time was 35 ms, but when we run it on an RTX A5000 we get around 90 ms. We are using the same CUDA version (11.2) on both devices. We want to deploy this model on the RTX A5000 with an acceptable inference time of 35-40 ms. What are the possible reasons for this difference in inference time?
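
A note on measurement: numbers like these are usually taken after a warm-up run, since the first `Run()` on the CUDA provider pays one-time initialization costs. A minimal sketch of such a measurement (the input name "images" and the NCHW shape are assumptions based on a typical 1024×1024 YOLOv8 export; check `session.InputMetadata` for the real names):

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
var session = new InferenceSession("yolov8_1024.onnx", options);  // placeholder path

// Assumed input name and shape; a real model may differ.
var input = new DenseTensor<float>(new[] { 1, 3, 1024, 1024 });
var inputs = new[] { NamedOnnxValue.CreateFromTensor("images", input) };

// Warm-up: the first Run() includes one-time CUDA setup costs.
using (session.Run(inputs)) { }

const int iterations = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    using (session.Run(inputs)) { }  // dispose results each iteration
}
sw.Stop();
Console.WriteLine($"Average over {iterations} runs: {sw.Elapsed.TotalMilliseconds / iterations:F1} ms");
```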

Also, the ONNX model has NMS embedded in it. We wrote a script to embed the NMS inside the ONNX model; I have attached the script here for reference: adding_nms.py-20240313T133344Z-001.zip

Unfortunately, I cannot share the model; our company policy restricts the sharing of proprietary models. We would appreciate any suggestions on the questions above. It would be of great help.

yuslepukhin commented 7 months ago

https://onnxruntime.ai/docs/performance/tune-performance/
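
A common first step from that guide is to capture per-operator timings with the built-in profiler, which also shows which execution provider each node actually ran on. A minimal sketch via the C# API (the model path is a placeholder):

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

var options = SessionOptions.MakeSessionOptionWithCudaProvider(0);
options.EnableProfiling = true;  // emit a chrome://tracing-compatible JSON trace

using (var session = new InferenceSession("yolov8_1024.onnx", options))
{
    // ... run inference as usual ...

    // EndProfiling flushes the trace and returns its file path.
    string profilePath = session.EndProfiling();
    Console.WriteLine($"Profile written to: {profilePath}");
}
```

Comparing such traces between the RTX 4090 and the RTX A5000 would show whether the extra time is concentrated in specific nodes (for example the embedded NMS) or spread across the whole graph.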