itsnine / yolov5-onnxruntime

YOLOv5 ONNX Runtime C++ inference code.

Inference speed #5

Closed: guishilike closed this issue 2 years ago

guishilike commented 2 years ago

What inference speed do you get? I tried it on an RTX 2060 and only got 10 FPS.

itsnine commented 2 years ago

@guishilike I tested yolov5m (trained on COCO, 640x640 input shape) with onnxruntime-gpu 1.9 on an RTX 2080 Ti and got around 45 FPS; on CPU I got around 5 FPS. Note that I removed the cout debug output before benchmarking; I'll commit that a bit later. Could you provide details about your configuration: the model, the model's input shape, and the shape of your images? Have you set the --gpu argument? Does your onnxruntime build support CUDA? You can also try compiling with the Release flag: cmake .. -DONNXRUNTIME_DIR=path_to_onnxruntime -DCMAKE_BUILD_TYPE=Release
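For reference, a minimal sketch of the full out-of-source Release build; the onnxruntime path is a placeholder for wherever your onnxruntime(-gpu) package is extracted:

```shell
# Configure and build in Release mode (out-of-source build).
# /path/to/onnxruntime is a placeholder, not an actual path from this repo.
mkdir -p build && cd build
cmake .. -DONNXRUNTIME_DIR=/path/to/onnxruntime -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
```

Without -DCMAKE_BUILD_TYPE=Release, CMake single-config generators typically produce an unoptimized binary, which alone can cost a large fraction of the FPS.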

guishilike commented 2 years ago

I did not use cmake.

Model: yolov5x, 640x640
GPU: RTX 2060
onnxruntime: 1.10

In fact, dynamic input shape was not used at the beginning; FPS was 10. After enabling dynamic shape it is approximately 18 FPS. I am using custom data. Preprocessing time is too long. (screenshot)

zhiqwang commented 2 years ago

Hi @guishilike

Approximately 18 FPS after using dynamic.

Did you mean that the dynamic input shape mechanism is faster?

itsnine commented 2 years ago

@guishilike I tested yolov5x 640x640 (trained on COCO with 80 classes) with a dynamic input shape on a 1920x1080 image with many objects: preprocessing took 1 ms, postprocessing also took 1 ms, and overall I got 28 FPS. On smaller images with fewer objects the FPS was even higher; on zidane.jpg I got 38 FPS. Maybe resizing takes a long time for images with a large resolution. What is your test images' resolution? And what FPS did you get with the official yolov5 PyTorch repo?

itsnine commented 2 years ago

@zhiqwang hi, it seems that with a dynamic input shape the overall FPS is a bit higher, at least for onnxruntime inference. This is probably because the image is resized to 640x640 when the model has a fixed input shape, but with a dynamic input shape it can be resized to e.g. 640x480; in the second case there are fewer output boxes before postprocessing, because we use the input image without padding. I'd also be glad to hear your ideas if you have other thoughts about this.

itsnine commented 2 years ago

@guishilike You can also try the YOLOv5 Runtime Stack repo, zhiqwang/yolov5-rt-stack, where pre- and postprocessing are embedded in the model's graph.

guishilike commented 2 years ago

The test image size is 4096x2160; maybe I should try changing the input shape to 2:1 instead of using dynamic. In fact, my image size is not fixed, so the results may differ. Results using ultralytics yolov5 on Windows 10 with an RTX 2060: ONNX inference (screenshot), PyTorch inference (screenshot), PyTorch inference with half precision (screenshot).

I also tested ultralytics yolov5 ONNX on an Ubuntu system with an RTX 3070:

(screenshot) Preprocessing takes very little time there. The intermediate conversion may depend on the machine. Half precision has a significant impact on speed.