Closed guishilike closed 2 years ago
@guishilike I tested yolov5m trained on COCO with 640x640 input_shape, with onnxruntime-gpu 1.9 and an RTX 2080 Ti: it ran at around 45 fps; on CPU I got around 5 fps. Note that I removed the cout debug output before benchmarking; I'll commit that a bit later. Could you provide details about your configuration: the model, the model's input_shape, and the shape of your images? Have you set the --gpu argument? Does your onnxruntime build support CUDA? You can also try compiling with the Release flag:
cmake .. -DONNXRUNTIME_DIR=path_to_onnxruntime -DCMAKE_BUILD_TYPE=Release
Model: yolov5x, 640x640
GPU: RTX 2060
ONNX Runtime 1.10
I did not use cmake. In fact, dynamic input shape was not used at the beginning, and FPS was 10; after enabling dynamic input shape it was approximately 18 FPS. I'm using custom data. Preprocessing time is too long.
Hi @guishilike
Approximately 18 FPS after using dynamic.
Did you mean that the dynamic mechanism is quicker?
@guishilike I tested yolov5x 640x640 (trained on COCO with 80 classes) with dynamic input_shape on a 1920x1080 image with a lot of objects: preprocessing takes 1 ms and postprocessing also takes 1 ms, and overall I got 28 fps. On smaller images with fewer objects I got even more fps; on zidane.jpg I had 38 fps. Maybe it takes a lot of time to resize images if they have a large resolution. So what is your test image resolution? What fps did you get with the official yolov5 PyTorch repo?
@zhiqwang Hi, it seems that with a dynamic input shape the overall fps is a bit higher, at least for onnxruntime inference. This is probably because the image is resized to 640x640 when we use a model without dynamic input shape, whereas with dynamic input shape it is resized to e.g. 640x480; in the second case there are fewer output boxes before postprocessing because we use the input image without padding. I'd also be glad to hear your ideas if you have other thoughts about this.
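To make the padding argument concrete, here is a small sketch (not code from this repo; the helper names `letterbox_shape` and `num_predictions` are hypothetical) that computes the input shape YOLOv5-style letterbox preprocessing would produce with and without dynamic shapes, and the resulting raw prediction count, assuming the standard strides (8, 16, 32) and 3 anchors per grid cell:

```python
import math

def letterbox_shape(h, w, new_size=640, stride=32, dynamic=False):
    """Model input shape after YOLOv5-style letterbox preprocessing.

    dynamic=False: pad up to a fixed new_size x new_size square.
    dynamic=True:  pad each side only to the next multiple of stride.
    """
    r = min(new_size / h, new_size / w)       # scale ratio, keep aspect
    rh, rw = round(h * r), round(w * r)       # resized (unpadded) shape
    if dynamic:
        ph = math.ceil(rh / stride) * stride  # minimal stride-aligned pad
        pw = math.ceil(rw / stride) * stride
    else:
        ph = pw = new_size                    # full square with padding
    return ph, pw

def num_predictions(h, w, strides=(8, 16, 32), anchors_per_cell=3):
    """Raw output boxes before NMS for a given input shape."""
    return sum((h // s) * (w // s) * anchors_per_cell for s in strides)

fixed = letterbox_shape(1080, 1920)              # (640, 640)
dyn = letterbox_shape(1080, 1920, dynamic=True)  # (384, 640)
print(fixed, num_predictions(*fixed))            # (640, 640) 25200
print(dyn, num_predictions(*dyn))                # (384, 640) 15120
```

For a 1920x1080 frame, the fixed square input yields 25200 candidate boxes while the dynamic (padding-free) input yields 15120, which is roughly the fps difference mechanism described above: less data through the network and fewer boxes to filter in postprocessing.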
@guishilike You can also try the YOLOv5 Runtime Stack repo (zhiqwang/yolov5-rt-stack), where pre- and postprocessing are embedded in the model's graph.
My test image size is 4096x2160; maybe I should try changing the input shape to 2:1 instead of using dynamic. In fact, my image size is not fixed, so the results may differ. Using ultralytics yolov5 ONNX inference on Win10 with an RTX 2060:
PyTorch inference
PyTorch inference with half precision
I tested ultralytics yolov5 ONNX on an Ubuntu system with an RTX 3070.
Preprocessing takes very little time. The intermediate conversion may be machine-dependent. Half precision has a significant impact on speed.
What is the speed of inference? I tried it on an RTX 2060 and got only 10 FPS.