marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Conversion with ONNX is slower than with wts #569

Open YaroslavPavlovich opened 2 months ago

YaroslavPavlovich commented 2 months ago

I converted my custom YOLOv5 model to .onnx using the export_yoloV5.py script. I then ran my program on a Jetson TX2 with the new version of DeepStream-Yolo, and here are the FPS results:

Sep 10 03:31:05 sts-desktop rtdts[28869]: 2024-09-10 03:31:05,103 DEBU: Fps of streams: {'stream-0': 10.0, 'stream-1': 10.66}
Sep 10 03:31:08 sts-desktop rtdts[28869]: 2024-09-10 03:31:08,103 DEBU: Fps of streams: {'stream-0': 9.33, 'stream-1': 9.33}
Sep 10 03:31:11 sts-desktop rtdts[28869]: 2024-09-10 03:31:11,105 DEBU: Fps of streams: {'stream-0': 10.66, 'stream-1': 10.66}
Sep 10 03:31:14 sts-desktop rtdts[28869]: 2024-09-10 03:31:14,106 DEBU: Fps of streams: {'stream-0': 9.66, 'stream-1': 9.33}
Sep 10 03:31:17 sts-desktop rtdts[28869]: 2024-09-10 03:31:17,107 DEBU: Fps of streams: {'stream-0': 10.0, 'stream-1': 10.33}
Sep 10 03:31:20 sts-desktop rtdts[28869]: 2024-09-10 03:31:20,113 DEBU: Fps of streams: {'stream-0': 10.65, 'stream-1': 10.64}
Sep 10 03:31:23 sts-desktop rtdts[28869]: 2024-09-10 03:31:23,114 DEBU: Fps of streams: {'stream-0': 11.33, 'stream-1': 11.0}
Sep 10 03:31:26 sts-desktop rtdts[28869]: 2024-09-10 03:31:26,115 DEBU: Fps of streams: {'stream-0': 11.0, 'stream-1': 10.66}
Sep 10 03:31:29 sts-desktop rtdts[28869]: 2024-09-10 03:31:29,119 DEBU: Fps of streams: {'stream-0': 5.99, 'stream-1': 6.33}
Sep 10 03:31:32 sts-desktop rtdts[28869]: 2024-09-10 03:31:32,119 DEBU: Fps of streams: {'stream-0': 10.67, 'stream-1': 9.67}
Sep 10 03:31:35 sts-desktop rtdts[28869]: 2024-09-10 03:31:35,121 DEBU: Fps of streams: {'stream-0': 9.99, 'stream-1': 10.66}

The FPS is unstable (I can fix that), but the main problem is that it's slower than the previous version (roughly 10 FPS vs 12 FPS per stream). If I use the older version at commit ab6de54 and convert using gen_wts_yoloV5.py, I get the following results:

Sep 10 03:49:09 sts-desktop rtdts[9345]: 2024-09-10 03:49:09,928 DEBU: Fps of streams: {'stream-0': 11.99, 'stream-1': 12.65}
Sep 10 03:49:12 sts-desktop rtdts[9345]: 2024-09-10 03:49:12,928 DEBU: Fps of streams: {'stream-0': 12.67, 'stream-1': 13.0}
Sep 10 03:49:15 sts-desktop rtdts[9345]: 2024-09-10 03:49:15,930 DEBU: Fps of streams: {'stream-0': 10.99, 'stream-1': 12.99}
Sep 10 03:49:18 sts-desktop rtdts[9345]: 2024-09-10 03:49:18,932 DEBU: Fps of streams: {'stream-0': 11.66, 'stream-1': 13.32}
Sep 10 03:49:21 sts-desktop rtdts[9345]: 2024-09-10 03:49:21,934 DEBU: Fps of streams: {'stream-0': 10.66, 'stream-1': 12.99}
Sep 10 03:49:24 sts-desktop rtdts[9345]: 2024-09-10 03:49:24,937 DEBU: Fps of streams: {'stream-0': 12.65, 'stream-1': 13.32}
Sep 10 03:49:27 sts-desktop rtdts[9345]: 2024-09-10 03:49:27,941 DEBU: Fps of streams: {'stream-0': 13.32, 'stream-1': 12.65}
Sep 10 03:49:30 sts-desktop rtdts[9345]: 2024-09-10 03:49:30,941 DEBU: Fps of streams: {'stream-0': 14.33, 'stream-1': 13.33}

I only changed the conversion script and the .so file. Is this performance difference normal, or have I made a mistake?
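To put a number on the gap rather than eyeballing the logs, the per-stream averages can be computed from the "Fps of streams" lines. This is a minimal sketch, assuming the log format shown above; the function name `mean_fps` and the variable names are made up for illustration, and only the first three lines of each run are pasted in (the full output would go in their place).

```python
import re
from statistics import mean

# Matches entries such as 'stream-0': 10.66 inside a "Fps of streams" line.
STREAM_FPS = re.compile(r"'(stream-\d+)':\s*([0-9.]+)")


def mean_fps(log_text):
    """Return the mean FPS per stream over all 'Fps of streams' lines."""
    samples = {}
    for line in log_text.splitlines():
        if "Fps of streams" not in line:
            continue  # ignore unrelated log lines
        for name, value in STREAM_FPS.findall(line):
            samples.setdefault(name, []).append(float(value))
    return {name: mean(values) for name, values in samples.items()}


# First three lines of each run, copied from the logs in this post.
onnx_log = """\
Sep 10 03:31:05 sts-desktop rtdts[28869]: 2024-09-10 03:31:05,103 DEBU: Fps of streams: {'stream-0': 10.0, 'stream-1': 10.66}
Sep 10 03:31:08 sts-desktop rtdts[28869]: 2024-09-10 03:31:08,103 DEBU: Fps of streams: {'stream-0': 9.33, 'stream-1': 9.33}
Sep 10 03:31:11 sts-desktop rtdts[28869]: 2024-09-10 03:31:11,105 DEBU: Fps of streams: {'stream-0': 10.66, 'stream-1': 10.66}
"""

wts_log = """\
Sep 10 03:49:09 sts-desktop rtdts[9345]: 2024-09-10 03:49:09,928 DEBU: Fps of streams: {'stream-0': 11.99, 'stream-1': 12.65}
Sep 10 03:49:12 sts-desktop rtdts[9345]: 2024-09-10 03:49:12,928 DEBU: Fps of streams: {'stream-0': 12.67, 'stream-1': 13.0}
Sep 10 03:49:15 sts-desktop rtdts[9345]: 2024-09-10 03:49:15,930 DEBU: Fps of streams: {'stream-0': 10.99, 'stream-1': 12.99}
"""

for label, text in (("onnx", onnx_log), ("wts", wts_log)):
    averages = mean_fps(text)
    print(label, {name: round(fps, 2) for name, fps in averages.items()})
```

Averaging over a longer capture would also smooth out one-off dips such as the 5.99 FPS sample in the ONNX run.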

For conversion on Jetson, I used:

python3 export_yoloV5.py -w yolov5s.pt --batch 2 --simplify --opset 12

I also tried:

python3 export_yoloV5.py -w yolov5s.pt --batch 2 --opset 12

but there was no change. The DeepStream version on the Jetson is 6.0.

marcoslucianops commented 1 month ago

Did you try with a higher opset (17, for example) + simplify? ONNX support is easier for me to keep updated in the repo, which is why it's used now.
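One way to try this is to re-export the same weights at both opsets and compare the resulting FPS. The sketch below only builds the command lines, reusing the flags already shown in this thread (`-w`, `--batch`, `--opset`, `--simplify`); the helper name `export_command` is made up for illustration. Whether opset 17 is accepted by the TensorRT version that ships with DeepStream 6.0 on the TX2 is not confirmed here, so treat that as something to check.

```python
def export_command(weights, batch, opset, simplify=True):
    """Build an export_yoloV5.py command using only flags seen in this thread."""
    cmd = ["python3", "export_yoloV5.py", "-w", weights,
           "--batch", str(batch), "--opset", str(opset)]
    if simplify:
        cmd.append("--simplify")
    return cmd


# Print one command per opset so each export can be run and benchmarked in turn.
for opset in (12, 17):
    print(" ".join(export_command("yolov5s.pt", batch=2, opset=opset)))
```

Each printed line can be run from the directory that contains export_yoloV5.py, after which the engine is rebuilt and the FPS compared as before.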