ultralytics / yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Exporting to ONNX, including the NMS #7373

Closed · joihn closed this 2 years ago

joihn commented 2 years ago

Question

When exporting to ONNX or TensorRT, does one still need to manually rewrite the NMS (non-maximum suppression) for the target platform, or is there an easier solution?

(My target platform can't have PyTorch, so I'm currently trying to rewrite the NMS in numpy; a minimal sketch follows below.)

Additional

No response
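For reference, a single-class greedy NMS in pure numpy can be fairly compact. A minimal sketch (the [x1, y1, x2, y2] box format and the IoU threshold are assumptions here, not the exact YOLOv5 implementation):

```python
import numpy as np

def nms_numpy(boxes, scores, iou_threshold=0.45):
    """Greedy single-class NMS. boxes: (N, 4) as [x1, y1, x2, y2], scores: (N,)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current top-scoring box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop every remaining box that overlaps the kept box too much
        order = order[1:][iou <= iou_threshold]
    return np.array(keep)
```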

glenn-jocher commented 2 years ago

Try https://github.com/onnx/onnx/blob/main/docs/Operators.md#nonmaxsuppression
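To illustrate what that operator does, it can be exercised standalone by wrapping it in a one-node graph and running it with onnxruntime. A minimal sketch (thresholds, box counts, and values are illustrative):

```python
import numpy as np
import onnx
import onnxruntime as ort
from onnx import TensorProto, helper

# One-node graph wrapping ONNX's NonMaxSuppression operator.
node = helper.make_node(
    "NonMaxSuppression",
    inputs=["boxes", "scores", "max_per_class", "iou_th", "score_th"],
    outputs=["selected_indices"],  # (num_selected, 3): [batch, class, box index]
    center_point_box=0,            # 0 = corner format; 1 = [cx, cy, w, h] as YOLO predicts
)
graph = helper.make_graph(
    [node], "nms_demo",
    inputs=[
        helper.make_tensor_value_info("boxes", TensorProto.FLOAT, [1, None, 4]),
        helper.make_tensor_value_info("scores", TensorProto.FLOAT, [1, 1, None]),
    ],
    outputs=[helper.make_tensor_value_info("selected_indices", TensorProto.INT64, [None, 3])],
    initializer=[
        helper.make_tensor("max_per_class", TensorProto.INT64, [1], [100]),
        helper.make_tensor("iou_th", TensorProto.FLOAT, [1], [0.45]),
        helper.make_tensor("score_th", TensorProto.FLOAT, [1], [0.25]),
    ],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
sess = ort.InferenceSession(model.SerializeToString(), providers=["CPUExecutionProvider"])

boxes = np.array([[[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]], dtype=np.float32)
scores = np.array([[[0.9, 0.8, 0.7]]], dtype=np.float32)
print(sess.run(None, {"boxes": boxes, "scores": scores})[0])  # drops the overlapping box
```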

joihn commented 2 years ago

Thanks for the link! Another option would be to directly include the NMS in the ONNX model, like in this repo: https://github.com/zhiqwang/yolov5-rt-stack

What do you think about this solution ?

glenn-jocher commented 2 years ago

requires torch

zhiqwang commented 2 years ago

Hi @joihn and @glenn-jocher, I filed an issue about this solution before: https://github.com/ultralytics/yolov5/issues/6430

glenn-jocher commented 2 years ago

@zhiqwang hi! It looks like the OP wants a non-PyTorch solution for ONNX or TRT. What do you think is the right approach?

Currently only TF models can be exported with the --nms flag: https://github.com/ultralytics/yolov5/blob/71685cbf91a9f60eb2f9c46ced8fa7becf6813d9/export.py#L566

It looks like if we could figure out how to add https://github.com/onnx/onnx/blob/main/docs/Operators.md#nonmaxsuppression to ONNX models (using the same --nms flag), then TRT models might inherit this, since our TRT exports are based on previously exported ONNX models.

zhiqwang commented 2 years ago

Hi @glenn-jocher,

It looks like if we could figure out how to add https://github.com/onnx/onnx/blob/main/docs/Operators.md#nonmaxsuppression to ONNX models (using the same --nms flag), then TRT models might inherit this, since our TRT exports are based on previously exported ONNX models.

It seems that TensorRT doesn't support the NonMaxSuppression operator provided by ONNX, so it's not possible to use ONNX's NonMaxSuppression for inference on TensorRT (even if we concatenate the ONNX model with the NonMaxSuppression op).

Instead, TensorRT provides a Plugin interface to allow applications to provide implementations of operations that TensorRT does not support natively. TensorRT ships with a library of plugins, and source for many of these and some additional plugins can be found here. Specifically, we can use the batchedNMSPlugin or efficientNMSPlugin with YOLOv5 (BTW, we found that efficientNMSPlugin is much faster than batchedNMSPlugin in our tests on real devices).
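For context, a common way to use efficientNMSPlugin is to splice an EfficientNMS_TRT node into the exported ONNX graph (e.g. with onnx-graphsurgeon); TensorRT then resolves the op to the plugin by name at engine-build time. A rough sketch, assuming the graph already exposes decoded boxes and per-class scores tensors (the tensor names and some attribute values here are assumptions, not an exact recipe):

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs  # pip install onnx-graphsurgeon

graph = gs.import_onnx(onnx.load("yolov5s.onnx"))
tensors = graph.tensors()
# Assumption: the export already produced decoded "boxes" [1, n, 4] and
# "scores" [1, n, num_classes] tensors; a real export may need extra
# Slice/Mul nodes first to split YOLOv5's raw [1, n, 85] output.
boxes, scores = tensors["boxes"], tensors["scores"]

max_det = 100
outputs = [
    gs.Variable("num_dets", dtype=np.int32, shape=[1, 1]),
    gs.Variable("det_boxes", dtype=np.float32, shape=[1, max_det, 4]),
    gs.Variable("det_scores", dtype=np.float32, shape=[1, max_det]),
    gs.Variable("det_classes", dtype=np.int32, shape=[1, max_det]),
]
graph.nodes.append(gs.Node(
    op="EfficientNMS_TRT",  # matched to the TensorRT plugin by op name
    attrs={
        "plugin_version": "1",
        "background_class": -1,    # no background class to skip
        "max_output_boxes": max_det,
        "score_threshold": 0.25,
        "iou_threshold": 0.45,
        "score_activation": False, # scores already passed through sigmoid
        "box_coding": 1,           # 1 = [cx, cy, w, h], as YOLOv5 predicts
    },
    inputs=[boxes, scores],
    outputs=outputs,
))
graph.outputs = outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov5s_nms.onnx")
```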

glenn-jocher commented 2 years ago

@zhiqwang ah interesting!

zhiqwang commented 2 years ago

Hi @glenn-jocher, I guess the essential reason here is that the batched_nms operator (in PyTorch/TorchVision) contains some dynamic shapes and control flow, which ONNX supports but TensorRT doesn't.

joihn commented 2 years ago

requires torch

Hmm, that's surprising. I thought yolov5-rt-stack could package everything in the ONNX file/TensorRT model (including the NMS), so that Torch would no longer be needed for inference.
Could you elaborate on why it still needs torch?

zhiqwang commented 2 years ago

Could you elaborate on why it still needs torch?

Hi @joihn, I guess what Glenn means is that the following function relies on torch. (PyTorch is relatively easy to install on a Python platform, after all.)

https://github.com/ultralytics/yolov5/blob/fa569cdae52dfd3074561129c3a5185bded60b16/models/common.py#L279

joihn commented 2 years ago

All good, I managed to achieve PyTorch-less inference, including GPU-powered non-maximum suppression. For future reference: I converted the yolov5 model to yolov5-rt-stack, then converted it to ONNX. The ONNX file contains both the letterbox preprocessing (padding to square) and the non-maximum suppression.
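For anyone following the same route, inference on such an export then needs only OpenCV, numpy, and onnxruntime. A minimal sketch (file names and the exact input layout depend on the yolort export and are assumptions here):

```python
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
img = cv2.cvtColor(cv2.imread("bus.jpg"), cv2.COLOR_BGR2RGB)
# Letterboxing and NMS live inside the graph here, so the input is just the raw
# image as a normalized CHW float tensor (add a batch dim if your export expects one).
blob = img.transpose(2, 0, 1).astype(np.float32) / 255.0
input_name = sess.get_inputs()[0].name
outputs = sess.run(None, {input_name: blob})  # typically boxes, scores, labels
```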

@glenn-jocher @zhiqwang Thanks for your help, and your amazing respective repos! :D

Sayyam-Jain commented 2 years ago

All good, I managed to achieve PyTorch-less inference, including GPU-powered non-maximum suppression. For future reference: I converted the yolov5 model to yolov5-rt-stack, then converted it to ONNX. The ONNX file contains both the letterbox preprocessing (padding to square) and the non-maximum suppression.

@glenn-jocher @zhiqwang Thanks for your help, and your amazing respective repos! :D

Can you please elaborate a bit? yolov5-rt-stack still uses PyTorch. It provides ONNX and TensorRT backends. Which format did you convert your trained model to, and how did you convert to ONNX so that inference requires no PyTorch? Please help.

Thanks

zhiqwang commented 2 years ago

Hi @Sayyam-Jain,

The official yolov5 here and the custom yolov5-rt-stack here both use PyTorch for data binding. Installing PyTorch in Python is relatively simple; you can also check PyTorch's blog about deploying YOLOv5 on Jetson Nano: https://pytorch.org/blog/running-pytorch-models-on-jetson-nano/

You can use pycuda as an alternative if you don't want to install PyTorch. Check the TensorRT samples here: https://github.com/NVIDIA/TensorRT/blob/f4a8635/samples/python/detectron2/infer.py#L114

But the C++ interface of yolort (yolov5-rt-stack) doesn't need PyTorch; see these docs: https://github.com/zhiqwang/yolov5-rt-stack/tree/main/deployment/tensorrt

See https://github.com/zhiqwang/yolov5-rt-stack/discussions/381#discussioncomment-2546007 for more details.

joihn commented 2 years ago

Can you please elaborate a bit? yolov5-rt-stack still uses PyTorch. It provides ONNX and TensorRT backends. Which format did you convert your trained model to, and how did you convert to ONNX so that inference requires no PyTorch? Please help.

Thanks

I used yolort to get an ONNX version (model.onnx) and a TensorRT version (model.engine). Then, for inference in TensorRT, I wrote some custom code to load my image using numpy and OpenCV. Since, for quite special reasons, my existing codebase can't have PyTorch, I used PyCUDA to do the data binding (i.e., transfers between CPU and GPU), exactly as @zhiqwang mentioned.

The example he mentioned is a very helpful one :)
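A condensed sketch of that setup, with pycuda handling the CPU/GPU transfers (binding names, shapes, and output layout are illustrative; the TensorRT sample linked above is the fuller version):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One pinned host buffer + one device buffer per binding (inputs and outputs).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()
host_bufs[0][:] = 0.0  # placeholder: flattened CHW float image from the numpy/OpenCV pipeline
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)  # CPU -> GPU
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
    cuda.memcpy_dtoh_async(host, dev, stream)  # GPU -> CPU
stream.synchronize()
# host_bufs[1:] now hold the detections (layout depends on the export)
```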

Sayyam-Jain commented 2 years ago

Can you please elaborate a bit? yolov5-rt-stack still uses PyTorch. It provides ONNX and TensorRT backends. Which format did you convert your trained model to, and how did you convert to ONNX so that inference requires no PyTorch? Please help. Thanks

I used yolort to get an ONNX version (model.onnx) and a TensorRT version (model.engine). Then, for inference in TensorRT, I wrote some custom code to load my image using numpy and OpenCV. Since, for quite special reasons, my existing codebase can't have PyTorch, I used PyCUDA to do the data binding (i.e., transfers between CPU and GPU), exactly as @zhiqwang mentioned.

The example he mentioned is a very helpful one :)

Thank you for your help. I was previously using YOLOv4 via the https://github.com/Tianxiaomo/pytorch-YOLOv4 repo. There, the author was able to run the TRT engine without using PyTorch or Detectron. I was hoping the same is achievable with YOLOv5.

The Python example mentioned by @zhiqwang uses Detectron2, but I want to use only cv2 or pycuda; is that possible?

Thanks again for your help

zhiqwang commented 2 years ago

The Python example mentioned by @zhiqwang uses Detectron2, but I want to use only cv2 or pycuda; is that possible?

Hi @Sayyam-Jain, that's possible. You can just replace the pre-processing and data binding using the method in the detectron2 example, and the rest of the code doesn't need to be modified.
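For the cv2-only pre-processing part, a letterbox sketch along the usual YOLOv5 lines (the 640x640 target size and gray pad value are the common defaults, assumed here):

```python
import cv2
import numpy as np

def letterbox(img, new_shape=(640, 640), pad_value=114):
    """Resize keeping aspect ratio, then pad to new_shape (the usual YOLOv5 recipe)."""
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)  # scale so the long side fits
    new_w, new_h = round(w * r), round(h * r)
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    dh, dw = new_shape[0] - new_h, new_shape[1] - new_w
    top, left = dh // 2, dw // 2
    padded = cv2.copyMakeBorder(resized, top, dh - top, left, dw - left,
                                cv2.BORDER_CONSTANT, value=(pad_value,) * 3)
    # Return the ratio and padding so detections can be mapped back to the original image.
    return padded, r, (left, top)

img = cv2.imread("bus.jpg")
inp, ratio, pad = letterbox(img)
blob = inp[:, :, ::-1].transpose(2, 0, 1)[None].astype(np.float32) / 255.0  # BGR->RGB, HWC->CHW, add batch
```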