jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

yolo to onnx with dynamic input shape #457

Closed kgksl closed 3 years ago

kgksl commented 3 years ago

I saw that TensorRT supports dynamic input shapes, and thought this would allow YOLO models trained with random=1 to be used with a range of network sizes through a single TensorRT engine. I'm still not 100% sure of the process. However, it means we need an ONNX model that supports dynamic input sizes as well, and I couldn't find whether this can be done by modifying yolo_to_onnx.py. As far as I can see from the code, the output tensor shapes for the ONNX graph need to be defined, and for YOLO models they are defined based on the network width and height read from the cfg. Can we define a dynamic output shape for ONNX, and is that the only thing we would need to change in this case? Thank you.

jkjung-avt commented 3 years ago

I think dynamic input shape for TensorRT YOLO engines is possible. It would require quite a few modifications to the code in this repo, though:

  1. The YoloLayerPlugin class needs to inherit from IPluginV2DynamicExt instead of IPluginV2IOExt. Refer to the source code here.

    https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/plugins/yolo_layer.h#L44

  2. The plugin can no longer store fixed yolo_width, yolo_height, input_width and input_height values. It needs to derive those values from its input tensor shape at runtime.

    https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/plugins/yolo_layer.cu#L39

  3. Set proper net_h and net_w range in the TensorRT optimization profile. Note that those dimensions (H & W) should be set to -1 (dynamic) in the ONNX file.

    https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/yolo/onnx_to_tensorrt.py#L146-L150

  4. At inference time, the engine still needs a concrete input shape for each execution. So you need to set the input (binding) shapes explicitly for the test images.

    https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/utils/yolo_with_plugins.py#L178

I don't guarantee this is an exhaustive list, but those are the things off the top of my head.
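To make item 2 concrete: the idea is that the plugin computes its grid size from whatever input shape it receives, rather than reading fixed values saved at build time. A minimal pure-Python sketch of that shape arithmetic (the actual change would live in the C++/CUDA code of yolo_layer.cu; the function name here is illustrative):

```python
def yolo_grid_dims(input_hw, stride):
    """Given the network input (H, W) and the stride of a yolo head
    (8, 16 or 32 for YOLOv4), return that head's grid (h, w)."""
    h, w = input_hw
    if h % stride or w % stride:
        raise ValueError("input dims must be multiples of the stride")
    return h // stride, w // stride

# e.g. a 416x416 input gives a 13x13 grid at stride 32,
# and a 608x416 input gives a 38x26 grid at stride 16
print(yolo_grid_dims((416, 416), 32))   # -> (13, 13)
print(yolo_grid_dims((608, 416), 16))   # -> (38, 26)
```

This is also why YOLO network dimensions must stay multiples of 32 even when dynamic: every head's stride has to divide the input H and W.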

kgksl commented 3 years ago

@jkjung-avt thanks a lot for the quick reply.

Do we need to modify the yolo_to_onnx.py code as well? Or can we skip that, since the ONNX-to-TensorRT conversion is performed later and defines the output shapes, etc.?

jkjung-avt commented 3 years ago

> Do we need to modify the yolo_to_onnx.py code as well? Or can we skip that, since the ONNX-to-TensorRT conversion is performed later and defines the output shapes, etc.?

As stated in my previous post, you need to set the input tensor's H and W dimensions to -1 (dynamic) in the ONNX model. The relevant source code is shown below. (You need to set height and width to -1.)

https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/yolo/yolo_to_onnx.py#L608-L610
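In other words, where the script currently builds a static NCHW input shape from the cfg's width and height, the spatial dims would become -1. A hypothetical helper sketching that transformation (pure Python; the real change edits the shape list passed to the ONNX graph builder):

```python
def dynamic_hw(shape):
    """Take a static NCHW shape list (e.g. [1, 3, 416, 416]) and
    return a copy whose H and W are -1 (dynamic)."""
    n, c, _h, _w = shape
    return [n, c, -1, -1]

print(dynamic_hw([1, 3, 416, 416]))  # -> [1, 3, -1, -1]
```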

kgksl commented 3 years ago

@jkjung-avt sorry, I missed it. Thanks a lot!

kgksl commented 3 years ago

Do we need to modify the lines below as well?

https://github.com/jkjung-avt/tensorrt_demos/blob/9dd56b3b8d841dcfc2e5d1868f4bd785a50cbe98/yolo/yolo_to_onnx.py#L979-L990

jkjung-avt commented 3 years ago

> Do we need to modify the lines below as well?

Most likely yes. Try setting the h and w of all output tensors to -1 as well.
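The output tensors follow from the input the same way: only the grid h and w become dynamic, while the channel dim stays fixed at num_anchors * (num_classes + 5). A small sketch of that rule (function name is illustrative, not from the repo):

```python
def dynamic_output_shape(batch, num_anchors, num_classes):
    """Dynamic NCHW shape for one yolo output head: channels are
    fixed, grid h and w are -1 (they scale with the input)."""
    channels = num_anchors * (num_classes + 5)
    return [batch, channels, -1, -1]

# YOLOv4 on COCO: 3 anchors per head, 80 classes -> 255 channels
print(dynamic_output_shape(1, 3, 80))  # -> [1, 255, -1, -1]
```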

kgksl commented 3 years ago

thank you.

philipp-schmidt commented 3 years ago

What would we need to change in the plugin to make only the batch size dynamic, while leaving all other input dimensions static? @jkjung-avt

This would be pretty cool for deploying to Triton Inference Server, because it can do dynamic batch scheduling, which bundles single requests into bigger batches for higher throughput.
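For reference, dynamic batching on the Triton side is enabled in the model's config.pbtxt once the engine accepts a dynamic batch dimension. A minimal sketch (model name and batch sizes are placeholders, not from this repo):

```protobuf
name: "yolov4"
platform: "tensorrt_plan"
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

The TensorRT engine would then need an optimization profile whose min/opt/max batch dims cover 1 through max_batch_size.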

philipp-schmidt commented 3 years ago

I have implemented a draft of a dynamic-batch-size plugin which works for yolov4, but it still has issues with e.g. yolov4-tiny-3l.