jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Remove Non Max Suppression #555

Closed wadhwasahil closed 2 years ago

wadhwasahil commented 2 years ago

I see that NMS is the output node in the SSD model. However, I want to do NMS myself and get all the raw outputs from the model. Is there a way to do that while converting the frozen graph to TensorRT format?

https://github.com/jkjung-avt/tensorrt_demos/blob/a061e44a82e1ca097f57e5a32f20daf5bebe7ade/ssd/build_engine.py#L176

NMS defined as output node https://github.com/jkjung-avt/tensorrt_demos/blob/a061e44a82e1ca097f57e5a32f20daf5bebe7ade/ssd/build_engine.py#L284

jkjung-avt commented 2 years ago

You could try using graphsurgeon to remove the NMS node at the end. For example, add the code here:

    if 'NMS' not in [node.name for node in graph.graph_outputs]:
        graph.remove(graph.graph_outputs, remove_exclusive_dependencies=False)
        if 'NMS' not in [node.name for node in graph.graph_outputs]:
            # We expect 'NMS' to be one of the outputs
            raise RuntimeError('bad graph_outputs')
+   graph.remove('NMS', remove_exclusive_dependencies=False)
    ......

The outputs of the graph should become: "Squeeze", "concat_priorbox" and "concat_box_conf". Please verify that and modify this line of code as well.

https://github.com/jkjung-avt/tensorrt_demos/blob/a061e44a82e1ca097f57e5a32f20daf5bebe7ade/ssd/build_engine.py#L284
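For what it's worth, a sketch of that second modification, assuming the linked line is where the output node list is passed to uff.from_tensorflow() (I have not re-checked the exact contents of that line, and dynamic_graph / spec['tmp_uff'] below just stand in for the existing variables in build_engine.py):

    import uff

    # Sketch only: replace the single 'NMS' output with the three nodes that
    # remain after the NMS node is removed by graphsurgeon.
    OUTPUT_NODES = ['Squeeze', 'concat_priorbox', 'concat_box_conf']

    _ = uff.from_tensorflow(
        dynamic_graph.as_graph_def(),      # the graphsurgeon DynamicGraph built earlier
        output_nodes=OUTPUT_NODES,         # previously: ['NMS']
        output_filename=spec['tmp_uff'],   # placeholder for the existing .uff path
        text=True)

If the TensorRT UffParser afterwards registers the engine output by name, that registration would likewise need to list whatever names the converter assigns to the three new outputs.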

wadhwasahil commented 2 years ago

Tested this, and it works. However, do you know how to get the bounding box coordinates back from these three outputs?

jkjung-avt commented 2 years ago

You basically need to do what the nmsPlugin does. The source code can be found in NVIDIA's TensorRT OSS repo:

https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/nmsPlugin/nmsPlugin.cpp#L228
https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/common/kernels/detectionForward.cu#L20
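At its core, the per-class step the plugin performs is greedy NMS over the decoded boxes. A minimal numpy sketch of that step (not the plugin's actual CUDA kernels):

    import numpy as np

    def nms(boxes, scores, iou_threshold=0.45, top_k=200):
        """Greedy NMS. boxes: (N, 4) in corner form (the axis order does not
        matter for IoU); scores: (N,). Returns kept indices, best score first."""
        x1, y1, x2, y2 = boxes.T
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1][:top_k]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Overlap of the current best box with the remaining candidates
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            order = order[1:][iou <= iou_threshold]
        return keep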

wadhwasahil commented 2 years ago

Where can I find "bbox.utils" referenced in https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/common/kernels/detectionForward.cu#L18, and "kernel.h"? I need to implement these in Python.

jkjung-avt commented 2 years ago

https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/common/bboxUtils.h
https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/common/kernel.h

jkjung-avt commented 2 years ago

You might be able to leverage the implementation in TensorFlow Object Detection API:

https://github.com/tensorflow/models/blob/c626177d6ba65a211b8f791d612cdcf8b9c0fe7e/research/object_detection/core/post_processing.py#L422
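If TensorFlow is available at inference time anyway, its built-in op may be the quickest route rather than porting the linked ODAPI code. A small sketch with placeholder tensors standing in for the decoded boxes and one class's scores:

    import tensorflow as tf

    # Placeholder boxes in [y1, x1, y2, x2] form and per-box scores.
    corners = tf.random.uniform((100, 2), 0.0, 0.5)
    sizes = tf.random.uniform((100, 2), 0.1, 0.5)
    boxes = tf.concat([corners, corners + sizes], axis=1)
    scores = tf.random.uniform((100,))

    selected = tf.image.non_max_suppression(
        boxes, scores, max_output_size=20,
        iou_threshold=0.45, score_threshold=0.3)
    kept_boxes = tf.gather(boxes, selected)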

wadhwasahil commented 2 years ago

What codeType is used in the conversion?
https://github.com/NVIDIA/TensorRT/tree/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/nmsPlugin

jkjung-avt commented 2 years ago

I don't understand your question...

The nmsPlugin code was not developed by me. You could raise your question in NVIDIA's repo instead.

wadhwasahil commented 2 years ago

So my question is: there is a codeType parameter that can be passed when creating the plugin, as you did here: https://github.com/jkjung-avt/tensorrt_demos/blob/a061e44a82e1ca097f57e5a32f20daf5bebe7ade/ssd/build_engine.py#L176

If I have "Squeeze", "concat_priorbox", and "concat_box_conf" as outputs, then there is an encoding/decoding formula to convert the predicted locations into image coordinates. That's why I asked whether there is a default value of codeType that you think is used. Even if you aren't sure, that's fine as well. You have created a great repo anyway.

wadhwasahil commented 2 years ago

And one more thing: although the conversion succeeds, I get some out-of-place/incorrect outputs from the TensorRT model without NMS.

The output of concat_box_conf is something like this

3.7920928 -5.1384053 -5.2293253 -7.214224   5.2867723 -6.4359317
 -6.9501925 -7.243856   4.9889793 -6.1531906 -6.5420723 -6.1558976
  4.79943   -5.6793633 -6.02685   -7.6264396  5.0429716 -6.0365148
 -6.494671  -7.0263824  5.7787313 -6.908192  -7.093956  -7.3831034
  5.0442924 -5.8505297 -5.887159

which seems incorrect.

The output of concat_priorbox is

[ 0.025      -0.025       0.125       0.075     ]
[-0.06642136 -0.04571068  0.21642137  0.09571068]
[ 0.00428932 -0.11642136  0.14571068  0.16642137]
[ 0.075      -0.025       0.175       0.075     ]

which seems correct.

The output of Squeeze, i.e. the locations predicted by the model, is

[ 0.62961006, -1.0679286 , -1.3024708 , -4.3457003 , -0.07474239,
        0.6752152 , -1.2724931 , -0.9880088 ,  2.0700245 , -0.21972644,
        1.4357893 , -2.8759375 , -0.05146412,  0.13889286, -3.513672  ,
        0.29653752,  0.3436456 ,  1.3412868 , -2.6850836 ,  0.38701284,
        0.549531  ,  0.20000173, -0.53817743,  0.09094931, -0.68967026,
        0.69363225, -3.950452  ,  0.63995445,  0.31779078]

which also seems incorrect.

jkjung-avt commented 2 years ago

The default NMS codeType in the source code is CodeTypeSSD::TF_CENTER. So I think that's what you should use.

https://github.com/NVIDIA/TensorRT/blob/052281f0ab795b6c1a19047dc8a449cd397995a9/plugin/nmsPlugin/nmsPlugin.cpp#L714
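For reference, TF_CENTER corresponds to the center-size box coding used by TensorFlow's SSD models. A numpy sketch of the decode; the coordinate ordering and the variance values below are assumptions that should be verified against the plugin source and your priorbox output:

    import numpy as np

    def decode_tf_center(locs, priors, variances=(0.1, 0.1, 0.2, 0.2)):
        """Decode SSD box regressions against prior boxes (TF_CENTER style).

        Assumed layouts (verify against your exported graph):
          locs:   (N, 4) raw 'Squeeze' outputs, ordered (ty, tx, th, tw)
          priors: (N, 4) prior boxes, ordered (ymin, xmin, ymax, xmax), normalized
          variances: multiplying by 0.1/0.1/0.2/0.2 is equivalent to dividing by
            the TF Object Detection API scale factors 10, 10, 5, 5
        Returns (N, 4) boxes as (ymin, xmin, ymax, xmax), still normalized.
        """
        prior_h = priors[:, 2] - priors[:, 0]
        prior_w = priors[:, 3] - priors[:, 1]
        prior_cy = (priors[:, 0] + priors[:, 2]) / 2.0
        prior_cx = (priors[:, 1] + priors[:, 3]) / 2.0

        ty, tx, th, tw = (locs * np.asarray(variances)).T
        cy = ty * prior_h + prior_cy
        cx = tx * prior_w + prior_cx
        h = np.exp(th) * prior_h
        w = np.exp(tw) * prior_w

        return np.stack([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2], axis=1)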

jkjung-avt commented 2 years ago

> The output of concat_box_conf is something like this

I don't have time to look into this, but please at least check whether you need to apply softmax() to those values first.

> The output of Squeeze, i.e. the locations predicted by the model, is

I'm not sure whether those values are wrong. Off the top of my head, you need to offset the priorBox coordinates with these values to get the detection bbox coordinates. Please study this yourself; I don't think I have time to look into it further.
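In case it helps later readers, a sketch of those two steps, reusing decode_tf_center() and nms() from the sketches above. The reshapes, the class count (91 for the COCO-trained SSDs), and the softmax-vs-sigmoid choice are all assumptions to verify against the exported model and the NMS plugin parameters:

    import numpy as np

    def softmax(logits, axis=-1):
        """Numerically stable softmax over the class axis."""
        e = np.exp(logits - logits.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def postprocess(locs, conf_raw, priors, conf_thresh=0.3, num_classes=91):
        # num_classes=91 assumes a COCO-trained SSD with a background class.
        scores = softmax(conf_raw.reshape(-1, num_classes))  # or a sigmoid, if the
                                                             # plugin was configured
                                                             # with confSigmoid=1
        boxes = decode_tf_center(locs.reshape(-1, 4), priors.reshape(-1, 4))
        detections = []
        for cls in range(1, num_classes):                    # skip background (class 0)
            mask = scores[:, cls] > conf_thresh
            if not mask.any():
                continue
            cls_boxes, cls_scores = boxes[mask], scores[mask, cls]
            for i in nms(cls_boxes, cls_scores):
                detections.append((cls, cls_scores[i], cls_boxes[i]))
        return detections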

wadhwasahil commented 2 years ago

Sure, no worries. If you get time later, please update this thread with your approach.

jkjung-avt commented 2 years ago

Closing, since I'm not going to track this issue.