marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Questions about gpu batched nms #192

Closed YoungjaeDev closed 2 years ago

YoungjaeDev commented 2 years ago

The cluster-mode in the config is set to 4. Did you implement the post-processing yourself? In other words, it seems that you are not using the clustering provided by DeepStream and wrote your own code instead. Can you tell me exactly which part that is? Thank you.

marcoslucianops commented 2 years ago

I added the GPU Batched NMS, so the CPU NMS (cluster-mode=2) is no longer needed. You can see the comparison here: https://github.com/marcoslucianops/DeepStream-Yolo/issues/142. Setting cluster-mode=4 disables the clustering done by DeepStream. I changed the outputs to fit the TensorRT BatchedNMS plugin, added logic to sort the outputs, and used the TensorRT createBatchedNMSPlugin function to create the NMS layer.
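For readers unfamiliar with what an NMS layer computes, here is a minimal pure-Python sketch of greedy per-class NMS with a score threshold and a topk limit. It is only a CPU reference for the general technique, not the code in this repo or in the TensorRT plugin; the function names and the `(box, score, class_id)` tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_nms(detections, iou_threshold, score_threshold, topk):
    """Greedy NMS run independently per class.

    detections: list of (box, score, class_id) tuples (hypothetical layout).
    """
    by_class = defaultdict(list)
    for box, score, cls in detections:
        if score >= score_threshold:  # drop low-confidence candidates first
            by_class[cls].append((box, score, cls))

    kept = []
    for cls_dets in by_class.values():
        cls_dets.sort(key=lambda d: d[1], reverse=True)  # best score first
        survivors = []
        for det in cls_dets:
            # keep a box only if it does not overlap a better box of its class
            if all(iou(det[0], s[0]) <= iou_threshold for s in survivors):
                survivors.append(det)
        kept.extend(survivors)

    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:topk]  # cap the number of final detections
```

In this sketch, two heavily overlapping boxes of the same class collapse to the higher-scoring one, while an overlapping box of a different class survives.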

YoungjaeDev commented 2 years ago

@marcoslucianops

Did you have that experience with TensorRT 7?

nemosupremo commented 2 years ago

@marcoslucianops

It seems GPU Batched NMS gives a 66% performance improvement, which is amazing. The drawbacks are that the TensorRT engine needs to be rebuilt whenever the iou/score/topk values change, and that per-class config options can't be used.

Is it possible to support both modes (use CPU when cluster-mode=2 and GPU when cluster-mode=4)?

marcoslucianops commented 2 years ago

@youngjae-avikus, in what specifically?

@nemosupremo, you can use the per-class config (class-attrs-0, class-attrs-1, etc.). The score-threshold works as the minimum score, and the pre-cluster-threshold then filters the scores for each class (the same goes for topk, but the topk in the config_nms.txt file should be the maximum topk value you use). cluster-mode=2 only uses the nms-iou-threshold value. It's possible to change the code to use each one according to the cluster-mode but, in my opinion, it's not necessary because the improvement from GPU Batched NMS is so large.

Note: Using pre-cluster-threshold and topk in the [class-attrs] sections will increase CPU usage and may decrease performance.
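The two-stage filtering described above can be sketched as follows. This is an illustration of the logic only, not DeepStream's implementation: the function name, the `(class_id, score)` layout, and the fallback to the global threshold for classes without their own section are all assumptions.

```python
def filter_detections(detections, score_threshold, pre_cluster_thresholds):
    """Apply a global minimum score, then a per-class threshold.

    detections: list of (class_id, score) pairs (hypothetical layout).
    score_threshold: global minimum, as set in config_nms.txt.
    pre_cluster_thresholds: dict of class_id -> pre-cluster-threshold.
    """
    kept = []
    for class_id, score in detections:
        if score < score_threshold:
            continue  # stage 1: rejected by the global minimum score
        # Assumption: a class with no entry falls back to the global value.
        per_class = pre_cluster_thresholds.get(class_id, score_threshold)
        if score >= per_class:  # stage 2: per-class filter
            kept.append((class_id, score))
    return kept
```

A per-class threshold below the global score-threshold has no effect in this sketch, because stage 1 has already removed those detections.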

YoungjaeDev commented 2 years ago

@marcoslucianops

About the TensorRT BatchedNMS plugin.

nemosupremo commented 2 years ago

@marcoslucianops

So if I have 3 classes (0, 1, 3), in my config_infer_primary.txt I will have:

[class-attrs-0]
pre-cluster-threshold=0.2
nms-iou-threshold=.213
[class-attrs-1]
pre-cluster-threshold=0.4
nms-iou-threshold=.4
[class-attrs-3]
pre-cluster-threshold=0.5
nms-iou-threshold=.5

Then in my config_nms I would have to do something like:

[property]
iou-threshold=min(nms-iou-threshold)
score-threshold=min(pre-cluster-threshold)
topk=300

Correct?

marcoslucianops commented 2 years ago

@youngjae-avikus, the same function (createBatchedNMSPlugin()) exists in TensorRT 7, but it's easy to use from the plugins too.

@nemosupremo, the nms-iou-threshold only works with cluster-mode=2, which is disabled (cluster-mode=4) because of the GPU BatchedNMS. You should use only the pre-cluster-threshold key.

nemosupremo commented 2 years ago

@marcoslucianops

So with this setup, my iou-threshold is identical for every class, but my class confidence can vary per class as long as it is greater than the score-threshold in config_nms. OK.

marcoslucianops commented 2 years ago

@nemosupremo, yes
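As a rough sketch of the setup confirmed here, the snippet below reads the per-class sections and computes the smallest pre-cluster-threshold, which is the value nemosupremo proposed for score-threshold in config_nms. It uses Python's standard `configparser` on an inline copy of the example config; the helper name is hypothetical, and real DeepStream config files may contain syntax this parser does not handle.

```python
import configparser

# Inline copy of the per-class sections from the example above.
CONFIG_TEXT = """
[class-attrs-0]
pre-cluster-threshold=0.2
[class-attrs-1]
pre-cluster-threshold=0.4
[class-attrs-3]
pre-cluster-threshold=0.5
"""


def min_pre_cluster_threshold(config_text):
    """Return the lowest pre-cluster-threshold over all class-attrs-N sections."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = [
        parser.getfloat(section, "pre-cluster-threshold")
        for section in parser.sections()
        if section.startswith("class-attrs-")
        and parser.has_option(section, "pre-cluster-threshold")
    ]
    if not values:
        raise ValueError("no per-class pre-cluster-threshold found")
    return min(values)


# Candidate score-threshold for config_nms: no class needs anything lower.
score_threshold = min_pre_cluster_threshold(CONFIG_TEXT)
```

Any score-threshold at or below this minimum lets every per-class threshold take effect; the iou-threshold stays a single value shared by all classes.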

YoungjaeDev commented 2 years ago

I want to enable class-agnostic NMS. Can I control that from the TensorRT NMS plugin in the code?

@marcoslucianops

marcoslucianops commented 2 years ago

@youngjae-avikus, I'm not familiar with agnostic nms, but I think you need to change the yoloLayer outputs to fit the batchedNMSPlugin input with shareLocation = false and the output shape. You probably need to change the logic to add all classes to the output bbox instead of the maxProb class.
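For context, class-agnostic NMS generally means that boxes suppress each other regardless of their class label. The pure-Python sketch below illustrates that idea only; it says nothing about how to configure the batchedNMSPlugin, and the function names and `(box, score, class_id)` layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(detections, iou_threshold):
    """Greedy NMS over all boxes at once, ignoring class labels.

    detections: list of (box, score, class_id) tuples (hypothetical layout).
    """
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for det in ordered:
        # The class id is carried along but never used in the overlap test.
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

With this variant, two overlapping boxes labelled as different classes are reduced to the higher-scoring one, whereas per-class NMS would keep both.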

YoungjaeDev commented 2 years ago

@marcoslucianops

Thank you. I'll try it when I have time. Please don't close the issue for a while.

adimukewar commented 2 years ago

Is the topk filter for NMS applied before or after the GPU NMS implementation? When I increase the topk, higher-confidence bounding boxes appear. Also, the total number of objects detected is the same in both scenarios.

marcoslucianops commented 2 years ago

@adimukewar, the topK is applied to limit the outputs before the NMS (yoloLayer) and during the NMS (GPU Batched NMS).

marcoslucianops commented 2 years ago

New optimized NMS https://github.com/marcoslucianops/DeepStream-Yolo/issues/142