Closed YoungjaeDev closed 2 years ago
I added the GPU Batched NMS, so it's no longer necessary to use the CPU NMS (cluster-mode=2). You can see the comparison here https://github.com/marcoslucianops/DeepStream-Yolo/issues/142. Setting cluster-mode=4 disables the clustering done by DeepStream. I changed the outputs to fit the TensorRT BatchedNMS plugin, then created logic to sort the outputs, and used the TensorRT createBatchedNMSPlugin function to create the NMS layer.
@marcoslucianops
Did you have that experience with TensorRT 7?
@marcoslucianops
It seems GPU Batched NMS has a 66% performance improvement, which is amazing, but the drawback is that the TensorRT engine needs to be rebuilt if the iou/score/topk changes, and per-class config options are not available.
Is it possible to support both modes (use CPU when cluster-mode=2 and GPU when cluster-mode=4)?
@youngjae-avikus, in what specifically?
@nemosupremo, you can use the per-class config (class-attrs-0, class-attrs-1, etc). The score-threshold will work as the minimum score, then the pre-cluster-threshold will filter the scores for each class (the same goes for the topk, but the topk value in the config_nms.txt file should be the maximum topk). The nms-iou-threshold value is only used with cluster-mode=2. It's possible to change the code to use each one according to the cluster-mode but, in my opinion, it's not necessary because the improvement from GPU Batched NMS is too big.
Note: Using pre-cluster-threshold and topk in the [class-attrs] section will increase the CPU usage and may decrease the performance.
@marcoslucianops
@marcoslucianops
So if I have 3 classes, 10, 1, 3 in my config_infer_primary.txt I will have:
[class-attrs-0]
pre-cluster-threshold=0.2
nms-iou-threshold=.213
[class-attrs-1]
pre-cluster-threshold=0.4
nms-iou-threshold=.4
[class-attrs-3]
pre-cluster-threshold=0.5
nms-iou-threshold=.5
Then in my config_nms I would have to do something like:
[property]
iou-threshold=min(nms-iou-threshold)
score-threshold=min(pre-cluster-threshold)
topk=300
Correct?
@youngjae-avikus, there's the same function for TensorRT 7 (createBatchedNMSPlugin()), but it's easy to use from the plugins too.
@nemosupremo, the nms-iou-threshold only works with cluster-mode=2, which is disabled (cluster-mode=4) because of the GPU BatchedNMS. You should use only the pre-cluster-threshold key.
@marcoslucianops
So with this setup, my iou-threshold is identical for every class, but my class confidence can vary as long as it is greater than the score-threshold in config_nms. Ok.
@nemosupremo, yes
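As a rough illustration of the setup agreed on above, the following Python sketch derives the shared score-threshold and topk from the per-class sections. The config excerpt is hypothetical (the topk values are made up for the example), and the helper name derive_nms_settings is not part of DeepStream or this repository. The iou-threshold is not derived because it is a single value shared by all classes.

```python
import configparser

# Hypothetical excerpt of config_infer_primary.txt, using the per-class
# thresholds discussed above; the topk values are made up for the example.
INFER_CONFIG = """
[class-attrs-0]
pre-cluster-threshold=0.2
topk=100
[class-attrs-1]
pre-cluster-threshold=0.4
topk=300
[class-attrs-3]
pre-cluster-threshold=0.5
topk=50
"""


def derive_nms_settings(config_text):
    """Return the lowest score threshold and largest topk across class-attrs sections."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    sections = [s for s in parser.sections() if s.startswith("class-attrs-")]
    thresholds = [parser.getfloat(s, "pre-cluster-threshold") for s in sections]
    topks = [parser.getint(s, "topk") for s in sections]
    return {"score-threshold": min(thresholds), "topk": max(topks)}


print(derive_nms_settings(INFER_CONFIG))  # {'score-threshold': 0.2, 'topk': 300}
```

Taking the minimum threshold and the maximum topk keeps the shared NMS stage from discarding anything a per-class setting would still accept.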
I want to enable the class-agnostic NMS option. Can I control it through the TensorRT NMS plugin in the code?
@marcoslucianops
@youngjae-avikus, I'm not familiar with agnostic NMS, but I think you need to change the yoloLayer outputs to fit the batchedNMSPlugin input with shareLocation = false and the output shape. You probably need to change the logic to add all classes to the output bbox instead of the maxProb class.
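For readers unfamiliar with the term, the difference between per-class and class-agnostic NMS can be sketched in plain Python. This is a generic greedy NMS written for illustration only; it is not the TensorRT plugin and says nothing about how the plugin's parameters map onto the two behaviours. The function names and data layout are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold, class_agnostic=False):
    """Greedy NMS over (box, score, class_id) tuples, highest score first."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, class_id = det
        # Per-class mode only compares against kept boxes of the same class;
        # class-agnostic mode compares against every kept box.
        suppressed = any(
            (class_agnostic or class_id == k_class) and iou(box, k_box) > iou_threshold
            for k_box, _, k_class in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, 0),    # class 0
    ((1, 1, 10, 10), 0.8, 1),    # class 1, heavily overlaps the first box
    ((20, 20, 30, 30), 0.7, 1),  # class 1, far from the others
]
print(len(nms(detections, 0.5)))                       # 3 (per-class)
print(len(nms(detections, 0.5, class_agnostic=True)))  # 2 (class-agnostic)
```

In per-class mode the two overlapping boxes survive because they belong to different classes; in class-agnostic mode the lower-scoring one is suppressed.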
@marcoslucianops
Thank you. I'll try it when I have time. Please don't close the issue for a while.
Is the topk filter for NMS applied before or after the GPU NMS implementation? When I increase the topk, higher-confidence bounding boxes appear. Also, the total number of objects detected is the same in both scenarios.
@adimukewar, the topK is applied to limit the outputs before the NMS (yoloLayer) and during the NMS (GPU Batched NMS).
New optimized NMS https://github.com/marcoslucianops/DeepStream-Yolo/issues/142
The cluster-mode in the config is set to 4. Did you improve the post-processing by writing the code yourself? In other words, it seems that you did not use the clustering provided by DeepStream and implemented it yourself. Can you tell me exactly which part it is? Thank you.