marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Create release branches #358

Open HeeebsInc opened 1 year ago

HeeebsInc commented 1 year ago

I want to start by saying thank you for all your hard work on this repo. It's clear it was no easy feat, and it's been a lifesaver for me personally.

I wanted to reach out and ask if it would be possible to create release branches as you update the code base. For example, the new change to ONNX will be much better long term, but we are seeing issues with inference performance compared with the previous release. This makes it difficult for us to create releases when the old code is overwritten by the new without keeping a copy of the old. I remember the same thing happened with DeepStream 5.

Is there a specific reason you don't use releases? I know you are only a single person, so I don't want to create new work for you; all I ask is that when there is a new version, you create a new tag/branch while keeping the old release intact.

The issue we are seeing with ONNX is that the same model does not produce the same detections. For example, we have a test video that we run our models through for QA. Running our main model through that video with the old version, we get 3 detections. Running the same model (with the same threshold + pgie variables) through the same video with the current ONNX version, we get 0 detections. I confirmed on other videos that the model is making predictions, but the results are not the same. At first I thought this was a configuration issue with the new version, but on "easier" videos we confirmed that we can get good detections with the new version. This inconsistent behavior makes it tough for us to sign off on deployment.

I also want to extend a hand and offer support wherever you need.

marcoslucianops commented 1 year ago

I didn't make branches because it's easier for me to maintain only one branch, updated with the latest changes.

About the ONNX: in my tests, the mAP was equal to the wts/cfg conversion, with the same FPS. The ONNX path supports converting any model to DeepStream easily, without needing to implement every layer for the wts conversion.

About your issue, can you send more information about this error? Screenshots and the config_infer_primary files?

marcoslucianops commented 1 year ago

Added support for DeepStream 5.1

HeeebsInc commented 1 year ago

> I didn't make branches because it's easier for me to maintain only one branch, updated with the latest changes.
>
> About the ONNX: in my tests, the mAP was equal to the wts/cfg conversion, with the same FPS. The ONNX path supports converting any model to DeepStream easily, without needing to implement every layer for the wts conversion.
>
> About your issue, can you send more information about this error? Screenshots and the config_infer_primary files?

It's not an error but rather a detection difference. For example, using the same model, this new version gets a different number of detections on one of our test videos compared to the older DeepStream-Yolo.

marcoslucianops commented 1 year ago

Can you send the config_infer_primary files and output images from both detections?

HeeebsInc commented 1 year ago

New Release (ONNX)

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = labels.txt
uff-input-blob-name = input_image
process-mode = 1
num-detected-classes = 2
interval = 0
batch-size = 1
gie-unique-id = 1
is-classifier = 0
maintain-aspect-ratio = 1
network-mode = 2
workspace-size = 9000
net-scale-factor = 0.003921569790691
cluster-mode = 2
offsets = 0;0;0
infer-dims = 3;544;960
onnx-file = 256-best_ap.onnx
model-engine-file = model-epoch1.fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.2/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
symmetric-padding = 1

[class-attrs-0]
post-cluster-threshold = 0.5

[class-attrs-1]
post-cluster-threshold = 0.5

Old Release (wts/cfg)

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = labels.txt
uff-input-blob-name = input_image
process-mode = 1
num-detected-classes = 2
interval = 0
batch-size = 1
gie-unique-id = 1
is-classifier = 0
maintain-aspect-ratio = 1
network-mode = 2
workspace-size = 6500
cluster-mode = 2
offsets = 0;0;0
force-implicit-batch-dim = 1
infer-dims = 3;544;960
net-scale-factor = 0.003921569790691137
custom-network-config = 256-best_ap.cfg
model-file = 256-best_ap.wts
model-engine-file = model-epoch1.fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.2/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name = NvDsInferYoloCudaEngineGet
symmetric-padding = 1

[class-attrs-0]
post-cluster-threshold = 0.5

[class-attrs-1]
post-cluster-threshold = 0.5

For privacy reasons, I cannot share the detection images. Our team is working on getting an example we can share, but it's been difficult because the new release's detections almost always match the old release's detections. The reason I bring this up as an issue is that there is 1 detection in our test set that we missed in the new version but got in the old version. Although it's a single detection, it's critical for our use case. I know it's a needle-in-a-haystack type of issue, but inference should be deterministic regardless of whether the difference is 1 or 1000 detections.

marcoslucianops commented 1 year ago

Your config_infer_primary files are wrong. One of the problems is the net-scale-factor: it should be the same for both models (it depends on the normalization of the model input used during training). The repo provides config_infer_primary files for each model with the correct parameters for that model's default normalization (use the value from the file that matches your model if you didn't change this parameter during training). Another problem is that the wts and cfg files should have the model name at the start of the filename (see the old documentation in this repo).
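For context on where that value comes from (assuming your model was trained with the usual 0-255 to 0-1 pixel normalization), nvinfer preprocesses each pixel as y = net-scale-factor * (x - offset), so with offsets=0;0;0 the matching scale is simply 1/255:

# nvinfer scales each input pixel as: y = net-scale-factor * (x - offset)
# With offsets=0;0;0 and the default 0-1 normalization used to train YOLO models,
# net-scale-factor should be ~1/255 for BOTH the wts/cfg and the ONNX pipelines.
print(1 / 255)  # 0.00392156862745098 (the configs below use 0.0039215697906911373, effectively the same scale)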

Assuming you are using a YOLOv8 model, please use:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=256-best_ap.onnx
model-engine-file=256-best_ap.onnx_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.5
topk=300

[class-attrs-0]
pre-cluster-threshold=0.5

[class-attrs-1]
pre-cluster-threshold=0.5

And for the wts/cfg conversion:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=yolov8_256-best_ap.cfg
model-file=yolov8_256-best_ap.wts
model-engine-file=model_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.5
topk=300

[class-attrs-0]
pre-cluster-threshold=0.5

[class-attrs-1]
pre-cluster-threshold=0.5

Please check the files (docs and configs) available in this repo to use the model with DeepStream-Yolo.

HeeebsInc commented 1 year ago

Okay, I will try that. Having the model name absent from the cfg filename has never been an issue for us. If it were, wouldn't it throw an error?

marcoslucianops commented 1 year ago

The old DeepStream-Yolo uses the filename to know which model is running and to select the correct forward.

HeeebsInc commented 1 year ago

When you say old, how many releases prior, roughly?

Your code works, so does it default to another forward if there is no model name present in the filename? We have been able to build YOLOv5 engines with no issue using your code while keeping the naming convention the same as the file I dropped above.

marcoslucianops commented 1 year ago

Commit 68f762d and older. Check the yolo.cpp and yoloPlugins.cpp files.

HeeebsInc commented 1 year ago

Thanks for the reply and all your help. We have just restarted our simulation using the new net-scale-factor. I will keep you updated on our results comparing this new release to the old release.

HeeebsInc commented 1 year ago

@marcoslucianops after heavy digging, I found that it may not have to do with your new code but rather with the DeepStream version (specifically TensorRT). See my conversation here:

https://forums.developer.nvidia.com/t/detections-change-in-deepstream-6-2/257280/7

From your perspective, were there many changes that could have affected inference performance in 6.1? I think the discrepancy between DeepStream versions is due to TensorRT, but I can't know for sure since your code utilizes TensorRT for engine building. I can't pin down which specific version of DeepStream-Yolo I have, but I have attached it here.

I really need to update to DeepStream 6.2, but I cannot without figuring this out. I've been down many rabbit holes and can't find a definitive answer.

I have tried:

- DS 6.1 + the attached zip
- DS 6.2 + the attached zip
- DS 6.2 + the new repo w/ ONNX

All of these get different results on the same video + model. The pgie configs I used are linked in the NVIDIA forum above.

nvdsinfer_custom_impl_Yolo.zip

HeeebsInc commented 1 year ago

Here are more results I found today (results image attached). The ONNX config I used:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=/model.onnx
model-engine-file=/model.engine
labelfile-path=/labels.txt
batch-size=1
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
infer-dims=3;544;960
maintain-aspect-ratio=1
symmetric-padding=1
force-implicit-batch-dim=1
workspace-size=9000
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.1/sources/nvdsinfer_custom_impl_Yolo_Onnx/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-0]
post-cluster-threshold = 0.83
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

[class-attrs-1]
post-cluster-threshold = 0.85
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

And the .wts config:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
model-engine-file=/model.engine
labelfile-path=/labels.txt
batch-size=1
network-mode=2
num-detected-classes=2
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
infer-dims=3;544;960
maintain-aspect-ratio=1
force-implicit-batch-dim=1
workspace-size=9000
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=/opt/nvidia/deepstream/deepstream-6.1/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
custom-network-config = /model.cfg
model-file=/model.wts

[class-attrs-0]
post-cluster-threshold = 0.83
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

[class-attrs-1]
post-cluster-threshold = 0.85
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300

Above are the sample configs I used to compare the .wts and .onnx implementations. The source resolution of the video was the same as the model's inference resolution, so there was no resizing involved.

marcoslucianops commented 1 year ago

I did the mAP tests with pycocotools, and the variance from wts to ONNX was at most +0.01. How did you calculate the TP, TN, FP and FN?
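For reference, the pycocotools comparison was along these lines (a minimal sketch; the JSON file names are placeholders for your ground truth and the detections exported from each pipeline):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth.json")  # COCO-format ground-truth annotations
for dets in ("detections_wts.json", "detections_onnx.json"):
    coco_dt = coco_gt.loadRes(dets)  # detections exported from each pipeline
    ev = COCOeval(coco_gt, coco_dt, iouType="bbox")
    ev.evaluate()
    ev.accumulate()
    ev.summarize()
    print(dets, "mAP@[.50:.95] =", ev.stats[0])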

HeeebsInc commented 1 year ago

The dataset we are using is not COCO, as it's internal to our company. We have an internal tool that calculates TP/TN/FP/FN. In the graph above, TN and FN are not calculated the same way as COCO, but TP and FP are; for the sake of conformity, ignore differences in those two columns. An FP is a detection that has no IoU overlap with the ground truth, while a TP has IoU overlap with the ground truth.
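To make the matching rule concrete, it is roughly the following (a simplified sketch, not our actual internal tool):

def iou(a, b):
    # Boxes are (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_tp_fp(detections, ground_truth):
    # A detection with any IoU overlap against a ground-truth box counts as a TP;
    # a detection with no overlap counts as an FP.
    tp = sum(1 for d in detections if any(iou(d, g) > 0 for g in ground_truth))
    return tp, len(detections) - tp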

I will post more results shortly (this time without TN and FN). What I'm finding is that thresholds are a lot less sensitive with ONNX (which is okay). We are able to get roughly the same performance by dropping the threshold by 0.25+ when using ONNX models compared to the same model generated with .wts.

However, we still see different results when comparing DeepStream versions, regardless of whether the model is using .wts or .onnx. For example, .onnx engines generated in 6.1, 6.1.1, and 6.2 all produce different results. Same with .wts.
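As a quick sanity check on the TensorRT theory, one can print the version inside each DeepStream container before building the engine (just a sketch):

import tensorrt as trt
print(trt.__version__)  # differs across the DeepStream 6.1 / 6.1.1 / 6.2 images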

Do you have an idea why this is?

HeeebsInc commented 1 year ago

resultsGitHub.xlsx

Here are the results. I noticed there was an issue with how I ran INT8 in the previous export, so please use this file when analyzing INT8.

Each sheet corresponds to tests run with the same model. Sheet 135 is model 1 and sheet 186 is model 2. All entries in sheet 186 are INT8, while sheet 135 is FP16.

marcoslucianops commented 1 year ago

I will need to look into it later when I have more time.

HeeebsInc commented 9 months ago

@marcoslucianops bringing this back up, as we are seeing this issue again. We were able to get about the same performance by lowering thresholds with .onnx, whereas the same model using .wts needed a higher threshold.

However, with .onnx we are seeing some edge cases where precision and recall drop to practically 0 after performing calibration. Using the same calibration set, calibration batch size, and the same models with .wts, we did not see this drop, so the only things that changed were:

1) DS 6.3 + its TensorRT version (before it was DS 6.2)
2) ONNX instead of .wts (and the code changes that came with it)