marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Inference video does not have an inference box #441

Open JasonChao-yue opened 10 months ago

JasonChao-yue commented 10 months ago

I deployed YOLOv5 successfully, but no targets are detected in the video during inference. Could a configuration error be the reason the displayed video has no boxes?

marcoslucianops commented 10 months ago

Are there any bboxes in the detection output? Are you testing with the deepstream-app?

JasonChao-yue commented 10 months ago

Are there any bboxes in the detection output? Are you testing with the deepstream-app?

Thank you for your reply! I used deepstream-app. There were no bboxes in the detection, but GPU utilization was very high, which suggests inference was running. I also tried different engines, and the behavior was the same every time.

marcoslucianops commented 10 months ago

Can you send the output from the terminal when you run DeepStream?

ghost commented 10 months ago

Hi @marcoslucianops. I can confirm that I'm facing the same issue with YOLOv8. The model inference is happening, but no bounding boxes (detections) are drawn on the output in DeepStream 6.0.1. Strangely, 6.3 seems to work fine, but due to driver version restrictions I need DS 6.0.1 at the moment. Any leads would be appreciated.

JasonChao-yue commented 10 months ago

Can you send the output from the terminal when you run DeepStream?

I can send the output from the terminal when I run DeepStream. The environment I am using is: Xavier NX, Ubuntu 18.04, DeepStream 6.0.1, YOLOv5. The terminal output is:

(ceshi) nvidia@nvidia-desktop:~/Downloads/DeepStream-Yolo-master$ deepstream-app -c deepstream_app_config.txt

Using winsys: x11
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0:00:04.446720746 21283 0x24f7a630 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() [UID = 1]: deserialized trt engine from :/home/nvidia/Downloads/DeepStream-Yolo-master/model_b1_gpu0_fp32.engine
INFO: [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT images              3x640x640
1   OUTPUT kFLOAT onnx::Sigmoid_347   3x80x80x85
2   OUTPUT kFLOAT onnx::Sigmoid_404   3x40x40x85
3   OUTPUT kFLOAT onnx::Sigmoid_460   3x20x20x85
4   OUTPUT kFLOAT output              25200x85

0:00:04.447148140 21283 0x24f7a630 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() [UID = 1]: Use deserialized engine model: /home/nvidia/Downloads/DeepStream-Yolo-master/model_b1_gpu0_fp32.engine
0:00:04.744482165 21283 0x24f7a630 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/home/nvidia/Downloads/DeepStream-Yolo-master/config_infer_primary_yoloV5.txt sucessfully

Runtime commands: h: Print this help q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source. To go back to the tiled display, right-click anywhere on the window.

** INFO: : Pipeline ready

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: : Pipeline running

PERF: FPS 0 (Avg)
PERF: 25.61 (24.62)
** INFO: : Pipeline paused

r ** INFO: : Pipeline running

** INFO: : Received EOS. Exiting ...

Quitting App run successful

marcoslucianops commented 10 months ago

You are exporting the ONNX model in the wrong way. Please follow: https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/YOLOv5.md

The output of your model is incorrect.

JasonChao-yue commented 10 months ago

You are exporting the ONNX model in the wrong way. Please follow: https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/YOLOv5.md

The output of your model is incorrect.

I did export it through the method in the documentation, and it completed successfully. The command I used was: python3 export_yoloV5.py -w yolov5s.pt --dynamic. Which part of the output tells you the ONNX export is wrong?

JasonChao-yue commented 10 months ago

You are exporting the ONNX model in the wrong way. Please follow: https://github.com/marcoslucianops/DeepStream-Yolo/blob/master/docs/YOLOv5.md

The output of your model is incorrect. @marcoslucianops @linga-abhishek

The problem has been resolved. It was indeed a problem with the generated ONNX model: it occurred when the ONNX was generated in a YOLOv5 5.0 environment, and was resolved by switching to YOLOv5 6.0.
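For reference, the export flow that worked for me was roughly the following (a sketch only: the release tag, paths, and weight file are examples; docs/YOLOv5.md has the exact steps):

git clone -b v6.0 https://github.com/ultralytics/yolov5.git
cd yolov5
pip3 install -r requirements.txt onnx onnxruntime
# copy the exporter from DeepStream-Yolo/utils into the yolov5 folder (adjust the path)
cp ../DeepStream-Yolo-master/utils/export_yoloV5.py .
# yolov5s.pt stands for whatever .pt weights you trained or downloaded
python3 export_yoloV5.py -w yolov5s.pt --dynamic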

HeeebsInc commented 10 months ago

@JasonChao-yue how did you find out which version you needed? I am running into the same issue. Is there a PyTorch method I can use to see which version I trained the model with? This is the first time a version mismatch is causing issues: ONNX + DeepStream 6.2 works, but the same setup does not work on 6.3 with YOLOv5.

marcoslucianops commented 10 months ago

Use the master branch to convert all YOLOv5 versions.

HeeebsInc commented 10 months ago

I'll try that out. Thanks Marco!

HeeebsInc commented 10 months ago

@marcoslucianops I switched from v7 back to master, but I am still not getting detections with a custom YOLOv5 model. The ONNX export and engine build succeed, but during inference there are no detections at all (almost as if everything is suppressed).

marcoslucianops commented 10 months ago

@HeeebsInc can you send the output from the terminal?

HeeebsInc commented 10 months ago

@marcoslucianops

WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: Deserialize engine failed because file path: best_ap-1-fp16.engine open error
0:00:03.809266374    54      0x4445af0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1976> [UID = 1]: deserialize engine from file : best_ap-1-fp16.engine failed
0:00:03.872457154    54      0x4445af0 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2081> [UID = 1]: deserialize backend context from engine from file :best_ap-1-fp16.engine failed, try rebuild
0:00:03.873130954    54      0x4445af0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [TRT]: onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

Building complete

0:02:14.251159294    54      0x4445af0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2034> [UID = 1]: serialize cuda engine to file: /ze/data/Experiments/TEST/TE-563/specs/186-best_ap-1-fp16.engine successfully
WARNING: [TRT]: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x544x960       
1   OUTPUT kFLOAT boxes           32130x4         
2   OUTPUT kFLOAT scores          32130x1         
3   OUTPUT kFLOAT classes         32130x1 

Steps for what I did:

1) python3 export_yoloV5.py -w {self.model_path} -s {self.height} {self.width}
2) pgie config:

[property]
gpu-id = 0
model-color-format = 0
labelfile-path = labels.txt
process-mode = 1
num-detected-classes = 3
interval = 0
batch-size = 1
gie-unique-id = 1
maintain-aspect-ratio = 0
network-mode = 2
workspace-size = 6500
cluster-mode = 2
network-type = 0
force-implicit-batch-dim = 1
infer-dims = 3;544;960
net-scale-factor = 0.0039215697906911373
onnx-file = best_ap.onnx
model-engine-file = best_ap-1-fp16.engine
parse-bbox-func-name = NvDsInferParseYolo
custom-lib-path = /opt/nvidia/deepstream/deepstream-6.3/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name = NvDsInferYoloCudaEngineGet
symmetric-padding = 0

[class-attrs-0]
post-cluster-threshold = 0.1
detected-min-w = 1
detected-min-h = 1
nms-iou-threshold = 0.1
topk = 1000
pre-cluster-threshold = 0.1

[class-attrs-1]
post-cluster-threshold = 0.1
detected-min-w = 1
detected-min-h = 1
nms-iou-threshold = 0.1
topk = 1000
pre-cluster-threshold = 0.1

[class-attrs-2]
post-cluster-threshold = 0.1

marcoslucianops commented 10 months ago

@HeeebsInc the model output seems correct. Did you train your model at 960x544?

HeeebsInc commented 10 months ago

@marcoslucianops After running another test, I believe there may be an issue with the new DeepStream 6.3 implementation. Below is a config I used for both 6.2 and 6.3 (the only difference being the path custom-lib-path=/opt/nvidia/deepstream/deepstream-6.3/sources/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so).

When running through DeepStream 6.2 the model gets detections, while nothing is detected in DeepStream 6.3. Both use the same .onnx file but a different .engine file.

marcoslucianops commented 10 months ago

I will do some tests later today. Can you send me your model by email?

marcoslucianops commented 10 months ago

Are you using Jetson or x86 platform?

HeeebsInc commented 10 months ago

x86 - 3090 with Driver Version 525.125.06 + Cuda 12.1

I cannot send the exact model but I can prepare one that should be identical. It will likely be next week though.

I think a good first attempt would be to export a yolov5l model at 544x960 (python3 export.py -s 544 960 --weights yolov5l.pt); that should show the same behavior. I will try that on my end as well.

marcoslucianops commented 10 months ago

The export.py from the YOLOv5 repo doesn't work with DeepStream-Yolo. You should use export_yoloV5.py from DeepStream-Yolo/utils. Are you training your model with image normalization in the pre-processing?

HeeebsInc commented 10 months ago

The export.py from the YOLOv5 repo doesn't work with DeepStream-Yolo. You should use export_yoloV5.py from DeepStream-Yolo/utils. Are you training your model with image normalization in the pre-processing?

Apologies, that was a typo on my end. I am using export_yoloV5.py from your repo (not ultralytics). I will confirm our normalization technique; I believe we did not change anything from the standard ultralytics pre-processing.

marcoslucianops commented 10 months ago

Got it. The ultralytics implementation doesn't use normalization (for example, mean and standard deviation values of [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]).

marcoslucianops commented 10 months ago

If you are using normalization, you need to change some parameters in the config_infer file to run the inference.
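For example, with ImageNet-style normalization the [property] section would need roughly the following (a sketch only: nvinfer applies y = net-scale-factor * (x - offsets) with a single scale factor, so the per-channel std is approximated by its average; a default ultralytics model does not need this):

# offsets = mean * 255 = 123.675;116.28;103.53
# net-scale-factor ~ 1 / (255 * 0.226), using the average of the std values
net-scale-factor=0.01735207
offsets=123.675;116.28;103.53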

HeeebsInc commented 10 months ago

@marcoslucianops confirmed with our team that we have not touched normalization, so it's the same as default ultralytics. Were you able to reproduce the issue?

marcoslucianops commented 10 months ago

What's your YOLOv5 version?

marcoslucianops commented 10 months ago

If possible, share the steps to reproduce this issue (with YOLOv5 version and which branch of YOLOv5 repo you are using).

HeeebsInc commented 10 months ago

1) Pull the master branch of this repository
2) Pull the yolov5l.pt weights
3) Pull the master branch of ultralytics and run pip install onnx onnxruntime
4) Convert the yolov5l.pt weights to yolov5.onnx with a batch size of 1 and an input resolution of 544x960 (HxW)
5) Use the config I dropped above, modifying the paths to what was generated in the previous steps

marcoslucianops commented 10 months ago

DeepStream 6.3

[two screenshots of the test run attached]

HeeebsInc commented 10 months ago

@marcoslucianops thanks for checking. Did you use the same pgie.txt config I dropped above? If not can you drop the config you used?

Again, really appreciate your help with this.

marcoslucianops commented 10 months ago

Yes, it's the same; I just changed the paths, the network-mode to 0, and workspace-size to 2000 (due to my GPU limitation). But I recommend you use the config_infer_primary_yoloV5.txt file as a reference, because there are parameters like maintain-aspect-ratio and symmetric-padding that YOLOv5 uses, and the nms-iou-threshold belongs in the [class-attrs-all] group.
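For reference, the relevant lines in config_infer_primary_yoloV5.txt look roughly like this (repo defaults; adjust the paths, batch-size, and num-detected-classes for your model):

[property]
net-scale-factor=0.0039215697906911373
model-color-format=0
network-mode=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300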

HeeebsInc commented 10 months ago

@marcoslucianops I believe the issue I was facing is resolved; it was caused by a library of ours that hadn't fully migrated to DeepStream 6.3. After modifying the build script, I have DeepStream 6.3 working, but not with your repository. I can run inference with any other model, but when running this library I hit this error:

nvbufsurftransform:cuInit failed : 100

What's weird is that, after starting a docker container, if I run your library first it causes every subsequent library to fail and forces me to create a new container. On the other hand, if I run the other libraries first, I do not face an issue, but I still cannot run your library. I have done some research and I believe it's due to how the GPU is loading software, see https://stackoverflow.com/questions/53369652/how-to-remove-cuinit-failed-unknown-error-in-cuda-pycuda

I have tried purging my machine of NVIDIA drivers thinking it was my CUDA version, with no luck. I have CUDA 12.1 installed on the host, along with driver 530.

Do you have any advice? I am able to run Faster R-CNN and older YOLO models, but still cannot run YOLOv5.

EDIT:

After doing some digging, I am starting to think it's the pyds Python bindings. I successfully ran deepstream-app -c with the same config and did not get errors. I tried changing the CUDA memory type of nvvideoconvert + streammux within our pyds pipeline from 3 (unified) to 0 (default), but it didn't help. @marcoslucianops I will continue to dig, but have you tested this code with the deepstream_python_apps bindings?

marcoslucianops commented 9 months ago

I can't reproduce this issue. I'm able to run all YOLO models in docker and outside of docker.

marcoslucianops commented 9 months ago

Yes, I tested the code with Python and C/C++ in my projects.

marcoslucianops commented 9 months ago

Create a new Python environment and install only pyds to check. Test with the deepstream-test applications.
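Something along these lines should be enough to isolate it (a sketch: the pyds wheel comes from the deepstream_python_apps releases matching your DeepStream version, and the sample stream path is the default DeepStream install location):

python3 -m venv --system-site-packages pyds-check   # keep the system GStreamer/GI bindings visible
. pyds-check/bin/activate
pip3 install ./pyds-*.whl   # pyds wheel for your DeepStream version
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
cd deepstream_python_apps/apps/deepstream-test1
python3 deepstream_test_1.py /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264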

HeeebsInc commented 9 months ago

@marcoslucianops wanted to give an update. I was able to resolve the issue, and it seems it was caused by something within the ultralytics repository. I converted your export_yoloV5.py script to act as a module, so instead of calling python3 export_yoloV5.py from the command line, I called it directly within our code (from export_yoloV5 import main). Because your export script imports from ultralytics, something within their module is indirectly reserving the GPU, so by the time I go to start the pipeline I hit a segmentation fault. Completely removing any reference to export_yoloV5 from our code fixed the issue. This was new to us, because before we called the export script via an os.system() call, which forks a new process.

The issue is resolved, but I will post here when I find the culprit within ultralytics, as I'm sure others could face the same issue.

If you would like to reproduce the issue, add the following to the very top of your Python pipeline and you will hit the segmentation fault nvbufsurftransform:cuInit failed : 100 (which is very misleading, since that error code means no GPU is attached, which is not the case).

from export_yoloV5 import main

# This calls the export script from within the main thread that will eventually
# start DeepStream. The output ONNX file is a dummy file and does not need to be used.
main(
    weights=weights_path,
    size=[height, width],
    dynamic=False,
)

marcoslucianops commented 9 months ago

I didn't understand. The steps should be:

After those steps, DeepStream will use only the generated engine to do the inference. The ultralytics repo is needed only once, to export the ONNX model.

As far as I understood, you were importing the ultralytics modules or export_yoloV5 in the DeepStream code. Is that right?

HeeebsInc commented 9 months ago

Correct, we followed those steps, but the difference is that we did them within our application: we imported export_yoloV5 in the DeepStream code, and because export_yoloV5 also imports Detect from ultralytics, it pulls in implicit imports from their code base. So if you had a Python file that did the following, you would hit a segmentation fault:

1) Import your export script
2) Convert .pt -> .onnx
3) Start DeepStream

It's not a bug in your repository, but I'm sure I'm not the only one building a pipeline that dynamically exports files within an application rather than expecting the ONNX files to be present at runtime; running python3 export_yoloV5.py is manual, and I only see a use case for it in production deployment (not simulated testing). On the other hand, if you 1) export .pt -> .onnx via the command line (python3 export_yoloV5.py) and then 2) run DeepStream, it works, but again that is a manual process.
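For anyone else exporting dynamically, the workaround we settled on was to run the exporter in a child process, so whatever CUDA state ultralytics/torch creates dies with the child and never touches the DeepStream process. A minimal sketch (paths, sizes, and the yolov5 checkout location are placeholders):

import subprocess
import sys

def export_onnx(weights_path: str, height: int, width: int, yolov5_dir: str) -> None:
    """Run export_yoloV5.py in a separate process instead of importing it,
    so no GPU context is created in the process that will start DeepStream."""
    cmd = [
        sys.executable, "export_yoloV5.py",
        "-w", weights_path,
        "-s", str(height), str(width),
    ]
    subprocess.run(cmd, cwd=yolov5_dir, check=True)

# export first, in isolation...
export_onnx("best_ap.pt", 544, 960, "/path/to/yolov5")
# ...then import pyds and build the GStreamer pipeline in this (clean) process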

HeeebsInc commented 9 months ago

I will dig up the piece within the ultralytics repository that's causing this. If you dig around their codebase, you can see they set os.environ[] quite a bit, so I'm sure it's something like that, or they are using a torch operation that reserves the GPU within scope. The fix will be commenting something out; once I find what that is, I'll let you know in case it's useful for your documentation.

Appreciate the help with this, as always.

marcoslucianops commented 9 months ago

You can use the exported ONNX model for all of your projects; you don't need to export it every time. It will be easier for deployment and production.