jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Zero output on YOLOv4 TensorRT (errors when working) #583

Closed TheEverglow closed 1 year ago

TheEverglow commented 1 year ago

Hello! I upgraded my hardware configuration and ran into an extremely strange problem that I cannot explain.

TensorRT worked properly before, but after updating the system (including part of the hardware) and, accordingly, rebuilding the YOLOv4 model, which completes absolutely without errors, I get zero output from the model on exactly the same data in the same project.

A few prerequisites about the tested systems. I have several systems on which I was able to test this problem and form some guesses about its causes.

Common programs and tools:

I used the files from your repository exactly as they are: yolo_with_plugins.py, yolo_to_onnx.py, onnx_to_tensorrt.py, and the Makefile are unchanged. I didn't even change versions or any other lines.

1st configuration (notebook MSI Pulse):
CPU: i5-12700H
RAM: DDR4
Motherboard: MSI (LGA1700)
System: Windows 11
GPU: RTX 3060 6GB (notebook version)

Everything works: I was able to use Python and TensorRT, build the model, and output comes out correctly; I see bounding boxes, etc.

====================================================================================

2nd configuration:
CPU: i5-6400
RAM: DDR4
Motherboard: ASRock H270 Pro4 (LGA1151)
System: Windows 10
GPU: GTX 1060 6GB

Everything works: I was able to use Python and TensorRT, build the model, and output comes out correctly; I see bounding boxes, etc.

====================================================================================

I upgraded the 2nd configuration, and it is now the 3rd configuration:
CPU: i5-12700
RAM: DDR5
Motherboard: MSI Z690 Carbon (LGA1700)
System: Windows 10/11 (both were tested)
GPU: GTX 1060 6GB

I was able to build the model with TensorRT without any errors, and the app runs without TensorRT errors, but the output is all zeros: no bounding boxes.

I considered a variety of possible causes and tried the following:

The most probable cause of this "effect" is some strange internal incompatibility between CUDA, the GPU, and the new Z690 chipset (and perhaps, though I hope this is unlikely, the new DDR5 memory type), because that is exactly the difference between the 2nd and 3rd configurations described above.

$ ldd libyolo_layer.so

linux-vdso.so.1 (0x00007ffe0707d000)
libnvinfer.so.7 => /usr/lib/x86_64-linux-gnu/libnvinfer.so.7 (0x00007f2a569a8000)
libcudart.so.11.0 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f2a56723000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2a56542000)
libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2a56527000)
libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a56335000)
libcudnn.so.8 => /usr/lib/x86_64-linux-gnu/libcudnn.so.8 (0x00007f2a5610c000)
libmyelin.so.1 => /usr/lib/x86_64-linux-gnu/libmyelin.so.1 (0x00007f2a5588a000)
libnvrtc.so.11.1 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libnvrtc.so.11.1 (0x00007f2a536bf000)
librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f2a536b4000)
libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2a536ae000)
libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f2a5355f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2a7c117000)
libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2a5353c000)
libcublas.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublas.so.11 (0x00007f2a4b11e000)
libcublasLt.so.11 => /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007f2a3d12a000)

Build the ONNX model

Checking ONNX model...
Saving ONNX file...
Done.

$ python3 onnx_to_tensorrt.py -c 2 -m yolov4-crowdhuman-416x416_1060_test_new/yolov4-crowdhuman-416x416

Loading the ONNX file...
Adding yolo_layer plugins...
Building an engine.  This would take a while...
(Use "--verbose" or "-v" to enable verbose logging.)
[TensorRT] WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
Completed creating engine.
Serialized the TensorRT engine to file: yolov4-crowdhuman-416x416_1060_test_new/yolov4-crowdhuman-416x416.trt

The warning about Half2 support does not affect this problem, because, as I described, the 2nd configuration works correctly with the same warning.

And on each frame of the video I just get results like the following (it's a batch, so don't worry about the output shape):

bboxes
 [array([], dtype=float32), array([], dtype=float32), array([], dtype=float32), array([], dtype=float32), array([], dtype=float32), array([], dtype=float32), array([], dtype=float32), array([], dtype=float32)]

(the same list of eight empty arrays is printed for every frame)

I also printed the raw TensorRT output tensor, and every cell of the output matrix is 0.
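For context, this is consistent with the empty arrays above: a minimal numpy sketch (not the repository's exact code) of the kind of confidence-threshold filtering done in postprocessing, which turns an all-zero raw output into empty float32 arrays, since zero scores never clear the threshold. The (N, 7) row layout [x, y, w, h, box_conf, class_id, class_prob] is an assumption for illustration:

```python
import numpy as np

def filter_detections(raw, conf_th=0.3):
    """Keep only detections whose box_conf * class_prob clears the threshold.

    `raw` is assumed to be an (N, 7) array of
    [x, y, w, h, box_conf, class_id, class_prob] rows (layout assumed here
    for illustration only).
    """
    dets = raw.reshape(-1, 7)
    scores = dets[:, 4] * dets[:, 6]
    return dets[scores >= conf_th]

# An all-zero engine output (as reported in this issue) yields no detections:
raw = np.zeros((10, 7), dtype=np.float32)
print(filter_detections(raw).shape)  # (0, 7) -> one empty array per image
```

So empty bboxes alone do not distinguish "nothing detected" from "engine produced garbage"; printing the raw tensor, as done above, is what shows the output is genuinely all zeros.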

I hope you can suggest a cause or a possible solution for this case; maybe there is something else I haven't tried yet.

TheEverglow commented 1 year ago

I tried the latest version of your code and changed the onnx version from 1.4.1 to 1.9.0, which is the version installed inside nvidia-docker-tensorrt, as you mentioned in one of the commits. It now works in configuration 3, which is still strange. Configurations 1 and 2 work with the older version of the code.
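Since the behavior hinged on which onnx version was actually installed, a small stdlib helper (an illustration, not part of the repo) can confirm the active version of a package before rebuilding:

```python
from importlib.metadata import version, PackageNotFoundError

def pkg_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# Usage: pkg_version("onnx") should report the expected version (e.g. "1.9.0"
# inside the container) before running yolo_to_onnx.py.
```

This avoids rebuilding against an onnx version other than the one you think is installed, e.g. when multiple Python environments are present.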

However, there are now crashes that stop the program and that never happened before. For example:

free(): invalid next size (fast)
Aborted

or

corrupted size vs. prev_size
Aborted

or

Segmentation fault

If you have any suggestions for this, please share them.
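Errors like `free(): invalid next size` and `corrupted size vs. prev_size` typically mean something wrote past the end of a heap buffer, e.g. when a host buffer allocated for an engine binding is smaller than what the engine actually writes. A numpy-only sketch (with hypothetical shapes, not the repo's real bindings) of the kind of size check that can catch such a mismatch before inference:

```python
import numpy as np

def check_buffers(binding_shapes, buffers):
    """Verify each host buffer holds exactly the elements its binding needs."""
    for name, shape in binding_shapes.items():
        needed = int(np.prod(shape))
        have = buffers[name].size
        if have != needed:
            raise ValueError(
                f"{name}: buffer has {have} elements, binding needs {needed}")

# Hypothetical YOLOv4-416 output: 3 scales, 3 anchors, 7 floats per detection
grids = [52, 26, 13]
n_dets = sum(g * g * 3 for g in grids)            # total anchor boxes
shapes = {"output": (1, n_dets * 7)}
bufs = {"output": np.empty(n_dets * 7, dtype=np.float32)}
check_buffers(shapes, bufs)                        # passes

bufs["output"] = np.empty(100, dtype=np.float32)   # too small -> would be overrun
try:
    check_buffers(shapes, bufs)
except ValueError as e:
    print("mismatch caught:", e)
```

A segfault from the same root cause just means the overrun landed on an unmapped page instead of heap metadata, which is why the three errors appear interchangeably.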

TheEverglow commented 1 year ago

All this time I had been deleting only the libyolo_layer.so file, not knowing that the build also produces a yolo_layer.o object file, which I assumed was generic and not specific to the system. After removing it and rebuilding, detection works both with the old version of the code and with the new one, except that the new version still produces the errors described above.
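To avoid stale build artifacts like this, one option is to remove both the object file and the shared library before rerunning make. A small sketch (the file names follow this repo's plugin build; the directory argument is up to you):

```python
from pathlib import Path

def clean_plugin_artifacts(plugin_dir):
    """Delete stale build outputs so `make` rebuilds them for the current system."""
    removed = []
    for name in ("yolo_layer.o", "libyolo_layer.so"):
        f = Path(plugin_dir) / name
        if f.exists():
            f.unlink()
            removed.append(name)
    return removed

# Usage: clean_plugin_artifacts("plugins") before running `make` in plugins/
```

`make clean` (if the Makefile defines it) accomplishes the same thing; the point is that both artifacts are system-specific and must be rebuilt together after a hardware or toolchain change.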

If you have any comments about why these errors occur on the latest version of the code, I will be glad to hear them. Thank you!