jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Model conversion from ONNX to TensorRT on RTX-2080? #454

Closed shivamordanny closed 3 years ago

shivamordanny commented 3 years ago

How do I modify this reference to convert an ONNX model to TensorRT for higher-end GPUs like the RTX 2080 or A5000? Or does it only support the Jetson family? Bigger question: do these GPUs support TensorRT?

Here are the steps I performed to convert a YOLOv4 ONNX model to TensorRT:

This is on a machine with an RTX 2080S.

I am using the following NVIDIA Docker image for TensorRT: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_20-09.html#rel_20-09

I am running it with the following command:

docker run -it --rm --ulimit core=-1 --ipc=host --gpus=all --volume=/home/ccai/code/project:/app nvcr.io/nvidia/tensorrt:20.09-py3

I rebuilt the ldd/plugins/libyolo_layer.so library (the one checked in is built for arm64) with: make clean && make

I am also installing the following dependencies in the container to run the conversion utilities:

apt update
apt install -y protobuf-compiler libprotoc-dev
pip install onnx==1.4.1 protobuf
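
For reference, a quick way to sanity-check that the TensorRT Python bindings and the rebuilt plugin load inside the container (a minimal sketch; the relative plugin path just mirrors the one that shows up in the backtrace below):

import ctypes
import tensorrt as trt

# Confirm the TensorRT Python bindings shipped with the 20.09 container import cleanly.
print('TensorRT version:', trt.__version__)

# Load the rebuilt plugin so its custom layer creator gets registered;
# the relative path assumes running from the yolo/ directory, as in the backtrace below.
ctypes.CDLL('../plugins/libyolo_layer.so')
print('libyolo_layer.so loaded')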

Here is the error I am getting:

root@c1fe9394c820:/app/yolo# python3 yolo/onnx_to_tensorrt.py -c 1 -m tiny-yolov4-optical-640x480
Loading the ONNX file...
Adding yolo_layer plugins...
python3: yolo_layer.cu:413: virtual nvinfer1::IPluginV2IOExt* nvinfer1::YoloPluginCreator::createPlugin(const char*, const nvinfer1::PluginFieldCollection*): Assertion `input_multiplier == 8 || input_multiplier == 16 || input_multiplier == 32' failed.
Aborted (core dumped)

Below is the gdb backtrace from the core file, which points to the same spot in plugins/yolo_layer.cu, line 413:

Core was generated by `python3 yolo/onnx_to_tensorrt.py -c 1 -m tiny-yolov4-optical-640x480'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1331615740 (LWP 755))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f133104a921 in __GI_abort () at abort.c:79
#2  0x00007f133103a48a in __assert_fail_base (fmt=0x7f13311c1750 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f12fbc33ab0 "input_multiplier == 8 || input_multiplier == 16 || input_multiplier == 32", 
    file=file@entry=0x7f12fbc33843 "yolo_layer.cu", line=line@entry=413, 
    function=function@entry=0x7f12fbc33c80 <nvinfer1::YoloPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*)::__PRETTY_FUNCTION__> "virtual nvinfer1::IPluginV2IOExt* nvinfer1::YoloPluginCreator::createPlugin(const char*, const nvinfer1::PluginFieldCollection*)") at assert.c:92
#3  0x00007f133103a502 in __GI___assert_fail (assertion=0x7f12fbc33ab0 "input_multiplier == 8 || input_multiplier == 16 || input_multiplier == 32", file=0x7f12fbc33843 "yolo_layer.cu", line=413, 
    function=0x7f12fbc33c80 <nvinfer1::YoloPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*)::__PRETTY_FUNCTION__> "virtual nvinfer1::IPluginV2IOExt* nvinfer1::YoloPluginCreator::createPlugin(const char*, const nvinfer1::PluginFieldCollection*)") at assert.c:101
#4  0x00007f12fbbdfcbf in nvinfer1::YoloPluginCreator::createPlugin(char const*, nvinfer1::PluginFieldCollection const*) () from ../plugins/libyolo_layer.so
#5  0x00007f132f9af1c6 in void pybind11::cpp_function::initialize<tensorrt::lambdas::{lambda(nvinfer1::IPluginCreator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nvinfer1::PluginFieldCollection const*)#16} const&, nvinfer1::IPluginV2*, nvinfer1::IPluginCreator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nvinfer1::PluginFieldCollection const*, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::sibling, char const*>(tensorrt::lambdas::{lambda(nvinfer1::IPluginCreator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nvinfer1::PluginFieldCollection const*)#16} const&, nvinfer1::IPluginV2* (*)(nvinfer1::IPluginCreator&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nvinfer1::PluginFieldCollection const*), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, pybind11::sibling const&, char const* const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) [clone .lto_priv.2598] ()
   from /usr/lib/python3.6/dist-packages/tensorrt/tensorrt.so
#6  0x00007f132f9c3fa0 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.6/dist-packages/tensorrt/tensorrt.so
#7  0x0000000000566ddc in _PyCFunction_FastCallDict ()
#8  0x000000000050a783 in ?? ()
#9  0x000000000050c1f4 in _PyEval_EvalFrameDefault ()
#10 0x0000000000507f24 in ?? ()
#11 0x0000000000509c50 in ?? ()
#12 0x000000000050a64d in ?? ()
#13 0x000000000050c1f4 in _PyEval_EvalFrameDefault ()
#14 0x0000000000507f24 in ?? ()
#15 0x0000000000509c50 in ?? ()
#16 0x000000000050a64d in ?? ()
#17 0x000000000050c1f4 in _PyEval_EvalFrameDefault ()
#18 0x0000000000509918 in ?? ()
#19 0x000000000050a64d in ?? ()
#20 0x000000000050c1f4 in _PyEval_EvalFrameDefault ()
#21 0x0000000000507f24 in ?? ()
#22 0x000000000050b053 in PyEval_EvalCode ()
#23 0x0000000000634dd2 in ?? ()
#24 0x0000000000634e87 in PyRun_FileExFlags ()
#25 0x000000000063863f in PyRun_SimpleFileExFlags ()
#26 0x00000000006391e1 in Py_Main ()
#27 0x00000000004b0dc0 in main ()
jkjung-avt commented 3 years ago

RTX 2080 and RTX A5000 should both work. You could refer to README_x86.md for modifications required for x86 platforms.

python3: yolo_layer.cu:413: virtual nvinfer1::IPluginV2IOExt* nvinfer1::YoloPluginCreator::createPlugin(const char*, const nvinfer1::PluginFieldCollection*): Assertion `input_multiplier == 8 || input_multiplier == 16 || input_multiplier == 32' failed.

This indicates an assertion failure in the code. In fact, you are not using the latest code in this repo. Please update it. You could refer to the latest source code here:

https://github.com/jkjung-avt/tensorrt_demos/blob/25adc48c43580d8a70f92032da3ac67c1a7bd70e/plugins/yolo_layer.cu#L354-L355

shivamordanny commented 3 years ago

Thanks for the quick response. I will go over the modifications you suggested on both machines and will update this thread over the next week.

shivamordanny commented 3 years ago

Still getting the same error with the latest code. I added a print statement to check the multiplier value; it turns out to be 42.

Loading the ONNX file...
Adding yolo_layer plugins...
input_multiplier = 42
python3: yolo_layer.cu:415: virtual nvinfer1::IPluginV2IOExt* nvinfer1::YoloPluginCreator::createPlugin(const char*, const nvinfer1::PluginFieldCollection*): Assertion `input_multiplier == 8 || input_multiplier == 16 || input_multiplier == 32 || input_multiplier ==64' failed.
Aborted (core dumped)
jkjung-avt commented 3 years ago

Could you provide your custom cfg file? I will take a look.

shivamordanny commented 3 years ago

Sure, here are the contents of the cfg file, tiny-yolov4-640x480.cfg:

[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=64
#subdivisions=16
try_fix_nan=1
width=640
height=480
channels=1
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
adam=0

learning_rate=0.00261
burn_in=1000

max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1

mosaic = 1
#cutmix = 1

#weights_reject_freq=1001
#ema_alpha=0.9998
#equidistant_point=1000
#num_sigmas_reject_badlabels=3
#badlabels_rejection_percentage=0.2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=6
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
nms_kind=greedynms
beta_nms=0.6
#new_coords=1
#scale_x_y = 2.0

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=6
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
nms_kind=greedynms
beta_nms=0.6
#new_coords=1
#scale_x_y = 2.0
shivamordanny commented 3 years ago

Hi @jkjung-avt, I figured out that the issue is with the recent changes to yolo_layer.cu and plugins.py: the code does not correctly handle inputs with two different dimensions, e.g. 640x480. The current code works with symmetric (single-dimension) inputs, e.g. 416x416. I tried running the latest code on an AGX as well and it gave me the same assertion error; input_multiplier ends up with values like 42 or 21, so it's a generic bug, not just an x86 issue. With an older version of the code, both symmetric and asymmetric dimensions worked fine on the AGX and on an RTX 3070.
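
To illustrate the point, here is a rough Python sketch of what I believe the per-axis check should look like (placeholder names, not the repo's actual plugin code): the downsample factor has to be computed and validated per axis, so that a non-square input such as 640x480 still yields the expected strides.

# Sketch: per-axis stride check for a yolo output layer.
# net_w, net_h come from the cfg [net] section; grid_w, grid_h are the
# spatial dimensions of the feature map feeding that yolo layer.
def check_yolo_strides(net_w, net_h, grid_w, grid_h):
    stride_w = net_w // grid_w
    stride_h = net_h // grid_h
    # Both axes must be downsampled by the same valid power of two.
    assert stride_w == stride_h, 'mismatched strides: %d vs %d' % (stride_w, stride_h)
    assert stride_w in (8, 16, 32, 64), 'unexpected stride: %d' % stride_w
    return stride_w

# For the 640x480 cfg above, the two yolo layers sit behind total strides
# of 32 and 16, so their grids are 20x15 and 40x30:
print(check_yolo_strides(640, 480, 20, 15))   # -> 32
print(check_yolo_strides(640, 480, 40, 30))   # -> 16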

jkjung-avt commented 3 years ago

@shivamordanny Thanks for reporting this issue. This is indeed a bug I introduced with the July 18 merge request. It is now fixed with the f9f349 commit.

In addition, I also fixed some other issues with channels=1 custom models (i.e. models that take grayscale images as input). The latest code in the repo should work for your custom model now.
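
Roughly, the idea is that the input channel count should come from the model definition instead of being hard-coded to 3. A simplified sketch of reading it from the cfg's [net] section (not the exact code in the repo; parse_net_dims is just an illustrative helper):

# Sketch: read width/height/channels from the [net] section of a darknet cfg,
# so grayscale (channels=1) models work without editing the conversion scripts.
def parse_net_dims(cfg_path):
    dims = {'width': None, 'height': None, 'channels': 3}
    in_net = False
    with open(cfg_path) as f:
        for line in f:
            line = line.split('#')[0].strip()   # drop comments and whitespace
            if not line:
                continue
            if line.startswith('['):
                in_net = (line == '[net]')      # only read keys inside [net]
                continue
            if in_net and '=' in line:
                key, _, value = line.partition('=')
                if key.strip() in dims:
                    dims[key.strip()] = int(value.strip())
    return dims['channels'], dims['height'], dims['width']

# For the cfg above this would return (1, 480, 640).
print(parse_net_dims('tiny-yolov4-640x480.cfg'))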

shivamordanny commented 3 years ago

Great! The other fix was also needed; I used to change the channels parameter in onnx_to_tensorrt.py from 3 to 1 to get it working for grayscale images. Closing this issue.