kalfazed / tensorrt_starter

This repository gives a guideline for learning CUDA and TensorRT from the beginning.
120 stars 29 forks

The code output in Section 6.2-deploy-classification-advanced is abnormal! #6

Open Melody-Zhou opened 2 months ago

Melody-Zhou commented 2 months ago

Hi, thank you for the detailed CUDA and TensorRT tutorials; they've really helped me out.

I ran into a strange phenomenon and hope you can help me analyze it. When running the code from Section 6.2-deploy-classification-advanced with GPU processing, I get an identical but incorrect inference result for every image. The output is as follows:

[info]Model:      resnet50.onnx
[info]Image:      cat.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

[info]Model:      resnet50.onnx
[info]Image:      gazelle.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

[info]Model:      resnet50.onnx
[info]Image:      eagle.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

[info]Model:      resnet50.onnx
[info]Image:      fox.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

[info]Model:      resnet50.onnx
[info]Image:      tiny-cat.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

[info]Model:      resnet50.onnx
[info]Image:      wolf.png
[info]Inference result: cleaver, meat cleaver, chopper
[info]Confidence is 3.092%

Since the inference results were the same for all pictures, I started debugging to find the problem. First, I set params.dev = model::device::CPU to check whether the problem was in the GPU pre-processing or post-processing, and found that inference on the CPU works fine. The output is as follows:

[info]Model:      resnet50.onnx
[info]Image:      cat.png
[info]Inference result: Egyptian cat
[info]Confidence is 45.422%

[info]Model:      resnet50.onnx
[info]Image:      gazelle.png
[info]Inference result: gazelle
[info]Confidence is 62.041%

[info]Model:      resnet50.onnx
[info]Image:      eagle.png
[info]Inference result: bald eagle, American eagle, Haliaeetus leucocephalus
[info]Confidence is 99.869%

[info]Model:      resnet50.onnx
[info]Image:      fox.png
[info]Inference result: red fox, Vulpes vulpes
[info]Confidence is 80.789%

[info]Model:      resnet50.onnx
[info]Image:      tiny-cat.png
[info]Inference result: Persian cat
[info]Confidence is 75.310%

[info]Model:      resnet50.onnx
[info]Image:      wolf.png
[info]Inference result: timber wolf, grey wolf, gray wolf, Canis lupus
[info]Confidence is 39.663%

So I strongly suspected a problem in the GPU pre-processing or post-processing implementation. Since the post-processing for the classification task is relatively simple and is not implemented on the GPU, I narrowed the problem down to the implementation of the preprocess_gpu function.

The default strategy adopted in process::preprocess_resize_gpu is process::tactics::GPU_BILINEAR. I tried changing it to process::tactics::GPU_NEAREST and got the following output:

[info]Model:      resnet50.onnx
[info]Image:      cat.png
[info]Inference result: Egyptian cat
[info]Confidence is 24.423%

[info]Model:      resnet50.onnx
[info]Image:      gazelle.png
[info]Inference result: gazelle
[info]Confidence is 37.486%

[info]Model:      resnet50.onnx
[info]Image:      eagle.png
[info]Inference result: bald eagle, American eagle, Haliaeetus leucocephalus
[info]Confidence is 99.946%

[info]Model:      resnet50.onnx
[info]Image:      fox.png
[info]Inference result: red fox, Vulpes vulpes
[info]Confidence is 86.935%

[info]Model:      resnet50.onnx
[info]Image:      tiny-cat.png
[info]Inference result: Persian cat
[info]Confidence is 76.643%

[info]Model:      resnet50.onnx
[info]Image:      wolf.png
[info]Inference result: coyote, prairie wolf, brush wolf, Canis latrans
[info]Confidence is 70.486%

The output is normal, so I suspected a problem in the bilinear_BGR2RGB_nhwc2nchw_norm_kernel kernel function. But I read the code carefully and didn't find anything wrong; I even asked ChatGPT to analyze it for me, without getting any useful suggestions.

So I saved the image data produced by the kernel function and inspected it in Python, and found that all the values in the pre-processed image data were zero! This convinced me that some part of the kernel implementation must be wrong. I noticed that for out-of-bounds coordinates the kernel computes nothing and simply leaves the branch empty, as follows:

if (src_y1 < 0 || src_x1 < 0 || src_y2 > srcH || src_x2 > srcW) {
    // bilinear interpolation -- skip the computation for out-of-bounds coordinates
} else {
    // bilinear interpolation -- compute the fractional (0~1) weights of the
    // floating-point coordinates on the source image
    ...
}

This seems fine, and the nearest-neighbor interpolation kernel does the same thing. On a whim, I tried adding an extra return; when the coordinates are out of bounds, as follows:

if (src_y1 < 0 || src_x1 < 0 || src_y2 > srcH || src_x2 > srcW) {
    // bilinear interpolation -- skip the computation for out-of-bounds coordinates
    return;  // Newly added
} else {
    // bilinear interpolation -- compute the fractional (0~1) weights of the
    // floating-point coordinates on the source image
    ...
}

When I ran it again I got the correct values. The output is as follows:

[info]Model:      resnet50.onnx
[info]Image:      cat.png
[info]Inference result: Egyptian cat
[info]Confidence is 45.700%

[info]Model:      resnet50.onnx
[info]Image:      gazelle.png
[info]Inference result: gazelle
[info]Confidence is 62.227%

[info]Model:      resnet50.onnx
[info]Image:      eagle.png
[info]Inference result: bald eagle, American eagle, Haliaeetus leucocephalus
[info]Confidence is 99.873%

[info]Model:      resnet50.onnx
[info]Image:      fox.png
[info]Inference result: red fox, Vulpes vulpes
[info]Confidence is 80.740%

[info]Model:      resnet50.onnx
[info]Image:      tiny-cat.png
[info]Inference result: Persian cat
[info]Confidence is 75.263%

[info]Model:      resnet50.onnx
[info]Image:      wolf.png
[info]Inference result: timber wolf, grey wolf, gray wolf, Canis lupus
[info]Confidence is 40.196%

I asked ChatGPT, and the explanation it gave was that a thread that does not return on an out-of-bounds coordinate might access an invalid memory address, leading to undefined behavior. But doesn't the nearest-neighbor kernel do the same thing? Why does it not have this problem?

So I would like to know whether you can reproduce this problem, and what causes it.

Thank you very much.

My environment is as follows:

Ubuntu 20.04.6 LTS
12th Gen Intel(R) Core(TM) i5-12400F
NVIDIA GeForce RTX 3060
CUDA 11.6, cuDNN 8.4.0, TensorRT 8.6.1.6, OpenCV 4.6.0
kalfazed commented 3 weeks ago

@Melody-Zhou Thank you for pointing out this issue, and sorry for the late response (I somehow didn't check the issues recently)... I've created the same environment as yours in Docker, but I cannot reproduce the result: the code works correctly and produces the same score with or without the return in the conditional branch. That said, I think adding the return is good practice and improves readability, so I will create a PR to fix it.

Unlike nearest neighbor, in the bilinear implementation src_x or src_y might be less than zero or larger than the original size, because of the 0.5 center-alignment offset:

    int src_y1 = floor((y + 0.5) * scaled_h - 0.5);
    int src_x1 = floor((x + 0.5) * scaled_w - 0.5);
    int src_y2 = src_y1 + 1;
    int src_x2 = src_x1 + 1;

However, this will not happen in the nearest-neighbor implementation. (If you changed floor to ceil, out-of-bounds access could happen there as well.)

    int src_y = floor((float)y * scaled_h);
    int src_x = floor((float)x * scaled_w);
Melody-Zhou commented 3 weeks ago

@kalfazed Thanks so much for your reply. I understand now and will recheck it.