nanmi / YOLOX-deepstream

Deploy the YOLOX algorithm using DeepStream

Some bugs when you use your own dataset to train model #5

Closed lantudou closed 3 years ago

lantudou commented 3 years ago

Very nice work, it saved me a lot of time!

But there are some bugs in your code when using a model trained on your own dataset:

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L168

should be:

const int basic_pos = anchor_idx * (num_class + 4 + 1);
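
For context, a minimal sketch of how that stride is used when decoding the flattened YOLOX output (the variable names follow the YOLOX TensorRT C++ reference decoder and are illustrative, not the exact code in this repository):

// each anchor's slice of the flattened output is laid out as
// [x_center, y_center, w, h, objectness, class_prob_0, ..., class_prob_{num_class-1}],
// so the per-anchor stride must be num_class + 4 + 1 rather than a hard-coded value
const int basic_pos = anchor_idx * (num_class + 4 + 1);
float box_objectness = feat_blob[basic_pos + 4];
float box_cls_score = feat_blob[basic_pos + 5 + class_idx];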

lantudou commented 3 years ago

By the way, I don't understand why you need to use the source input size:

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L259

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L260

In DeepStream, the source input size is not the nvinfer input size used for model inference; the element automatically resizes the video frame from the source size to your TensorRT input size. Theoretically, the source input size should not be involved in the bbox operations.

For my own model, the performance using DeepStream is much worse than inference with the YOLOX reference code in PyTorch. I am not sure whether the reason is the DeepStream resizing operation or a bug in your code.

nanmi commented 3 years ago

By the way, I don't understand why you need to use the source input size:

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L259

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L260

In DeepStream, the source input size is not the nvinfer input size used for model inference; the element automatically resizes the video frame from the source size to your TensorRT input size. Theoretically, the source input size should not be involved in the bbox operations.

For my own model, the performance using DeepStream is much worse than inference with the YOLOX reference code in PyTorch. I am not sure whether the reason is the DeepStream resizing operation or a bug in your code.

You are sharp, good catch. On the first question: OSD rendering in DeepStream operates on the original image, so you need to set the size of the original image in the parsing step. Of course, you can obtain it through the DeepStream API, but you still need to set [source0] in the configuration file. On the second question: my test in DeepStream passed perfectly, so please check the relevant DeepStream settings. In addition, the official YOLOX model has been updated recently and I haven't had time to adapt to the latest version, sorry. If you have other DeepStream-related questions, feel free to raise them, since I have been studying end-to-end model deployment. Thank you.
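
For reference, a hedged sketch of the [source0] group nanmi mentions in deepstream_app_config.txt; the uri and camera sizes below are placeholders for your own stream, and camera-width/camera-height only apply to camera-type sources:

[source0]
enable=1
# type=3 selects a URI/MultiURI source; type=1 would be a V4L2 camera
type=3
uri=file:///path/to/your/video.mp4
# for camera-type sources the capture resolution is set here instead:
# camera-width=1920
# camera-height=1080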

lantudou commented 3 years ago

After many days of debugging, I have finally found the cause of the performance drop.

Your DeepStream preprocessing is wrong in several places; this is a somewhat long story:

First bug: Input Normalization

Before 2021/8/19, YOLOX used a backbone pretrained on ImageNet. Therefore, the input normalization in preprocessing still had to keep the configuration used when training on ImageNet. In fact, you can still find this in the new version of the YOLOX code:

https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L248

img -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)

https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L249

img /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

Therefore, your preprocessing should match this. In DeepStream, the input normalization is determined by the net-scale-factor and mean-file parameters: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html
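
As documented there, gst-nvinfer applies a single scalar scale and an optional per-channel mean, roughly (a sketch of the documented behaviour, not actual plugin code):

// for each pixel value x in channel c:
// y[c] = net-scale-factor * (x[c] - mean[c])

There is only one net-scale-factor shared by all channels, so the per-channel std division in the ImageNet normalization has no direct equivalent here.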

Unfortunately, this process does not support channel-wise normalization: https://forums.developer.nvidia.com/t/reflecting-pytorch-normalize-transform-parameter-to-deepstream-configuration/160294 To recover the performance you would have to modify the DeepStream library itself.

After 2021/8/19, YOLOX dropped the ImageNet-pretrained model: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/docs/updates_note.md

For models trained after 2021/8/19, the input is just raw 0-255 values without normalization, so you only need to set net-scale-factor=1 and not set a mean-file.

Your configuration instead sets net-scale-factor = 1/255: https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L3

Please understand: DeepStream does not know what your model input is and cannot finish everything automatically for you!

Second bug: input image channel order. YOLOX uses cv2 to read images as training input, and OpenCV's default channel order is BGR rather than RGB:

So it should be model-color-format=1 rather than: https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L6

Third bug:

For the training process and demo.py in YOLOX, the input resize includes border padding to maintain the image aspect ratio: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L154

So it should be maintain-aspect-ratio=1 rather than: https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L22

But note that for the TensorRT C++ reference in YOLOX there is no border-padding resize step.
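
Putting the three fixes together, a minimal sketch of the relevant keys in config_infer_primary.txt would look like the following (only these three keys are the point here; everything else in the file stays model-specific):

[property]
# post-2021/8/19 YOLOX models expect raw 0-255 input: no scaling, no mean file
net-scale-factor=1
# YOLOX is trained on cv2 (BGR) images; 1 selects BGR input in gst-nvinfer
model-color-format=1
# match the aspect-ratio-preserving (padded) resize used by YOLOX training and demo.py
maintain-aspect-ratio=1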

Fourth bug:

In DeepStream, the source input size is not the nvinfer input size used for model inference; the element automatically resizes the video frame from the source size to your TensorRT input size. Theoretically, the source input size should not be involved in the bbox operations.

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L259

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L260

These should be your model's test input size. This configuration will not cause seriously wrong results for a 640 test input, but in any case the source input size is not the nvinfer input size used for model inference, and it should not be involved in the bbox operations. DeepStream handles the image preprocessing resize and the bbox postprocessing resize for you. Don't worry.
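
A hedged sketch of the alternative: the custom parser already receives the network input size through its NvDsInferNetworkInfo argument, so nothing about the source resolution needs to be hard-coded (the local variable names here are only illustrative):

// inside NvDsInferParseCustomYolox(..., NvDsInferNetworkInfo const &networkInfo, ...)
const float netW = static_cast<float>(networkInfo.width);   // nvinfer / TensorRT input width
const float netH = static_cast<float>(networkInfo.height);  // nvinfer / TensorRT input height
// decode boxes in network-input coordinates using netW/netH;
// nvinfer then maps them back to the source frame, so the source size never appears here.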

lantudou commented 3 years ago

By the way, I have tested the code with the modifications above. The test consisted of printing the nvinfer element's resizing result and the model engine result, and comparing the result between the reference TensorRT C++ file: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/demo/TensorRT/cpp/yolox.cpp

and the DeepStream result.

For a JPEG image as the source input, I got exactly the same result for my own trained model. THIS IS WHAT I CALL A PASSED CODE TEST!

tulbureandreit commented 2 years ago

@lantudou have you had any problems running the makefile and/or running YOLOX with DeepSORT?

tulbureandreit commented 2 years ago

@lantudou I managed to run YOLOX on my Xavier AGX, but it runs at just 1 FPS (yolox-s).

Have you managed to convert and run yolox-nano or some smaller detector? It seems to me that I can't run the TRT engine of yolox-nano with DeepStream.

lantudou commented 2 years ago

@lantudou I managed to run YOLOX on my Xavier AGX, but it runs at just 1 FPS (yolox-s).

Have you managed to convert and run yolox-nano or some smaller detector? It seems to me that I can't run the TRT engine of yolox-nano with DeepStream.

Hi bro! Sorry for replying so late. I have tested yolox-nano on Linux x64, and I guess your problem is the network input size. For yolox-nano and yolox-tiny, the default network input size is 416x416, not 640x640, so please change the values:

https://github.com/nanmi/YOLOX-deepstream/blob/5ced7c32d16807af78efa9b01991175e29e763b0/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L34

https://github.com/nanmi/YOLOX-deepstream/blob/5ced7c32d16807af78efa9b01991175e29e763b0/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L35

https://github.com/nanmi/YOLOX-deepstream/blob/5ced7c32d16807af78efa9b01991175e29e763b0/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L259

https://github.com/nanmi/YOLOX-deepstream/blob/5ced7c32d16807af78efa9b01991175e29e763b0/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L260

to 416, and you will get the correct result. Good luck!
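
A hedged sketch of what that change amounts to (the exact constant names in nvdsparsebbox_yolox.cpp may differ; the point is that every 640x640 assumption becomes 416x416 for yolox-nano and yolox-tiny):

// assumed constant names, for illustration only
static const int INPUT_W = 416;  // was 640
static const int INPUT_H = 416;  // was 640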

lantudou commented 2 years ago

@lantudou I managed to run YOLOX on my Xavier AGX, but it runs at just 1 FPS (yolox-s).

Have you managed to convert and run yolox-nano or some smaller detector? It seems to me that I can't run the TRT engine of yolox-nano with DeepStream.

For yolox-s, did you pass the test with the converted TensorRT engine without DeepStream?

https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/TensorRT/cpp

How long is your inference time using the converted engine file?

tulbureandreit commented 2 years ago

@lantudou yes. And it runs at about 1 FPS :(

I am using Python.

Moreover, I converted yolox-nano to TRT, and when I try to run inference I cannot: it just freezes.

tulbureandreit commented 2 years ago

@lantudou now when I switch the engine file to yolox-nano, it freezes or returns a segmentation fault. Yolox-s runs, but at 1 FPS. Both passed the "test" run with the demo file.

tulbureandreit commented 2 years ago

sudo deepstream-app -c deepstream_app_config.txt
Opening in BLOCKING MODE 
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Loading TRT Engine for tracker ReID...
[NvMultiObjectTracker] Loading Complete!
[NvMultiObjectTracker] Initialized
0:00:05.945366492  3292     0x294dfb00 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/yolox_nano/model_trt.engine
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_0         3x416x416       
1   OUTPUT kFLOAT output_0        3549x85         

0:00:05.945827155  3292     0x294dfb00 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/yolox_nano/model_trt.engine
0:00:05.956705122  3292     0x294dfb00 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/config_infer_primary.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit

    p: Pause
    r: Resume

**PERF:  FPS 0 (Avg)    
**PERF:  0.00 (0.00)    
** INFO: <bus_callback:194>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:180>: Pipeline running

NvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
**PERF:  0.00 (0.00)    
**PERF:  0.00 (0.00)    
tulbureandreit commented 2 years ago

Either it freezes like this, or I get a segmentation fault.

tsmiyamoto commented 2 years ago

@tulbureandreit Have you changed this line? After revising num_class to the appropriate value, I no longer get a segmentation fault.

https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L158
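
For illustration, a hedged sketch of the kind of change meant here (the actual constant name in the linked file may differ):

// set this to the number of classes in your own dataset
const int NUM_CLASSES = 80;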