lantudou closed this issue 3 years ago.
By the way, I don't understand why you need to use the source input size:
https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L259 https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L260
In DeepStream, the source input size is not the nvinfer input size for model inference; the element automatically resizes the video frame from the source input size to your TensorRT input size. Theoretically speaking, the source input size should not be involved in the bbox computation.
For my own model, the performance using DeepStream is much worse than inference using the YOLOX reference code in PyTorch. I am not sure whether the reason is the DeepStream resizing operation or a bug in your code.
You are smart and really good. On the first question: OSD rendering in DeepStream operates on the original image, so you need to set the size of the original image in the parsing process. Of course, you could get it through the DeepStream API, but you would still need to set [source0] in the configuration file. On the second question: my test in DeepStream passed perfectly, so please check the relevant DeepStream settings. In addition, the official YOLOX model has been updated recently, and I don't have time to make the latest adaptation, sorry. If you have DeepStream-related questions, feel free to raise them as well, because I have been studying end-to-end model deployment. Thank you.
After many days of debugging, I have finally found the cause of the performance drop.
The preprocessing in this DeepStream setup is wrong in several places; this is a bit of a long story.
First bug: input normalization
Before 2021/8/19, YOLOX used a model pretrained on the ImageNet dataset as the backbone. Therefore, the input-normalization preprocessing still had to match the ImageNet normalization used during training. In fact, you can still find this in the current YOLOX code:
https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L248
img -= np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L249
img /= np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
Therefore, your preprocessing should follow this. In DeepStream, input normalization is controlled by the net-scale-factor and mean-file parameters: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html
Unfortunately, this process does not support channel-wise normalization: https://forums.developer.nvidia.com/t/reflecting-pytorch-normalize-transform-parameter-to-deepstream-configuration/160294 To recover full accuracy, you would have to modify the DeepStream library itself.
After the 2021/8/19, yolox dropped the pretrained model of imagenet dataset: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/docs/updates_note.md
For models after 2021/8/19, the input is just raw 0-255 values without normalization, so you only need to set net-scale-factor = 1 and leave mean-file unset.
Your config instead sets net-scale-factor = 1/255: https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L3
Please understand: DeepStream does not know what your model's input looks like and cannot do everything for you automatically!
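As a concrete sketch, here are the relevant `[property]` entries in `config_infer_primary.txt` for a post-2021/8/19 model (key names are the standard gst-nvinfer properties; the commented `offsets` line is only a partial workaround for older ImageNet-normalized models, since nvinfer's single scalar `net-scale-factor` cannot express a per-channel std):

```ini
[property]
# Post-2021/8/19 YOLOX models expect raw 0-255 pixels: pass them through unchanged.
net-scale-factor=1.0
# Do NOT set mean-file or offsets for these models.

# For pre-2021/8/19 (ImageNet-normalized) models, offsets can subtract the
# per-channel mean (order must match model-color-format), but the per-channel
# std (0.229/0.224/0.225) cannot be expressed with one scalar scale factor:
# offsets=103.53;116.28;123.675
```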
Second bug: input image channel order. YOLOX uses cv2 to read images as training input, and OpenCV's default channel order is BGR rather than RGB:
so, it should be:
model-color-format=1
rather than:
https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L6
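For reference, this nvinfer property is an enum (0: RGB, 1: BGR, 2: GRAY, per the gst-nvinfer documentation), so the fix is a one-line change:

```ini
[property]
# YOLOX is trained on cv2-loaded (BGR) images, so feed the network BGR:
model-color-format=1
```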
Third bug:
For the training process and demo.py in YOLOX, the input resize includes border padding to maintain the image aspect ratio: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/yolox/data/data_augment.py#L154
So it should be maintain-aspect-ratio=1 rather than: https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/config_infer_primary.txt#L22
But note that the TensorRT C++ demo in YOLOX resizes without border padding.
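To illustrate why the padded resize matters for postprocessing, here is a hypothetical helper (not from the repo) that maps a box from the letterboxed network input back to the original frame. YOLOX pads on the right/bottom only, so mapping back is a single division by the resize scale, with no offset subtraction:

```cpp
#include <algorithm>

// A detection box in network-input coordinates (x, y = top-left corner).
struct Box { float x, y, w, h; };

// Undo a YOLOX-style aspect-ratio-preserving resize: the frame was scaled by
// min(netW/srcW, netH/srcH) and padded on the right/bottom, so mapping a box
// back to the source frame is a uniform division by that scale.
Box unletterbox(Box b, int srcW, int srcH, int netW, int netH) {
    float scale = std::min(netW / static_cast<float>(srcW),
                           netH / static_cast<float>(srcH));
    return { b.x / scale, b.y / scale, b.w / scale, b.h / scale };
}
```

For example, a 1920x1080 frame letterboxed into a 640x640 input has scale 1/3, so a box at network coordinates (100, 100) maps back to roughly (300, 300) in the source frame.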
Fourth bug:
In DeepStream, the source input size is not the nvinfer input size for model inference; the element automatically resizes the video frame from the source input size to your TensorRT input size. Theoretically speaking, the source input size should not be involved in the bbox computation.
These values should match your model's test input size. This particular config will not produce seriously wrong results for a 640 test input, but in any case, the source input size is not the nvinfer input size for model inference, and the source input size should not be involved in the bbox computation. DeepStream handles both the preprocessing resize and the bbox postprocessing rescale for you. Don't worry.
By the way, I have tested the code with the modifications above. The test printed the nvinfer element's resizing result and the model engine's output, and compared them against the reference TensorRT C++ implementation: https://github.com/Megvii-BaseDetection/YOLOX/blob/c9fe0aae2db90adccc90f7e5a16f044bf110c816/demo/TensorRT/cpp/yolox.cpp
and the DeepStream result.
With a JPEG image as the source input, I got exactly the same result for my own trained model. THIS IS WHAT I CALL A PASSED CODE TEST!
@lantudou have you had any problems running the makefile and/or running YOLOX with DeepSORT?
@lantudou I managed to run YOLOX on my xavier AGX but it runs at just 1 FPS (yolox-s)
Have you managed to convert and run yolox-nano or some smaller detector? It seems to me that I can't run the TRT engine of yolox-nano with DeepStream.
Hi bro! Sorry for replying so late. I have tested yolox-nano on Linux x64, and I guess your problem is the network input size. For yolox-nano and yolox-tiny, the default network input size is 416x416, not 640x640, so please change the values:
to 416. And you will get the correct result. Good luck!
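Assuming the input size is hard-coded in the bbox parser (the exact constant names below are illustrative, not necessarily those in the repo), the change for yolox-nano/tiny would look like:

```cpp
// Network input resolution used when decoding/rescaling boxes.
// yolox-s/m/l/x default to 640x640; yolox-nano/tiny default to 416x416.
static const int INPUT_W = 416;
static const int INPUT_H = 416;
```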
For yolox-s, have you passed the test of converted tensorrt engine without deepstream?
https://github.com/Megvii-BaseDetection/YOLOX/tree/main/demo/TensorRT/cpp
How long is your inference time using the converted engine file?
@lantudou yes. And it runs at about 1 fps :(
I am using python
Moreover, when I convert yolox-nano to TRT and try to run inference, I cannot, because it just freezes.
@lantudou now when I switch the engine file to yolox-nano, it freezes or returns a segmentation fault. Yolox-s runs, but at 1 FPS. Both passed the "test" when run with the demo file.
sudo deepstream-app -c deepstream_app_config.txt
Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvMultiObjectTracker] Loading TRT Engine for tracker ReID...
[NvMultiObjectTracker] Loading Complete!
[NvMultiObjectTracker] Initialized
0:00:05.945366492 3292 0x294dfb00 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/yolox_nano/model_trt.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT input_0 3x416x416
1 OUTPUT kFLOAT output_0 3549x85
0:00:05.945827155 3292 0x294dfb00 INFO nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/yolox_nano/model_trt.engine
0:00:05.956705122 3292 0x294dfb00 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-nvdsanalytics/YOLOX-deepstream/config_infer_primary.txt sucessfully
Runtime commands:
h: Print this help
q: Quit
p: Pause
r: Resume
**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:194>: Pipeline ready
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
Either it freezes like this or I get a segmentation fault.
@tulbureandreit
Have you changed this line?
After revising num_class
to the appropriate value, I don't get a segmentation fault.
Very nice work, it saved me a lot of time!
But there are some bugs in your code for custom-trained models:
https://github.com/nanmi/YOLOX-deepstream/blob/96f44f9a5b5e450276a029e7580667136cbb2320/nvdsinfer_custom_impl_yolox/nvdsparsebbox_yolox.cpp#L168
should be:
const int basic_pos = anchor_idx * (num_class + 4 + 1);
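The stride comes from the layout of each anchor's row in the flattened output tensor: [cx, cy, w, h, objectness, score_0 ... score_{num_class-1}], i.e. num_class + 4 + 1 floats per anchor. A small sketch of the indexing (helper name is illustrative):

```cpp
// Offset of anchor `anchor_idx` in the flattened YOLOX output, where each
// anchor contributes (num_class + 4 + 1) floats:
// 4 box coordinates + 1 objectness score + num_class class scores.
inline int basic_pos(int anchor_idx, int num_class) {
    return anchor_idx * (num_class + 4 + 1);
}
```

With the COCO default of 80 classes this gives the familiar stride of 85, which is why hard-coding it breaks custom-trained models with a different class count.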