marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Yolov8n is running slower than Yolov8s in Jetson Xavier NX #410

Closed: hdnh2006 closed this issue 11 months ago

hdnh2006 commented 12 months ago

Hello,

I just want to ask why a small model like yolov8n is running slower than yolov8s. I have tried the same config files with the same input and batch size, and I see yolov8n taking much more time for inference.

This is my environment:

Jetpack 5.1
Deepstream 6.2
onnx==1.14.0
onnxruntime==1.15.1
onnxsim==0.4.33
torch==1.12.0
torchvision==0.13.0

# Export with (run once per model):
python3 export_yoloV8.py -w yolov8n.pt --batch 1 --opset 11
python3 export_yoloV8.py -w yolov8s.pt --batch 1 --opset 11
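A quick way to sanity-check the two exports outside DeepStream is to time them directly with onnxruntime (already in the environment above). A minimal sketch, assuming the exports landed as yolov8n.onnx and yolov8s.onnx; it reports milliseconds per frame and the equivalent FPS (CPU provider, so only the relative difference between the two models is meaningful):

import time
import numpy as np
import onnxruntime as ort

for model in ("yolov8n.onnx", "yolov8s.onnx"):  # assumed export file names
    sess = ort.InferenceSession(model, providers=["CPUExecutionProvider"])
    # input shape matches the engine info below: batch 1, 3x640x640
    feed = {sess.get_inputs()[0].name: np.zeros((1, 3, 640, 640), dtype=np.float32)}
    sess.run(None, feed)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(20):
        sess.run(None, feed)
    ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{model}: {ms:.1f} ms/frame ({1000 / ms:.1f} FPS)")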

The primary inference config file has not changed; only the deepstream-app config was updated to point to it:

config-file=config_infer_primary_yoloV8.txt
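For reference, the relevant [property] entries in config_infer_primary_yoloV8.txt look roughly like this (a sketch; the onnx-file values are assumptions matching the exports above):

[property]
# assumed file name; swap to yolov8s.onnx for the other run
onnx-file=yolov8n.onnx
# same engine file for both models, so each model switch triggers a full rebuild
model-engine-file=model_b1_gpu0_fp32.engine
batch-size=1
# 0 = FP32, matching the fp32 engine name in the logs below
network-mode=0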

Logs for yolov8n:

(deepstream-app:2352724): GLib-GObject-WARNING **: 13:39:19.850: value "TRUE" of type 'gboolean' is invalid or out of range for property 'sync' of type 'gboolean'

(deepstream-app:2352724): GLib-GObject-WARNING **: 13:39:19.851: value "TRUE" of type 'gboolean' is invalid or out of range for property 'qos' of type 'gboolean'
WARNING: Deserialize engine failed because file path: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine open error
0:00:04.940386537 2352724 0xaaaac0cc4430 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed
0:00:05.006163363 2352724 0xaaaac0cc4430 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed, try rebuild
0:00:05.006272708 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

Building complete

0:06:04.414507871 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       
1   OUTPUT kFLOAT boxes           8400x4          
2   OUTPUT kFLOAT scores          8400x1          
3   OUTPUT kFLOAT classes         8400x1          

0:06:04.523638735 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/config_infer_primary_yoloV8.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit

    p: Pause
    r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

**PERF:  FPS 0 (Avg)    
**PERF:  0.00 (0.00)    
** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:225>: Pipeline running

**PERF:  57.81 (57.60)  
**PERF:  57.89 (57.82)  
**PERF:  57.90 (57.81)  
**PERF:  57.90 (57.85)  
**PERF:  57.91 (57.88)  
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting
App run successful

Logs for yolov8s:

(deepstream-app:2352829): GLib-GObject-WARNING **: 13:55:41.488: value "TRUE" of type 'gboolean' is invalid or out of range for property 'sync' of type 'gboolean'

(deepstream-app:2352829): GLib-GObject-WARNING **: 13:55:41.489: value "TRUE" of type 'gboolean' is invalid or out of range for property 'qos' of type 'gboolean'
WARNING: Deserialize engine failed because file path: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine open error
0:00:04.522515636 2352829 0xaaab18315630 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed
0:00:04.590148868 2352829 0xaaab18315630 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed, try rebuild
0:00:04.590719467 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

WARNING: [TRT]: Tactic Device request: 4202MB Available: 3317MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4202 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4202MB Available: 3317MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4202 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4245MB Available: 3302MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4245 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4245MB Available: 3302MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4245 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4364MB Available: 3304MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4364 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4364MB Available: 3305MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4364 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6367MB Available: 3307MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 6367 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6367MB Available: 3307MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 6367 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6301MB Available: 3303MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 6301 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6301MB Available: 3303MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 6301 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
Building complete

0:06:53.536586030 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       
1   OUTPUT kFLOAT boxes           8400x4          
2   OUTPUT kFLOAT scores          8400x1          
3   OUTPUT kFLOAT classes         8400x1          

0:06:53.967989461 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/config_infer_primary_yoloV8.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit

    p: Pause
    r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

**PERF:  FPS 0 (Avg)    
**PERF:  0.00 (0.00)    
** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:225>: Pipeline running

**PERF:  21.92 (21.76)  
**PERF:  25.77 (24.12)  
**PERF:  25.77 (24.74)  
**PERF:  25.77 (25.03)  
**PERF:  25.77 (25.19)  
**PERF:  25.78 (25.26)  
**PERF:  25.78 (25.34)  
**PERF:  25.77 (25.40)  
**PERF:  25.77 (25.45)  
**PERF:  25.77 (25.48)  
**PERF:  25.78 (25.51)  
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting
App run successful
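Side note on the [TRT] tactic-memory warnings during the yolov8s build: they are benign (TensorRT simply skips those tactics and falls back to others), but they can usually be avoided by capping the builder workspace in the [property] group of the nvinfer config. A sketch, with an assumed value that should fit within the memory the warnings report as available on the Xavier NX:

# workspace size in MB; pick a value below the "Available" figure in the warnings
workspace-size=2048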

Here you can download my ONNX files (link available for 1 week): https://we.tl/t-WV8ipOjo7p

Am I doing something wrong?

marcoslucianops commented 12 months ago

YOLOv8n is faster: it's running at 57.91 FPS, while YOLOv8s is running at 25.78 FPS.

hdnh2006 commented 11 months ago

Oh my god, I feel so stupid... It was the end of the afternoon and I just didn't notice.

I believe the reason for my confusion is that I am used to reading inference logs in milliseconds (ms) per frame rather than frames per second (FPS). As the quick conversion below shows, the FPS numbers tell the same story.
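The conversion is just ms_per_frame = 1000 / FPS; a quick check in Python:

for name, fps in (("yolov8n", 57.91), ("yolov8s", 25.78)):
    print(f"{name}: {fps} FPS = {1000 / fps:.1f} ms/frame")
# yolov8n: 57.91 FPS = 17.3 ms/frame
# yolov8s: 25.78 FPS = 38.8 ms/frame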

Sorry @marcoslucianops.

marcoslucianops commented 11 months ago

No problem