marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Yolov8n is running slower than Yolov8s in Jetson Xavier NX #410

Closed: hdnh2006 closed this issue 11 months ago

hdnh2006 commented 12 months ago

Hello,

I just want to ask why a small model like yolov8n is running slower than yolov8s. I have tried the same config files with the same input and batch size, and I see yolov8n taking much more time for inference.

This is my environment:

Jetpack 5.1
Deepstream 6.2
onnx==1.14.0
onnxruntime==1.15.1
onnxsim==0.4.33
torch==1.12.0
torchvision==0.13.0

# Export with (run once per model):
python3 export_yoloV8.py -w yolov8n.pt --batch 1 --opset 11
python3 export_yoloV8.py -w yolov8s.pt --batch 1 --opset 11
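A quick way to sanity-check the two exports outside DeepStream is to time them directly with onnxruntime (already in the environment above). A minimal sketch, assuming the exports landed as yolov8n.onnx and yolov8s.onnx; it reports milliseconds per frame and the equivalent FPS (CPU provider, so only the relative difference between the two models is meaningful):

import time
import numpy as np
import onnxruntime as ort

for model in ("yolov8n.onnx", "yolov8s.onnx"):  # assumed export file names
    sess = ort.InferenceSession(model, providers=["CPUExecutionProvider"])
    # input shape matches the engine info below: batch 1, 3x640x640
    feed = {sess.get_inputs()[0].name: np.zeros((1, 3, 640, 640), dtype=np.float32)}
    sess.run(None, feed)  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(20):
        sess.run(None, feed)
    ms = (time.perf_counter() - start) / 20 * 1000
    print(f"{model}: {ms:.1f} ms/frame ({1000 / ms:.1f} FPS)")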

The primary inference config file has not changed; only the deepstream-app config was updated to point to it:

config-file=config_infer_primary_yoloV8.txt
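For reference, the relevant [property] entries in config_infer_primary_yoloV8.txt look roughly like this (a sketch; the onnx-file values are assumptions matching the exports above):

[property]
# assumed file name; swap to yolov8s.onnx for the other run
onnx-file=yolov8n.onnx
# same engine file for both models, so each model switch triggers a full rebuild
model-engine-file=model_b1_gpu0_fp32.engine
batch-size=1
# 0 = FP32, matching the fp32 engine name in the logs below
network-mode=0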

Logs for yolov8n:

(deepstream-app:2352724): GLib-GObject-WARNING **: 13:39:19.850: value "TRUE" of type 'gboolean' is invalid or out of range for property 'sync' of type 'gboolean'

(deepstream-app:2352724): GLib-GObject-WARNING **: 13:39:19.851: value "TRUE" of type 'gboolean' is invalid or out of range for property 'qos' of type 'gboolean'
WARNING: Deserialize engine failed because file path: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine open error
0:00:04.940386537 2352724 0xaaaac0cc4430 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed
0:00:05.006163363 2352724 0xaaaac0cc4430 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed, try rebuild
0:00:05.006272708 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

Building complete

0:06:04.414507871 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       
1   OUTPUT kFLOAT boxes           8400x4          
2   OUTPUT kFLOAT scores          8400x1          
3   OUTPUT kFLOAT classes         8400x1          

0:06:04.523638735 2352724 0xaaaac0cc4430 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/config_infer_primary_yoloV8.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit

    p: Pause
    r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

**PERF:  FPS 0 (Avg)    
**PERF:  0.00 (0.00)    
** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:225>: Pipeline running

**PERF:  57.81 (57.60)  
**PERF:  57.89 (57.82)  
**PERF:  57.90 (57.81)  
**PERF:  57.90 (57.85)  
**PERF:  57.91 (57.88)  
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting
App run successful

Logs for yolov8s:

(deepstream-app:2352829): GLib-GObject-WARNING **: 13:55:41.488: value "TRUE" of type 'gboolean' is invalid or out of range for property 'sync' of type 'gboolean'

(deepstream-app:2352829): GLib-GObject-WARNING **: 13:55:41.489: value "TRUE" of type 'gboolean' is invalid or out of range for property 'qos' of type 'gboolean'
WARNING: Deserialize engine failed because file path: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine open error
0:00:04.522515636 2352829 0xaaab18315630 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed
0:00:04.590148868 2352829 0xaaab18315630 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine failed, try rebuild
0:00:04.590719467 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

WARNING: [TRT]: Tactic Device request: 4202MB Available: 3317MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4202 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4202MB Available: 3317MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4202 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4245MB Available: 3302MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4245 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4245MB Available: 3302MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4245 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4364MB Available: 3304MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 4364 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 4364MB Available: 3305MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 4364 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6367MB Available: 3307MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 6367 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6367MB Available: 3307MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 6367 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6301MB Available: 3303MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 3 due to insufficient memory on requested size of 6301 detected for tactic 0x0000000000000004.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
WARNING: [TRT]: Tactic Device request: 6301MB Available: 3303MB. Device memory is insufficient to use tactic.
WARNING: [TRT]: Skipping tactic 8 due to insufficient memory on requested size of 6301 detected for tactic 0x000000000000003c.
Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
Building complete

0:06:53.536586030 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /home/henry/Projects/VisionAnalytics/DeepStream-Yolo/model_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       
1   OUTPUT kFLOAT boxes           8400x4          
2   OUTPUT kFLOAT scores          8400x1          
3   OUTPUT kFLOAT classes         8400x1          

0:06:53.967989461 2352829 0xaaab18315630 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/home/henry/Projects/VisionAnalytics/DeepStream-Yolo/config_infer_primary_yoloV8.txt sucessfully

Runtime commands:
    h: Print this help
    q: Quit

    p: Pause
    r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

**PERF:  FPS 0 (Avg)    
**PERF:  0.00 (0.00)    
** INFO: <bus_callback:239>: Pipeline ready

Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
** INFO: <bus_callback:225>: Pipeline running

**PERF:  21.92 (21.76)  
**PERF:  25.77 (24.12)  
**PERF:  25.77 (24.74)  
**PERF:  25.77 (25.03)  
**PERF:  25.77 (25.19)  
**PERF:  25.78 (25.26)  
**PERF:  25.78 (25.34)  
**PERF:  25.77 (25.40)  
**PERF:  25.77 (25.45)  
**PERF:  25.77 (25.48)  
**PERF:  25.78 (25.51)  
nvstreammux: Successfully handled EOS for source_id=0
** INFO: <bus_callback:262>: Received EOS. Exiting ...

Quitting
App run successful
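Side note on the [TRT] tactic-memory warnings during the yolov8s build: they are benign (TensorRT simply skips those tactics and falls back to others), but they can usually be avoided by capping the builder workspace in the [property] group of the nvinfer config. A sketch, with an assumed value that should fit within the memory the warnings report as available on the Xavier NX:

# workspace size in MB; pick a value below the "Available" figure in the warnings
workspace-size=2048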

Here you can download my ONNX files (link available for 1 week): https://we.tl/t-WV8ipOjo7p

Am I doing something wrong?

marcoslucianops commented 12 months ago

YOLOv8n is faster: it's running at 57.91 FPS, while YOLOv8s is running at 25.78 FPS.

hdnh2006 commented 11 months ago

Oh my god, I feel so stupid... It was the end of the afternoon and I just didn't notice.

I believe the reason for my confusion is that I am used to reading inference logs in milliseconds (ms) per frame rather than frames per second (FPS). As the quick conversion below shows, the FPS numbers tell the same story.
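The conversion is just ms_per_frame = 1000 / FPS; a quick check in Python:

for name, fps in (("yolov8n", 57.91), ("yolov8s", 25.78)):
    print(f"{name}: {fps} FPS = {1000 / fps:.1f} ms/frame")
# yolov8n: 57.91 FPS = 17.3 ms/frame
# yolov8s: 25.78 FPS = 38.8 ms/frame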

Sorry @marcoslucianops.

marcoslucianops commented 11 months ago

No problem