marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models

Inconsistent results between DeepStream and Python using YOLOv8 ONNX model #537

Open kimud6003 opened 1 month ago

kimud6003 commented 1 month ago

### Issue Description

I am experiencing inconsistent results when running the YOLOv8 ONNX model in both NVIDIA DeepStream and a Python environment. The same ONNX file produces different outcomes when processed by these two platforms.

### Environment

### Steps to Reproduce

  1. Convert the YOLOv8 model to ONNX format. I created the ONNX file following the YOLOv8 export instructions in this repository's documentation.
  2. Run the ONNX model in DeepStream using this config. This is the infer config used with deepstream-test5 (a Python sketch of the matching preprocessing follows the config):
    
    [property]
    gpu-id=0
    net-scale-factor=0.0039215697906911373
    model-color-format=1
    onnx-file=/opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-test5/yolov8s.onnx
    #int8-calib-file=calib.table
    labelfile-path=labels.txt
    network-input-order=0
    infer-dims=3;640;640
    batch-size=1
    network-mode=2
    num-detected-classes=80
    interval=0
    gie-unique-id=1
    process-mode=1
    network-type=0
    cluster-mode=2
    maintain-aspect-ratio=1
    symmetric-padding=1
    #workspace-size=2000
    parse-bbox-func-name=NvDsInferParseYolo
    #parse-bbox-func-name=NvDsInferParseYoloCuda
    custom-lib-path=/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
    engine-create-func-name=NvDsInferYoloCudaEngineGet

    [class-attrs-all]
    pre-cluster-threshold=0.25
    post-cluster-threshold=0.25
    nms-iou-threshold=0.35
    topk=300
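
For comparison on the Python side, my understanding of what this config asks nvinfer to do is: centered letterbox to 640x640 (maintain-aspect-ratio=1 with symmetric-padding=1), scale by net-scale-factor (~1/255), BGR channel order (model-color-format=1), and no mean offsets. A rough Python equivalent I use for the comparison (my own sketch, not DeepStream code) is:

    import cv2
    import numpy as np

    def preprocess(frame_bgr, input_size=640):
        # Approximation of the nvinfer preprocessing implied by the config above:
        # maintain-aspect-ratio=1 + symmetric-padding=1 -> centered letterbox,
        # net-scale-factor ~ 1/255, model-color-format=1 -> BGR, no mean offsets.
        h, w = frame_bgr.shape[:2]
        scale = min(input_size / w, input_size / h)
        new_w, new_h = int(round(w * scale)), int(round(h * scale))
        resized = cv2.resize(frame_bgr, (new_w, new_h))
        canvas = np.zeros((input_size, input_size, 3), dtype=np.uint8)
        top = (input_size - new_h) // 2
        left = (input_size - new_w) // 2
        canvas[top:top + new_h, left:left + new_w] = resized
        blob = canvas.astype(np.float32) * 0.0039215697906911373  # net-scale-factor
        blob = np.transpose(blob, (2, 0, 1))[np.newaxis]          # NCHW, batch-size=1
        return blob, scale, left, top

If nvinfer uses a different resize interpolation or padding color than this sketch assumes, small differences in scores and box positions would already be expected.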

And this is the deepstream-app config:

    [application]
    enable-perf-measurement=1
    perf-measurement-interval-sec=5
    gie-kitti-output-dir=streamscl

    # Note: [source-list] now support REST Server with use-nvmultiurisrcbin=1
    [source-list]
    num-source-bins=1
    #list=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4;file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h265.mp4
    list=file:///opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-test5/configs/C001202_001_full_video_1_h264.mp4;
    use-nvmultiurisrcbin=1
    # To display stream name in FPS log, set stream-name-display=1
    stream-name-display=0
    # sensor-id-list vector is one to one mapped with the uri-list
    # identifies each sensor by a unique ID
    sensor-id-list=UniqueSensorId1;
    # Optional sensor-name-list vector is one to one mapped with the uri-list
    sensor-name-list=UniqueSensorName1;
    max-batch-size=10
    http-ip=localhost
    http-port=9000
    # sgie batch size is number of sources * fair fraction of number of objects detected per frame per source
    # the fair fraction of number of object detected is assumed to be 4
    sgie-batch-size=40
    # Set the below key to keep the application running at all times

    [source-attr-all]
    enable=1
    type=3
    num-sources=1
    gpu-id=0
    cudadec-memtype=0
    latency=100
    rtsp-reconnect-interval-sec=0

    [streammux]
    gpu-id=0
    # Note: when used with [source-list], batch-size is ignored
    # instead, max-batch-size config is used
    batch-size=1
    # time out in usec, to wait after the first buffer is available
    # to push the batch even if the complete batch is not formed
    batched-push-timeout=33333
    # Set muxer output width and height
    width=640
    height=640
    # Enable to maintain aspect ratio wrt source, and allow black borders, works
    # along with width, height properties
    enable-padding=1
    nvbuf-memory-type=0
    # If set to TRUE, system timestamp will be attached as ntp timestamp
    # If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
    attach-sys-ts-as-ntp=1
    # drop-pipeline-eos ignores EOS from individual streams muxed in the DS pipeline
    # It is useful with source-list/use-nvmultiurisrcbin=1 where the REST server
    # will be running post last stream EOS to accept new streams
    drop-pipeline-eos=1
    # Boolean property to inform muxer that sources are live
    # When using nvmultiurisrcbin live-source=1 is preferred default
    # to allow batching of available buffers when number of sources is < max-batch-size configuration
    live-source=0

    [sink0]
    enable=1
    # Type - 1=FakeSink 2=EglSink 3=File
    type=2
    sync=0
    source-id=0
    gpu-id=0
    nvbuf-memory-type=0

    [sink2]
    enable=1
    type=3
    # 1=mp4 2=mkv
    container=1
    # 1=h264 2=h265 3=mpeg4
    # only SW mpeg4 is supported right now.
    codec=3
    sync=1
    bitrate=2000000
    output-file=out.mp4
    source-id=0

    [osd]
    enable=1
    gpu-id=0
    border-width=5
    text-size=15
    text-color=1;1;1;1;
    text-bg-color=0.3;0.3;0.3;1
    font=Serif
    show-clock=0
    clock-x-offset=800
    clock-y-offset=820
    clock-text-size=12
    clock-color=1;0;0;0
    nvbuf-memory-type=0

    # config-file property is mandatory for any gie section.
    # Other properties are optional and if set will override the properties set in
    # the infer config file.
    [primary-gie]
    enable=1
    gpu-id=0
    gie-unique-id=1
    nvbuf-memory-type=0
    config-file=/opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-test5/configs/infer_config.txt



### Expected Results
The expectation is that both environments would produce similar results given that the same model and input data are used.

### Actual Results
I ran the same ONNX file in Python and compared the output with DeepStream: the frame at which detections first appear is different, and the confidence scores and bounding-box positions also differ slightly. What could be causing these differences?

### Additional Information
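For reference, the Python side of the comparison is roughly the following ONNX Runtime sketch, using the same thresholds as [class-attrs-all]. It is a simplified version (function names are mine), and it assumes the standard Ultralytics YOLOv8 output layout of shape (1, 84, 8400), which may not match an ONNX exported specifically for DeepStream-Yolo:

    import cv2
    import numpy as np
    import onnxruntime as ort

    CONF_THRESHOLD = 0.25  # matches pre/post-cluster-threshold
    NMS_IOU = 0.35         # matches nms-iou-threshold

    session = ort.InferenceSession(
        "yolov8s.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )

    def detect(blob):
        # blob: (1, 3, 640, 640) float32 from the letterbox preprocessing sketch above.
        out = session.run(None, {session.get_inputs()[0].name: blob})[0]
        preds = out[0].T                      # (8400, 84): cx, cy, w, h, 80 class scores
        scores = preds[:, 4:].max(axis=1)
        keep = scores > CONF_THRESHOLD
        boxes, scores = preds[keep, :4], scores[keep]
        classes = preds[keep, 4:].argmax(axis=1)
        # cx, cy, w, h -> x, y, w, h for OpenCV's NMS (class-agnostic here)
        xywh = np.column_stack([boxes[:, 0] - boxes[:, 2] / 2,
                                boxes[:, 1] - boxes[:, 3] / 2,
                                boxes[:, 2], boxes[:, 3]])
        idx = cv2.dnn.NMSBoxes(xywh.tolist(), scores.tolist(), CONF_THRESHOLD, NMS_IOU)
        idx = np.array(idx, dtype=int).reshape(-1)
        return xywh[idx], scores[idx], classes[idx]

Mapping the resulting boxes back from the 640x640 letterbox to the original frame (using the scale and padding offsets from the preprocessing sketch) is omitted here.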

Thank you for looking into this issue. I am looking forward to your suggestions on how to resolve these discrepancies.