marcoslucianops / DeepStream-Yolo

NVIDIA DeepStream SDK 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
MIT License

Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.) #404

Open lewis-bo opened 1 year ago

lewis-bo commented 1 year ago

env: DeepStream 6.2, CUDA 11.4.315
JetPack: 5.1.1

ONNX export: python3 export_yoloV5.py -w vehicle.pt -s 640 --simplify --dynamic
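For context, the --dynamic flag presumably marks the batch axis of the exported ONNX as dynamic, which is what later makes TensorRT demand an optimization profile. A minimal sketch of that idea, not the repo's exact export code (the model, file name, tensor names, and opset below are stand-ins):

    import torch
    import torch.nn as nn

    model = nn.Identity()  # stand-in for the loaded YOLOv5 model
    dummy = torch.zeros(1, 3, 640, 640)  # batch, channels, height, width

    torch.onnx.export(
        model, dummy, "vehicle.onnx",
        input_names=["input"],
        output_names=["output"],
        # mark axis 0 (batch) as dynamic, as --dynamic is assumed to do
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
        opset_version=12,
    )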

When I run "deepstream-app -c deepstream_app_config.txt", I get this error:


WARNING: Deserialize engine failed because file path: /home/nvidia/develop/DeepStream-Yolo/vehicle_fp32.engine open error
0:00:02.553676200 30126 0xaaaac99c1090 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/home/nvidia/develop/DeepStream-Yolo/vehicle_fp32.engine failed
0:00:02.711605981 30126 0xaaaac99c1090 WARN                 nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/home/nvidia/develop/DeepStream-Yolo/vehicle_fp32.engine failed, try rebuild
0:00:02.711697215 30126 0xaaaac99c1090 INFO                 nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.

Building the TensorRT Engine

ERROR: [TRT]: 4: [network.cpp::validate::3062] Error Code 4: Internal Error (Network has dynamic or shape inputs, but no optimization profile has been defined.)
Building engine failed
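For reference, this error means the ONNX network has a dynamic input dimension but the TensorRT builder was never given min/opt/max shapes for it. In the TensorRT Python API that is done with an optimization profile; a minimal sketch of what the message is asking for (the input tensor name "input" and the shapes are assumptions based on the export command above; DeepStream normally builds the engine for you, so this is only illustrative):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("vehicle.onnx", "rb") as f:  # stand-in file name
        parser.parse(f.read())

    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    # min / opt / max shapes for the dynamic batch axis of a 640x640 model
    profile.set_shape("input",
                      (1, 3, 640, 640), (8, 3, 640, 640), (16, 3, 640, 640))
    config.add_optimization_profile(profile)

    engine = builder.build_serialized_network(network, config)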
marcoslucianops commented 1 year ago

Are you using force-implicit-batch-dim=1 in the config_infer_primary_yoloV8.txt file?

lewis-bo commented 1 year ago

Yes. Here is my config:

net-scale-factor=0.0039215697906911373
# 0=RGB, 1=BGR, 2=GRAY
model-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
onnx-file=plate.onnx
model-engine-file=model_b16_dla1_int8.engine
int8-calib-file=cabli.table
labelfile-path=labels.txt
batch-size=16
# 0: FP32 1: INT8 2: FP16
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
# 1=Primary 2=Secondary
process-mode=1
# 0: Detector 1: Classifier 2: Segmentation 3: Instance Segmentation
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
force-implicit-batch-dim=1
workspace-size=1000
enable-dla=1
use-dla-core=0
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

marcoslucianops commented 1 year ago

I think it's not working well yet. Please comment it out and try again.
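That is, presumably commenting out this line in the config_infer file:

    #force-implicit-batch-dim=1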

lewis-bo commented 12 months ago

> I think it's not working well yet. Please comment it out and try again.

It works when I change 'force-implicit-batch-dim=1' to 'force-implicit-batch-dim=0'.


marcoslucianops commented 12 months ago

I think the network shape needs to be changed to force the implicit batch. I will run some tests soon.

lewis-bo commented 12 months ago

I have another question, about dynamic batching.

I converted yolov5s.onnx to an INT8 engine and set the maximum batch to 16. But when I run it, whether with one input or multiple inputs, batch 1 or batch 16, it's very slow: max FPS < 20.


marcoslucianops commented 12 months ago

To use force-implicit-batch-dim=1 you need to export the model with --batch 1 (example for batch-size 1) instead of --dynamic.

> I converted yolov5s.onnx to an INT8 engine and set the maximum batch to 16. But when I run it, whether with one input or multiple inputs, batch 1 or batch 16, it's very slow: max FPS < 20.

The batch-size is related to the number of sources in DeepStream. How is the FPS with FP16?
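To try FP16, the change in the config would presumably be just the precision mode, with the old engine deleted or the model-engine-file name updated so a new engine gets built (the file name below is illustrative, not from the thread):

    # 0: FP32 1: INT8 2: FP16
    network-mode=2
    model-engine-file=model_b16_dla1_fp16.engine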

lewis-bo commented 10 months ago

Sorry for the late reply. Do you mean I should use --batch when exporting the ONNX model?

marcoslucianops commented 10 months ago

Only if you aren't using --dynamic on the export command. The --batch value should be the same as the batch-size you are using.
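For example, with yolov5s.pt as a stand-in for the weights file, a static batch-16 export to match batch-size=16 in the config would be:

    python3 export_yoloV5.py -w yolov5s.pt -s 640 --simplify --batch 16

versus the dynamic export used at the top of this thread:

    python3 export_yoloV5.py -w yolov5s.pt -s 640 --simplify --dynamic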