marcoslucianops / DeepStream-Yolo-Seg

NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO-Segmentation models
MIT License

When batch > 2, cannot convert ONNX to TensorRT #3

Open JobinWeng opened 1 year ago

JobinWeng commented 1 year ago

DeepStream 6.2, Jetson Orin NX

[09/15/2023-09:21:15] [E] Error[4]: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer /1/Reshape_12: reshaping failed for tensor: /1/Expand_1_output_0 Reshape would change volume.)
[09/15/2023-09:21:15] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[09/15/2023-09:21:15] [E] Engine could not be created from network
[09/15/2023-09:21:15] [E] Building engine failed
[09/15/2023-09:21:15] [E] Failed to create engine from model or file.
[09/15/2023-09:21:15] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=yolov8s-seg.onnx --minShapes=input:1x3x640x640 --optShapes=input:4x3x640x640 --maxShapes=input:4x3x640x640 --saveEngine=yolov8s-seg.engine --workspace=1024

marcoslucianops commented 1 year ago

https://github.com/marcoslucianops/DeepStream-Yolo-Seg/blob/master/docs/YOLOv8_Seg.md

JobinWeng commented 1 year ago

I followed https://github.com/marcoslucianops/DeepStream-Yolo-Seg/blob/master/docs/YOLOv8_Seg.md, but I get the problem below when batch > 2:

WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
WARNING: [TRT]: Using PreviewFeature::kFASTER_DYNAMIC_SHAPES_0805 can help improve performance and resolve potential functional issues.
ERROR: [TRT]: 4: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer /1/Reshape_12: reshaping failed for tensor: /1/Expand_1_output_0 Reshape would change volume.)
ERROR: Build engine failed from config file
ERROR: failed to build trt engine.
0:00:08.643111021 3100622 0xffff00002360 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() [UID = 1]: build engine file failed
0:00:08.847122917 3100622 0xffff00002360 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::generateBackendContext() [UID = 1]: build backend context failed
0:00:08.847179559 3100622 0xffff00002360 ERROR nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() [UID = 1]: generate backend failed, check config file settings
0:00:08.847607312 3100622 0xffff00002360 WARN nvinfer gstnvinfer.cpp:888:gst_nvinfer_start: error: Failed to create NvDsInferContext instance
0:00:08.847624465 3100622 0xffff00002360 WARN nvinfer gstnvinfer.cpp:888:gst_nvinfer_start: error: Config file path: /home/nvidia/DeepStream-Yolo-Seg-master/config_infer_primary_yoloV8_seg.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
** ERROR: : Failed to set pipeline to PAUSED
Quitting
nvstreammux: Successfully handled EOS for source_id=0
nvstreammux: Successfully handled EOS for source_id=1
nvstreammux: Successfully handled EOS for source_id=2
nvstreammux: Successfully handled EOS for source_id=3
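
For reference, the failing engine build is driven by the batch size that nvinfer reads from the config file named in the log. This is only a minimal sketch of the relevant keys with illustrative values, not the actual contents of config_infer_primary_yoloV8_seg.txt:

[property]
onnx-file=yolov8s-seg.onnx
model-engine-file=model_b4_gpu0_fp16.engine
# batch-size > 1 only builds if the ONNX was exported with a dynamic batch axis
batch-size=4

The streammux batch-size in the application config is typically set to the same value as the number of sources (4 here).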

JeroendenBoef commented 1 year ago

I've been running into the same issue, regardless of whether the model is exported with a dynamic batch size, a static batch size, simplified or not. Setting the batch size higher than 1 triggers:

ERROR: [TRT]: 4: [graphShapeAnalyzer.cpp::analyzeShapes::1294] Error Code 4: Miscellaneous (IShuffleLayer /1/Reshape_12: reshape changes volume. Reshaping [4,400,25606] to [4,100,25606].)
ERROR: Build engine failed from config file

This is on DeepStream 6.1.1 on a Jetson AGX Xavier.
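
Note the shapes in that message: 400 = 4 (batch) x 100 (max detections), while the target shape keeps the detection dimension fixed at 100. A toy PyTorch reproduction of the same volume mismatch, with the 25606 dimension shrunk to 8 just to keep the example small:

import torch

# Reshape fails whenever the total element count changes, which is exactly
# what TensorRT reports for /1/Reshape_12 above.
t = torch.zeros(4, 400, 8)        # 4 * 400 * 8 = 12800 elements
try:
    t.reshape(4, 100, 8)          # 4 * 100 * 8 = 3200 elements, volume changes
except RuntimeError as e:
    print(e)                      # shape '[4, 100, 8]' is invalid for input of size 12800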

marcoslucianops commented 1 year ago

I will do some tests to check

Czhazha commented 1 year ago

I ran into the same problem too. Could you advise if there is a solution for this issue?

JeroendenBoef commented 1 year ago

It seems to me that the max detections parameter in the line final_dets = batched_dets.new_zeros((b, self.max_det, i)) is causing the issue: it is statically set to 100 regardless of batch size, which forces the reshape to [.., 100, ...]. You can modify the forward pass of the DeepstreamOutput class in export_yoloV8_seg.py to multiply the max detections by the batch size. Specifically, add this to the beginning of the forward pass:

dynamic_max_det = x[0].shape[0] * self.max_det

And then modify line 98 to:

final_dets = batched_dets.new_zeros((b, dynamic_max_det, i))
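
Putting the two edits together in a throwaway snippet (not the actual script; apart from max_det, dynamic_max_det, batched_dets and final_dets, the names and shapes are stand-ins, and only the batch dimension matters):

import torch

max_det, i = 100, 6                       # 100 is the script's static default; 6 is a placeholder width
x = (torch.zeros(4, 3, 640, 640),)        # stand-in for the tuple forward() receives; batch dim is 4

dynamic_max_det = x[0].shape[0] * max_det                      # edit 1: 4 * 100 = 400
b = x[0].shape[0]
batched_dets = torch.zeros(b, dynamic_max_det, i)              # mimics the 4 x 400 x ... tensor from the error above
final_dets = batched_dets.new_zeros((b, dynamic_max_det, i))   # edit 2 (line 98): buffer now matches
print(final_dets.shape)                                        # torch.Size([4, 400, 6])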

While this works and the exported ONNX model can be used to successfully build an engine with a dynamic batch size, it decimates my throughput on a single UDP stream input for some reason: I go from ~15 FPS at batch size = 1 on an AGX Xavier with JetPack 5.0.2 to ~3 FPS at batch size = 2 and 4. It might be an issue with my modification, or with how I'm applying batching in this context. I'd be curious to see whether this causes similar issues for others.