Exporting TensorRT (engine) with dynamic batch size failing

Search before asking

[X] I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Export

Bug

I'm trying to export the pretrained YOLOv5 model to TensorRT engine format with a dynamic batch size, using this command:

python export.py --weights yolov5m_coco.pt --include engine --device 0 --batch-size 256 --dynamic

The export fails at both the ONNX stage and the TensorRT stage. Here's the output log:

export: data=data/coco128.yaml, weights=['yolov5m_coco.pt'], imgsz=[640, 640], batch_size=256, device=0, half=False, inplace=False, keras=False, optimize=False, int8=False, dynamic=True, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['engine']
fatal: detected dubious ownership in repository at '/media/dev/aditya/yolov5'
To add an exception for this directory, call:

git config --global --add safe.directory /media/dev/aditya/yolov5

YOLOv5 🚀 2022-9-29 Python-3.7.13 torch-1.12.1+cu102 CUDA:0 (Quadro RTX 8000, 48601MiB)

Fusing layers...
YOLOv5m summary: 290 layers, 21172173 parameters, 0 gradients

PyTorch: starting from yolov5m_coco.pt with output shape (256, 25200, 85) (40.8 MB)

False starting export with onnx 1.12.0...
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select) (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select) (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select) (function ComputeConstantFolding)
ONNX: export failure ❌ 4.1s: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
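Since the ONNX step failed in this run, a yolov5m_coco.onnx left over from an earlier export would not have been overwritten. The TensorRT step below reads a file with that name and reports a batch of 400 rather than the 256 I requested, so I suspect it picked up an old file. Here is a minimal standard-library sketch for checking whether a file predates the current run. The helper name `is_stale`, the file name `model.onnx`, and the idea of comparing modification times are my own illustration, not part of export.py.

```python
import os
import tempfile
import time


def is_stale(path: str, run_started: float) -> bool:
    """Return True if `path` was last modified before `run_started` (epoch seconds)."""
    return os.path.getmtime(path) < run_started


# Demo with a throwaway file standing in for the exported .onnx file.
with tempfile.TemporaryDirectory() as tmp:
    onnx_path = os.path.join(tmp, "model.onnx")  # hypothetical file name
    with open(onnx_path, "wb") as fh:
        fh.write(b"placeholder")

    now = time.time()

    # Pretend the file was written an hour before the export run started.
    os.utime(onnx_path, (now - 3600, now - 3600))
    stale_before = is_stale(onnx_path, run_started=now)

    # Pretend a successful export rewrote the file after the run started.
    os.utime(onnx_path, (now + 1, now + 1))
    stale_after = is_stale(onnx_path, run_started=now)

print(stale_before, stale_after)
```

In practice I would record `time.time()` just before calling export.py and run the check on the .onnx file afterwards, or simply delete the old file before exporting again.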

TensorRT: starting export with TensorRT 8.4.3.1...
[10/04/2022-15:08:23] [TRT] [I] [MemUsageChange] Init CUDA: CPU +285, GPU +0, now: CPU 2321, GPU 20679 (MiB)
[10/04/2022-15:08:24] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +206, GPU +70, now: CPU 2544, GPU 20749 (MiB)
export.py:270: DeprecationWarning: Use set_memory_pool_limit instead.
  config.max_workspace_size = workspace * 1 << 30
[10/04/2022-15:08:24] [TRT] [I] ----------------------------------------------------------------
[10/04/2022-15:08:24] [TRT] [I] Input filename: yolov5m_coco.onnx
[10/04/2022-15:08:24] [TRT] [I] ONNX IR version: 0.0.7
[10/04/2022-15:08:24] [TRT] [I] Opset version: 12
[10/04/2022-15:08:24] [TRT] [I] Producer name: pytorch
[10/04/2022-15:08:24] [TRT] [I] Producer version: 1.12.1
[10/04/2022-15:08:24] [TRT] [I] Domain:
[10/04/2022-15:08:24] [TRT] [I] Model version: 0
[10/04/2022-15:08:24] [TRT] [I] Doc string:
[10/04/2022-15:08:24] [TRT] [I] ----------------------------------------------------------------
[10/04/2022-15:08:24] [TRT] [W] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
TensorRT: input "images" with shape(400, 3, 640, 640) DataType.FLOAT
TensorRT: output "output0" with shape(400, 25200, 85) DataType.FLOAT
TensorRT: building FP32 engine as yolov5m_coco.engine
export.py:297: DeprecationWarning: Use build_serialized_network instead.
  with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
[10/04/2022-15:08:24] [TRT] [E] 4: [network.cpp::operator()::3020] Error Code 4: Internal Error (images: kMIN dimensions in profile 0 are [1,3,640,640] but input has static dimensions [400,3,640,640].)
TensorRT: export failure ❌ 5.3s: __enter__
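My reading of the last TensorRT error is that an optimization profile covering a range of batch sizes only fits an input whose batch dimension is dynamic (reported as -1), whereas the parsed network input is fixed at 400. Below is a small pure-Python sketch of that consistency rule as I understand it. It is not TensorRT's actual check: `profile_conflicts` is a hypothetical helper, and the kMAX value of 400 is an assumption, since the log only prints kMIN.

```python
from typing import List, Sequence


def profile_conflicts(
    input_shape: Sequence[int],
    kmin: Sequence[int],
    kmax: Sequence[int],
) -> List[int]:
    """Return the indices of dimensions where the profile range and the input disagree.

    A dimension of -1 is treated as dynamic and accepts any range with min <= max.
    A static dimension only accepts a profile whose min and max both equal it.
    """
    conflicts = []
    for index, (dim, low, high) in enumerate(zip(input_shape, kmin, kmax)):
        if dim == -1:
            if low > high:
                conflicts.append(index)
        elif not (low == dim == high):
            conflicts.append(index)
    return conflicts


kmin = [1, 3, 640, 640]    # from the error message
kmax = [400, 3, 640, 640]  # assumed, not shown in the log

# Static batch of 400, as reported for input "images" in the log above.
static_result = profile_conflicts([400, 3, 640, 640], kmin, kmax)

# The same input with a dynamic batch dimension.
dynamic_result = profile_conflicts([-1, 3, 640, 640], kmin, kmax)

print(static_result, dynamic_result)
```

Under these assumptions only the batch dimension conflicts in the static case, which matches the dimension named in the error, and nothing conflicts once the batch dimension is dynamic.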

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

[ ] Yes I'd like to help by submitting a PR!
