zhiqwang / yolort

yolort is a runtime stack for yolov5 on specialized accelerators such as tensorrt, libtorch, onnxruntime, tvm and ncnn.
GNU General Public License v3.0

Dynamic batch dimension not working with ONNX export #452

Open timmh opened 1 year ago

timmh commented 1 year ago

🐛 Describe the bug

Following up on #45, I can't get dynamic batch sizes to work with exported ONNX models. My issue should be reproducible using the following code:

import torch
import numpy as np
import onnx
import onnxruntime
from yolort.models import YOLOv5
from yolort.runtime.ort_helper import export_onnx
from yolort.runtime import PredictorORT

input_weights = "https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5n.pt"
output_weights = "out.onnx"

device = torch.device("cpu")
size = (640, 640)  # Used for pre-processing
size_divisible = 64
score_thresh = 0.35
nms_thresh = 0.45
opset_version = 11
batch_size = 2  # works with batch_size = 1

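model = YOLOv5.load_from_yolov5(
    input_weights,
    size=size,
    size_divisible=size_divisible,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
)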

model = model.eval()
model = model.to(device)

# Export the ONNX model
export_onnx(model=model, onnx_path=output_weights, opset_version=opset_version, batch_size=batch_size, skip_preprocess=True)

# Load the ONNX model
onnx_model = onnx.load(output_weights)

# Check that the model is well formed
onnx.checker.check_model(onnx_model)

# Create dummy input
dummy_input = np.zeros((batch_size, 3, *size), dtype=np.float32)

# Predict using ONNX model
try:
    PredictorORT(output_weights, device="cpu").predict(dummy_input)
except Exception as e:
    print("Exception using PredictorORT:", e)

try:
    onnxruntime.InferenceSession(output_weights, providers=["CPUExecutionProvider"]).run([], {"images": dummy_input})
except Exception as e:
    print("Exception using onnxruntime:", e)

If batch_size is anything other than 1, inference fails with both PredictorORT and plain onnxruntime. I would appreciate any help.
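To narrow down a failure like this, it can help to check whether the exported graph actually declares a symbolic batch dimension on its input. The following is a minimal sketch, not part of yolort: the helper names `input_shapes` and `has_dynamic_batch` are hypothetical, and the only assumption is that the object passed in has the same layout as the `graph` field of a model loaded with `onnx.load`.

```python
def input_shapes(graph):
    """Map each graph input name to its declared shape.

    Symbolic dimensions are reported by name (str), fixed dimensions
    as int, and dimensions with no information as None.
    """
    shapes = {}
    for inp in graph.input:
        dims = []
        for d in inp.type.tensor_type.shape.dim:
            if d.dim_param:          # symbolic, e.g. "batch"
                dims.append(d.dim_param)
            elif d.dim_value > 0:    # fixed size
                dims.append(d.dim_value)
            else:                    # neither field set
                dims.append(None)
        shapes[inp.name] = dims
    return shapes


def has_dynamic_batch(graph, input_name):
    """Return True if the first dimension of `input_name` is not a fixed int."""
    dims = input_shapes(graph)[input_name]
    return len(dims) > 0 and not isinstance(dims[0], int)


# Usage sketch (assumes `onnx` is installed and out.onnx was exported as above):
#   import onnx
#   graph = onnx.load("out.onnx").graph
#   print(input_shapes(graph))
#   print(has_dynamic_batch(graph, "images"))
```

If the first dimension of `images` comes back as a fixed integer, the model was exported for that batch size only, and feeding any other batch size is expected to fail.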


PyTorch version: 1.12.0+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.1.0
Clang version: 14.0.6
CMake version: version 3.23.2
Libc version: glibc-2.35

Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.18.14-arch1-1-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2070 with Max-Q Design
Nvidia driver version: 515.57
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.4.1
/usr/lib/libcudnn_adv_infer.so.8.4.1
/usr/lib/libcudnn_adv_train.so.8.4.1
/usr/lib/libcudnn_cnn_infer.so.8.4.1
/usr/lib/libcudnn_cnn_train.so.8.4.1
/usr/lib/libcudnn_ops_infer.so.8.4.1
/usr/lib/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.12.0+cu116
[pip3] torchaudio==0.12.0+cu116
[pip3] torchvision==0.13.0+cu116
[conda] torch 1.12.0+cu116 pypi_0 pypi
[conda] torchaudio 0.12.0+cu116 pypi_0 pypi
[conda] torchvision 0.13.0+cu116 pypi_0 pypi

timmh commented 1 year ago

Never mind, I solved the above issue by changing how the model is constructed:

-model = YOLOv5.load_from_yolov5(
+model = YOLO.load_from_yolov5(
-    size=size,
-    size_divisible=size_divisible,

The resulting model seems to accept dynamic batch sizes. However, the output scores, labels, and boxes are returned only for the first image in the batch, not for every image.
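Until the batched output is sorted out, one possible stopgap is to feed the images through the model one at a time and collect the results. The sketch below is only a workaround idea, not a fix and not part of yolort: `predict_per_image` and `run_single` are hypothetical names, and it assumes that single-image inference works correctly.

```python
def predict_per_image(run_single, batch):
    """Call `run_single` once per image and return the results as a list.

    `batch` can be anything that supports len() and slicing, such as a
    NumPy array of shape (N, C, H, W). Slicing with i:i+1 keeps the
    leading batch axis, so each call sees a batch of size 1.
    """
    results = []
    for i in range(len(batch)):
        single = batch[i : i + 1]
        results.append(run_single(single))
    return results


# Usage sketch (assumes an onnxruntime session for out.onnx with an
# input named "images", as in the reproduction script):
#   session = onnxruntime.InferenceSession(
#       "out.onnx", providers=["CPUExecutionProvider"])
#   outputs = predict_per_image(
#       lambda x: session.run([], {"images": x}), dummy_input)
```

This gives up the throughput benefit of batching, so it is mainly useful for confirming what the per-image outputs should look like when compared against the batched result.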