zhiqwang / yolort

yolort is a runtime stack for YOLOv5 on specialized accelerators such as TensorRT, LibTorch, ONNX Runtime, TVM and NCNN.
https://zhiqwang.com/yolort
GNU General Public License v3.0

Dynamic batch dimension not working with ONNX export #452

Status: Open. timmh opened this issue 1 year ago

timmh commented 1 year ago

🐛 Describe the bug

Following up on #45, I can't get dynamic batch sizes to work with exported ONNX models. My issue should be reproducible using the following code:

import torch
import numpy as np
import onnx
import onnxruntime
from yolort.models import YOLOv5
from yolort.runtime.ort_helper import export_onnx
from yolort.runtime import PredictorORT

input_weights = "https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5n.pt"
output_weights = "out.onnx"

device = torch.device("cpu")
size = (640, 640)  # Used for pre-processing
size_divisible = 64
score_thresh = 0.35
nms_thresh = 0.45
opset_version = 11
batch_size = 2  # works with batch_size = 1

model = YOLOv5.load_from_yolov5(
    input_weights,
    size=size,
    size_divisible=size_divisible,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
)

model = model.eval()
model = model.to(device)

# Export the ONNX model
export_onnx(
    model=model,
    onnx_path=output_weights,
    opset_version=opset_version,
    batch_size=batch_size,
    skip_preprocess=True,
)

# Load the ONNX model
onnx_model = onnx.load(output_weights)

# Check that the model is well formed
onnx.checker.check_model(onnx_model)

# Create dummy input
dummy_input = np.zeros((batch_size, 3, *size), dtype=np.float32)

# Predict using ONNX model
try:
    PredictorORT(output_weights, device="cpu").predict(dummy_input)
except Exception as e:
    print("Exception using PredictorORT:", e)

try:
    onnxruntime.InferenceSession(output_weights, providers=["CPUExecutionProvider"]).run([], {"images": dummy_input})
except Exception as e:
    print("Exception using onnxruntime:", e)

With any batch_size other than 1, inference fails both through PredictorORT and through a plain onnxruntime.InferenceSession. I would appreciate any help.
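
For what it's worth, one way to narrow this down is to check whether the exported graph actually declares a symbolic (dynamic) batch dimension on its input. A minimal sketch using the onnx package, independent of yolort (the file name just matches the export above):

import onnx

# Load the exported graph and print each input's declared shape.
# A dynamic batch dimension shows up as a symbolic name (dim_param)
# rather than a fixed integer (dim_value) in the first position.
model_proto = onnx.load("out.onnx")
for inp in model_proto.graph.input:
    dims = [d.dim_param if d.dim_param else d.dim_value
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)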

Versions

PyTorch version: 1.12.0+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.1.0
Clang version: 14.0.6
CMake version: version 3.23.2
Libc version: glibc-2.35

Python version: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.18.14-arch1-1-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2070 with Max-Q Design
Nvidia driver version: 515.57
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.4.1
/usr/lib/libcudnn_adv_infer.so.8.4.1
/usr/lib/libcudnn_adv_train.so.8.4.1
/usr/lib/libcudnn_cnn_infer.so.8.4.1
/usr/lib/libcudnn_cnn_train.so.8.4.1
/usr/lib/libcudnn_ops_infer.so.8.4.1
/usr/lib/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] torch==1.12.0+cu116
[pip3] torchaudio==0.12.0+cu116
[pip3] torchvision==0.13.0+cu116
[conda] torch 1.12.0+cu116 pypi_0 pypi
[conda] torchaudio 0.12.0+cu116 pypi_0 pypi
[conda] torchvision 0.13.0+cu116 pypi_0 pypi

timmh commented 1 year ago

Never mind, I solved the above issue by changing the model construction as follows:

-model = YOLOv5.load_from_yolov5(
+model = YOLO.load_from_yolov5(
    input_weights,
-    size=size,
-    size_divisible=size_divisible,
    score_thresh=score_thresh,
    nms_thresh=nms_thresh,
)

The resulting model seems to accept dynamic batch sizes. However, the output scores, labels, and boxes are only returned for the first image in the batch.
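
A quick way to confirm whether detections come back for every image is to run the exported model on a two-image batch and print the output shapes, without assuming particular output names. A rough sketch along those lines (output names and layouts depend on how the graph was exported):

import numpy as np
import onnxruntime

# Run a two-image batch through the exported model and print every output's
# name and shape, so it is visible whether results cover both images or only
# the first one. Input/output names are read from the session, not hard-coded.
session = onnxruntime.InferenceSession("out.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
batch = np.zeros((2, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {input_name: batch})
for meta, value in zip(session.get_outputs(), outputs):
    print(meta.name, np.asarray(value).shape)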