Open charlieWyatt opened 10 months ago
Hello @charlieWyatt
There are a few things to consider here.
When you export a model to ONNX using the script, it is built by default with a static shape of 640x640. That means you can run inference only on images of 640x640. You can adjust the resolution using the appropriate arguments of the export.py script.
I see that during inference, you are already providing --imgsz 640. So, this error should not arise.
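To make the static-versus-dynamic distinction concrete, here is a minimal sketch of the check that a fixed-shape export implies. ONNX Runtime reports an input's declared shape through session.get_inputs()[0].shape, where fixed dimensions are integers and dynamic ones are strings or None. The helper name shape_is_accepted and the example shapes are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Union

Dim = Union[int, str, None]


def shape_is_accepted(declared: Sequence[Dim], actual: Sequence[int]) -> bool:
    """Return True if `actual` fits the declared input shape.

    Integer entries in `declared` are fixed sizes; strings or None stand
    for dynamic dimensions that accept any size.
    """
    if len(declared) != len(actual):
        return False
    return all(not isinstance(d, int) or d == a for d, a in zip(declared, actual))


static_shape = [1, 3, 640, 640]            # what a fixed 640x640 export declares
dynamic_shape = [1, 3, "height", "width"]  # what a dynamic export might declare

print(shape_is_accepted(static_shape, (1, 3, 640, 640)))   # True
print(shape_is_accepted(static_shape, (1, 3, 480, 640)))   # False
print(shape_is_accepted(dynamic_shape, (1, 3, 480, 640)))  # True
```

With --imgsz 640 the input matches the static shape, which is why the input resolution alone should not trigger an error.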
Can you please provide me with your training command and if possible a few images for me to reproduce the issue on my side?
Thanks @sovit-123 I appreciate the response.
I am printing out the image.shape here in onnx_inference_image.py-
And am getting a value of - torch.Size([1, 3, 640, 640])
Here is an example image I am training on -
And my inference image -
The training command I am using is -
python train.py --model fasterrcnn_mobilenetv3_large_fpn.py --epochs 75 --data data_configs/iNaturalist.yaml --batch 4 --mosaic 0
May I know your ONNX and ONNX Runtime versions? Can you please try with ONNX Runtime 1.14.0 and the corresponding version of ONNX if they are different?
Also, please try to run inference with the same image folder path using the inference.py script by using the PyTorch trained model. Please check that the PyTorch inference is running as expected.
inference on same image folder path using inference.py -
python inference.py --data data_configs/iNaturalist.yaml --weights outputs/training/fasterrcnn_mobilenetv3_large_fpn_iNaturalist/best_model.pth --input ../input/iNaturalist/inference_images/
Old versions - onnxruntime version - 1.16.3 onnx version - 1.15.0
Tried running it with -
and I am getting the same error.
Interestingly, if I try the resnet18 model with all other parameters the same, I am able to export to ONNX and make inferences, but I would prefer to use the mobilenet model if possible.
That's interesting. In that case, I will check in the source code whether the export only works for the ResNet18 model. There may be an error.
I have a suspicion it is because of the redefinition of the roi heads in the mobilenet model on line 16 -
Also the inference error seems to indicate it is an issue in the roi heads - " onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: C:\a_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{115,2479}, requested shape:{-1,4} "
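The condition quoted in the error message can be checked by hand. The sketch below mirrors the rule as stated in the message (size != 0 && (input_shape_size % size) == 0), taking size to be the product of the known requested dimensions. The function name reshape_is_valid is hypothetical, and this is an illustration of the rule, not ONNX Runtime's code.

```python
from math import prod


def reshape_is_valid(input_shape, requested_shape):
    """Check whether a tensor of `input_shape` can take `requested_shape`.

    At most one requested dimension may be -1, meaning "infer this size".
    """
    total = prod(input_shape)
    known = prod(d for d in requested_shape if d != -1)
    if -1 not in requested_shape:
        return total == known
    return known != 0 and total % known == 0


print(115 * 2479)                              # 285085
print(reshape_is_valid((115, 2479), (-1, 4)))  # False
print(reshape_is_valid((115, 2480), (-1, 4)))  # True
```

The tensor holds 285085 elements, which leaves a remainder of 1 when divided by 4, so no (-1, 4) layout exists for it. A width of 2480 would have reshaped cleanly.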
Thanks for the observation. I will take a look. Although it may take some time.
Hi. I tested the ResNet backbone models and a few other models. It seems that they are working fine, while the MobileNet one you mentioned is not. It may be because of some internal resizing. I will need some more time to test it out.
No worries, thanks @sovit-123!
hi @sovit-123 I also encountered the same issue with the mobilenet reshape. Is there any update regarding this matter?
From a first look, it seemed like an internal transform issue of the model. However, the transforms are the same as the ResNet backbone ones. So, I still need to figure this out.
Hi. Can you train again? I have pushed an update that makes the export dynamic: you can export at 640x640 and run inference at any resolution. Do not give any resolution while exporting. Only provide the desired resolution while running the inference.
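For readers unfamiliar with dynamic export, the sketch below shows one common way to do it with torch.onnx.export and its dynamic_axes argument. It is not taken from the repository's export.py, whose implementation is not shown in this thread. The function names, the axis labels, and the input and output names are assumptions for illustration.

```python
def build_dynamic_axes(input_name="input", output_names=("boxes", "scores", "labels")):
    """Describe which axes may vary at inference time.

    Axes 2 and 3 of an NCHW input are height and width; the first axis of
    each detection output is the number of detections.
    """
    axes = {input_name: {2: "height", 3: "width"}}
    for name in output_names:
        axes[name] = {0: "num_detections"}
    return axes


def export_dynamic(model, path, size=640):
    """Export `model` with a size x size dummy input, leaving H and W dynamic."""
    import torch  # imported lazily so build_dynamic_axes works without PyTorch

    model.eval()
    dummy = torch.rand(1, 3, size, size)
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["input"],
        output_names=["boxes", "scores", "labels"],
        dynamic_axes=build_dynamic_axes(),
    )


print(build_dynamic_axes())
```

The dummy input still has a concrete size because tracing needs real data. The dynamic_axes mapping only tells the exporter which dimensions should not be baked into the graph.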
It seems that the problem is in the torchvision implementation. Here is a minimal reproducible example.
Note: the error shows up only with the cropped image. If I comment out img = img[50:200, 150:250], it works fine.
import torch
import torchvision
import cv2
import requests
import numpy as np
import onnxruntime
# Export the pretrained torchvision model using a fixed 640x640 dummy input.
print("Exporting model")
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights='DEFAULT')
torch.onnx.export(model, torch.rand(1, 3, 640, 640), '/tmp/model.onnx',
                  input_names=['input'], output_names=['boxes', 'scores', 'labels'])

# Download a test image and crop it; the error only appears with the crop.
print("Downloading image")
r = requests.get('https://docs.opencv.org/4.x/roi.jpg', allow_redirects=True)
with open('/tmp/roi.jpg', 'wb') as f:
    f.write(r.content)
img = cv2.imread('/tmp/roi.jpg')
img = img[50:200, 150:250]
cv2.imwrite('/tmp/roi2.jpg', img)

# Preprocess: BGR -> RGB, resize to the export size, scale to [0, 1], HWC -> BCHW.
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
image_dim_dim = cv2.resize(img, (640, 640))
image_dim_dim = np.array(image_dim_dim, dtype=np.float32) / 255.0
image_bchw = np.transpose(np.expand_dims(image_dim_dim, 0), (0, 3, 1, 2))

# Run the exported model on CPU with ONNX Runtime.
print("Running inference")
session = onnxruntime.InferenceSession('/tmp/model.onnx', providers=["CPUExecutionProvider"])
outputs = [o.name for o in session.get_outputs()]
inputs = [o.name for o in session.get_inputs()]
prediction = session.run(outputs, {inputs[0]: image_bchw})
print(prediction)
Error: onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:39 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{490,363}, requested shape:{-1,4}
Packages: onnxruntime 1.18.1 torch 2.2.2 torchvision 0.17.2
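One pattern worth noting across both error reports in this issue: the second dimension of the failing tensor is one less than four times the class count. The sketch below only does that arithmetic. It assumes 620 classes for the custom dataset mentioned in this issue and 91 classes for the COCO-pretrained torchvision weights. The function name is hypothetical. Tentatively, the pattern is consistent with a single column being dropped from a flattened [N, 4 * num_classes] tensor before the reshape to (-1, 4); this is a hypothesis, not a confirmed diagnosis.

```python
def columns_after_dropping_one(num_classes: int) -> int:
    """Width of a flattened [N, 4 * num_classes] tensor after removing one column."""
    return 4 * num_classes - 1


# (class count, second dimension reported in the error message)
reports = {
    "custom model, 620 classes": (620, 2479),
    "COCO weights, 91 classes (assumed)": (91, 363),
}

for name, (num_classes, observed) in reports.items():
    expected = columns_after_dropping_one(num_classes)
    print(f"{name}: expected {expected}, observed {observed}, match={expected == observed}")
```

Both reports match, which suggests the failure depends on the class count and the tensor layout inside roi_heads rather than on the input resolution.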
I have successfully trained a fasterrcnn_mobilenetv3_large_fpn and can make inferences on it using python.
I get no errors when converting the model to onnx using export.py -
python export.py --weights outputs/training/fasterrcnn_mobilenetv3_large_fpn_iNaturalist/best_model.pth --data data_configs/iNaturalist.yaml --out model.onnx
However, when I try to make inference on the onnx model I get reshape errors -
python onnx_inference_image.py --input ../input/iNaturalist/inference_images/ --weights weights/model.onnx --data data_configs/iNaturalist.yaml --show --imgsz 640
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: C:\a_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{115,2479}, requested shape:{-1,4}
My data has 620 classes. When I look up the failing node in Netron, the Reshape node that raises the error sits in the second half of the network, past the middle but before the end.