sovit-123 / fasterrcnn-pytorch-training-pipeline

PyTorch Faster R-CNN Object Detection on Custom Dataset
MIT License

Reshape error after onnx conversion #123

Open charlieWyatt opened 10 months ago

charlieWyatt commented 10 months ago

I have successfully trained a fasterrcnn_mobilenetv3_large_fpn and can run inference with it in Python.

I get no errors when converting the model to ONNX using export.py:

python export.py --weights outputs/training/fasterrcnn_mobilenetv3_large_fpn_iNaturalist/best_model.pth --data data_configs/iNaturalist.yaml --out model.onnx

However, when I run inference with the ONNX model, I get a reshape error:

python onnx_inference_image.py --input ../input/iNaturalist/inference_images/ --weights weights/model.onnx --data data_configs/iNaturalist.yaml --show --imgsz 640

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: C:\a_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{115,2479}, requested shape:{-1,4}

My data has 620 classes. When I look up the failing node in Netron, the Reshape node sits past the middle of the network, closer to the end.

sovit-123 commented 10 months ago

Hello @charlieWyatt. There are a few things to consider here. When you export a model to ONNX using the script, it is built by default with a static shape of 640x640. That means you can run inference only on 640x640 inputs. You can adjust the resolution using the appropriate arguments of the export.py script.

I see that you are already providing --imgsz 640 during inference, so this error should not arise.
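For readers following along: a statically exported model accepts only the exact input shape it was exported with, so it can help to compare the shape the model declares against the array being fed in. The helper below is a minimal sketch and not part of this repository; `find_shape_mismatches` is a hypothetical name. With onnxruntime, the declared shape would typically come from `session.get_inputs()[0].shape`, where dynamic axes appear as names or None rather than integers.

```python
def find_shape_mismatches(expected, actual):
    """Compare a model's declared input shape with the shape of an input array.

    Entries of `expected` that are not ints (None or a symbolic axis name)
    are treated as dynamic and match any size.
    Returns a list of (axis, expected_size, actual_size) tuples.
    """
    if len(expected) != len(actual):
        # Different rank: report it as a single mismatch.
        return [("rank", len(expected), len(actual))]
    return [
        (axis, exp, act)
        for axis, (exp, act) in enumerate(zip(expected, actual))
        if isinstance(exp, int) and exp != act
    ]


# Static 640x640 export fed a 480x640 image: the height axis mismatches.
print(find_shape_mismatches([1, 3, 640, 640], (1, 3, 480, 640)))

# Export with symbolic height/width: any resolution is accepted.
print(find_shape_mismatches(["batch", 3, "height", "width"], (1, 3, 480, 640)))
```

An empty list means the input is compatible with the declared shape. In this issue the input was already [1, 3, 640, 640], so a check like this would pass and the failure has to come from somewhere else in the graph.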

Can you please provide me with your training command and if possible a few images for me to reproduce the issue on my side?

charlieWyatt commented 10 months ago

Thanks @sovit-123 I appreciate the response.

I am printing image.shape in onnx_inference_image.py [image] and getting torch.Size([1, 3, 640, 640]).

Here are some example images I am training on: australian_magpie_22133012, australian_ibis_144215602, laughing_kookaburra_153012569

And my inference image: magpie

The training command I am using is:

python train.py --model fasterrcnn_mobilenetv3_large_fpn.py --epochs 75 --data data_configs/iNaturalist.yaml --batch 4 --mosaic 0

sovit-123 commented 10 months ago

May I know your ONNX and ONNX Runtime versions? Can you please try with ONNX Runtime 1.14.0 and the corresponding version of ONNX if they are different?

Also, please run inference on the same image folder with the inference.py script and the PyTorch-trained model, and check that PyTorch inference works as expected.
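One way to collect the versions being asked about is with the standard library's `importlib.metadata`, which avoids importing the packages themselves. This is a small sketch; the helper name `installed_versions` and the default package list are assumptions for illustration.

```python
from importlib import metadata


def installed_versions(packages=("onnx", "onnxruntime", "torch", "torchvision")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Note that the lookup is by distribution name, so a GPU build installed as onnxruntime-gpu would need that name added to the list.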

charlieWyatt commented 10 months ago

Inference on the same image folder using inference.py:

python inference.py --data data_configs/iNaturalist.yaml --weights outputs/training/fasterrcnn_mobilenetv3_large_fpn_iNaturalist/best_model.pth --input ../input/iNaturalist/inference_images/

[image: magpie]

Old versions: onnxruntime 1.16.3, onnx 1.15.0

Tried running it with -

and I am getting the same error.

Interestingly, if I use the resnet18 model with all other parameters unchanged, I can export to ONNX and run inference, but I would prefer to use the mobilenet model if possible.

sovit-123 commented 10 months ago

That's interesting. In that case, I will check the source code to see whether the script only works for the ResNet18 model. There may be a bug.

charlieWyatt commented 10 months ago

I suspect it is because of the redefinition of the roi heads in the mobilenet model on line 16: [image]

Also the inference error seems to indicate it is an issue in the roi heads - " onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: C:\a_work\1\s\onnxruntime\core\providers\cpu\tensor\reshape_helper.h:40 onnxruntime::ReshapeHelper::ReshapeHelper size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{115,2479}, requested shape:{-1,4} "

sovit-123 commented 10 months ago

Thanks for the observation. I will take a look, although it may take some time.

sovit-123 commented 10 months ago

Hi. I tested the ResNet backbone models and a few other models. They seem to work fine, while the MobileNet one you mentioned does not. It may be because of some internal resizing. I will need some more time to test it out.

charlieWyatt commented 10 months ago

No worries, thanks @sovit-123!

soobin508 commented 9 months ago

Hi @sovit-123, I also encountered the same reshape issue with the mobilenet model. Is there any update on this?

sovit-123 commented 9 months ago

From a first look, it seemed like an internal transform issue of the model. However, the transforms are the same as the ResNet backbone ones. So, I still need to figure this out.

sovit-123 commented 6 months ago

Hi. Can you train again? I have pushed an update that makes the export dynamic: you can export at 640x640 and run inference at any resolution. Do not pass a resolution while exporting; only provide the desired resolution when running inference.

juliomilani commented 4 months ago

It seems that the problem is in the torchvision implementation. Here is a minimal reproducible example.

Note: the error shows up only with the cropped image. If I comment out img = img[50:200, 150:250], it works fine.

import cv2
import numpy as np
import onnxruntime
import requests
import torch
import torchvision

# Export the pretrained torchvision model with a fixed 1x3x640x640 dummy input.
print("Exporting model")
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights='DEFAULT')
torch.onnx.export(model, torch.rand(1, 3, 640, 640), '/tmp/model.onnx',
                  input_names=['input'], output_names=['boxes', 'scores', 'labels'])

# Fetch a sample image and save it locally.
print("Downloading image")
r = requests.get('https://docs.opencv.org/4.x/roi.jpg', allow_redirects=True)
r.raise_for_status()
with open('/tmp/roi.jpg', 'wb') as f:
    f.write(r.content)
img = cv2.imread('/tmp/roi.jpg')

# This crop triggers the error; commenting it out makes inference succeed.
img = img[50:200, 150:250]
cv2.imwrite('/tmp/roi2.jpg', img)

# Preprocess: BGR -> RGB, resize to 640x640, scale to [0, 1], HWC -> BCHW.
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
image_hwc = cv2.resize(img, (640, 640))
image_hwc = np.array(image_hwc, dtype=np.float32) / 255.0
image_bchw = np.transpose(np.expand_dims(image_hwc, 0), (0, 3, 1, 2))

# Run the exported model on CPU with ONNX Runtime.
print("Running inference")
session = onnxruntime.InferenceSession('/tmp/model.onnx', providers=["CPUExecutionProvider"])
outputs = [o.name for o in session.get_outputs()]
inputs = [i.name for i in session.get_inputs()]
prediction = session.run(outputs, {inputs[0]: image_bchw})
print(prediction)

Error: onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'/roi_heads/Reshape_2' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:39 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) size != 0 && (input_shape_size % size) == 0 was false. The input tensor cannot be reshaped to the requested shape. Input shape:{490,363}, requested shape:{-1,4}

Packages: onnxruntime 1.18.1, torch 2.2.2, torchvision 0.17.2
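As a quick sanity check on the two failing shapes reported in this thread, here is a sketch in plain NumPy (the helper name `can_reshape_to_boxes` is made up for illustration). A reshape to {-1,4} needs a non-zero total element count that is divisible by 4, which is the condition quoted in the error message. Neither {115,2479} nor {490,363} satisfies it, and in both cases the last dimension is one short of a multiple of 4.

```python
import numpy as np


def can_reshape_to_boxes(shape, box_dim=4):
    """Return True if a tensor of `shape` can be viewed as (-1, box_dim)."""
    total = int(np.prod(shape))
    return total != 0 and total % box_dim == 0


# Input shapes taken from the two error messages in this thread.
reported = {"charlieWyatt": (115, 2479), "juliomilani": (490, 363)}
for name, shape in reported.items():
    width = shape[1]
    print(name, shape,
          "reshapable to (-1, 4):", can_reshape_to_boxes(shape),
          "| width % 4 =", width % 4)

# NumPy rejects the same reshape for the same reason.
try:
    np.zeros((115, 2479), dtype=np.float32).reshape(-1, 4)
except ValueError as err:
    print("numpy:", err)
```

This only confirms what the runtime is complaining about. It does not by itself show where in the exported graph the width goes wrong.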