Have you tried exporting the model from PyTorch 1.2? CC @spandantiwari
I have updated PyTorch from 1.1 to 1.2 and re-exported the ONNX model, but onnxruntime crashed when calling the session.Run() API:
// score model & input tensor, get back output tensor
output_tensors = session.Run(Ort::RunOptions{ nullptr }, input_node_names.data(), &input_tensor, 1, output_node_names.data(), 1);
The ONNX model exported from PyTorch 1.2 is in the attached zip file: model_pytorch12.zip
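For reference, a minimal sketch of how an input tensor is typically built for this Run() call with the ONNX Runtime C++ API (a sketch only: the shape, node names, and helper function below are placeholders, not taken from the attached code):

#include <onnxruntime_cxx_api.h>
#include <vector>

// Placeholder helper: wraps existing NCHW float data in a CPU tensor and runs the session.
std::vector<Ort::Value> run_once(Ort::Session& session, std::vector<float>& input_data) {
    // 1x3x224x224 is assumed from the discussion below; adjust to the model's real input shape.
    std::vector<int64_t> input_shape{1, 3, 224, 224};

    // The tensor wraps input_data without copying it, so input_data must outlive the call.
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        memory_info, input_data.data(), input_data.size(),
        input_shape.data(), input_shape.size());

    // Node names are placeholders; the real names come from the exported graph.
    const char* input_names[]  = {"input"};
    const char* output_names[] = {"output"};

    return session.Run(Ort::RunOptions{nullptr},
                       input_names, &input_tensor, 1,
                       output_names, 1);
}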
Can you please share your full scoring program? What does input_tensor look like? What shape is it? What pre-processing is required for this model? Did ORT return any error message when it failed during Run()?
As an aside, could you please try running with the 1.0 release to see whether a bug that was fixed as part of that release resolves this issue? Thanks!
I have updated PyTorch to version 1.3 and ONNX to version 1.6.0, and re-exported the ONNX model. It crashed at the following code when using the onnxruntime v1.0 CPU version:
printf("Using Onnxruntime C++ API\n");
Ort::Session session(env, model_path, session_options); //!!!! crashed here
I have put the ONNX model and sample code in the zip file.
PyTorch version: 1.3
ONNX version: 1.6.0
onnxruntime version: 1.0 (CPU)
OS: Windows 10
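For what it's worth, a hedged sketch of wrapping the session construction in a try/catch so that the ONNX Runtime error text gets printed instead of the process just crashing (the model path below is a placeholder):

#include <onnxruntime_cxx_api.h>
#include <cstdio>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "test");
    Ort::SessionOptions session_options;
    session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);

    try {
        // On Windows the model path is a wide string; "model.onnx" is a placeholder name.
        Ort::Session session(env, L"model.onnx", session_options);
        printf("Model loaded successfully\n");
    } catch (const Ort::Exception& e) {
        // Ort::Exception derives from std::exception; what() carries the ORT error message.
        printf("Failed to create session: %s\n", e.what());
        return 1;
    }
    return 0;
}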
Thanks for sharing. I will take a look.
Where can I get bike.jpg? truth.jpg won't work, as it contains noise in addition to bike.jpg...
Also, as a simple sanity check, I tried loading the latest model with all optimizers enabled (just like your C++ code) and creating a session in Python, and the model loads fine. Let me see if I can get the right results in Python once you share bike.jpg, and then we can see what happens in C++...
Also - did you try importing the same ONNX model into the OpenCV DNN module too? If the same ONNX model gets imported via OpenCV and yields different results than ORT, then it may be an ORT issue. Otherwise, this could be a model export issue. I looked at the model; it seems pretty simple - not too many layers - so I am wondering if this could be a PyTorch export issue wherein the graph is not structurally correct...
I found bike.jpg in your initial resources zip. I tried it with ORT 1.0 and the latest model you shared, and I could reproduce the original issue - the other label has a softmax score of 1.0. Here is my script -
ORT 1.0
import onnxruntime as rt
import numpy as np
from PIL import Image
def preprocess(image):
    image = image.resize((224, 224), Image.BILINEAR)
    # Convert to BGR
    image = np.array(image)[:, :, [2, 1, 0]].astype('float32')
    # HWC -> CHW
    image = np.transpose(image, [2, 0, 1])
    # Normalize
    mean_vec = np.array([0.485, 0.456, 0.406])
    std_vec = np.array([0.229, 0.224, 0.225])
    for i in range(image.shape[0]):
        image[i, :, :] = image[i, :, :] - mean_vec[i]
        image[i, :, :] = image[i, :, :] / std_vec[i]
    image = np.expand_dims(image, axis=0)
    return image

def softmax(arr):
    sum = 0
    max = np.max(arr)
    print(max)
    for i in arr:
        print(i)
        sum += np.exp(i - max)  # subtract max in numerator and denominator to avoid overflow
    res = []
    for i in arr:
        res.append(np.exp(i - max) / sum)
    return res
so = rt.SessionOptions()
so.graph_optimization_level = rt.GraphOptimizationLevel.ORT_ENABLE_ALL
img = Image.open(r'bike.jpg')
img_data = preprocess(img)
print (img_data.shape)
sess = rt.InferenceSession(r'mobilenetv2_pov4_test.onnx', so)
input_name = sess.get_inputs()[0].name
pred_onnx = sess.run(None, {input_name: img_data})
print(pred_onnx)
print(softmax(pred_onnx[0][0]))
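For comparison with the C++ scoring path, here is a rough OpenCV-based C++ mirror of the preprocessing in the script above (a sketch only: the sizes and normalization constants follow the Python script, not the original training code, and cv::imread already returns BGR so no channel swap is applied):

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Mirrors the Python preprocess(): resize to 224x224, keep BGR order, HWC -> CHW,
// subtract mean and divide by std per channel (no division by 255, matching the script).
std::vector<float> preprocess(const std::string& image_path) {
    cv::Mat img = cv::imread(image_path);                              // 8-bit BGR
    cv::resize(img, img, cv::Size(224, 224), 0, 0, cv::INTER_LINEAR);
    img.convertTo(img, CV_32FC3);                                      // float, same scale as the script

    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float stdv[3] = {0.229f, 0.224f, 0.225f};

    std::vector<float> chw(3 * 224 * 224);
    for (int c = 0; c < 3; ++c)
        for (int h = 0; h < 224; ++h)
            for (int w = 0; w < 224; ++w)
                chw[c * 224 * 224 + h * 224 + w] =
                    (img.at<cv::Vec3f>(h, w)[c] - mean[c]) / stdv[c];
    return chw;   // feed as a 1x3x224x224 float tensor
}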
Could you please share your PyTorch training script so we can see whether the model got exported to ONNX correctly via the PyTorch exporter? Also, please let us know if you were able to load the same ONNX model in OpenCV 4 and get the right results, in which case the issue probably lies with ORT. Thanks.
CC: @spandantiwari @BowenBao
thanks @hariharans29
Great, thanks. Can you also please share how you invoke OpenCV 4's dnn module to consume the ONNX file? Might make it faster for me....
I have put the sample code using the OpenCV 4.1 dnn module in the following zip file. The ONNX file is loaded in the constructor and the prediction is implemented in the predict() function. opencv_classify.zip
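For anyone following along, a minimal sketch of the flow described above (load the model in the constructor, run it in predict()); the class, parameters, and preprocessing here are illustrative, not the actual code in opencv_classify.zip:

#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <string>

class OnnxClassifier {
public:
    explicit OnnxClassifier(const std::string& model_path)
        : net_(cv::dnn::readNetFromONNX(model_path)) {}

    cv::Mat predict(const cv::Mat& bgr_image) {
        // blobFromImage handles resizing and NCHW conversion; the scale/mean arguments
        // here are placeholders and must match the model's real preprocessing.
        cv::Mat blob = cv::dnn::blobFromImage(bgr_image, 1.0,
                                              cv::Size(224, 224),
                                              cv::Scalar(), /*swapRB=*/false);
        net_.setInput(blob);
        return net_.forward();   // 1 x num_classes scores
    }

private:
    cv::dnn::Net net_;
};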
@hariharans29 Will ORT fix this bug in the next version? Thanks.
Hi @yizhaoyanbo,
I am working on some things for the next release right now, so I haven't had time to debug this and identify a cause for the divergence yet. I hope to get to it soon. I think the difficult part will be identifying the cause; the fix itself (I think) will be low dev cost. I will keep you updated.
Thanks for checking back.
Hi @yizhaoyanbo,
I took a look at it briefly today - I am still not able to narrow down the possible causes. Meanwhile, I noticed that the exported ONNX model is an opset 9 model. Is it possible to export to a newer opset from PyTorch (opset 10 or opset 11) and give it another try?
@yizhaoyanbo Were you able to get this working with the latest ORT release and the latest version of PyTorch? Let us know if more assistance is needed.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.
Describe the bug I have trained a MobileNetV2 classification model using PyTorch 1.1 and exported it to an ONNX model. I have tried running the prediction using OpenCV 4's dnn module and the result is correct, but I get the wrong result using the onnxruntime C++ CPU API with the same image and the same preprocessing.
Urgency none
System information
To Reproduce
Expected behavior I set "bike.jpg" as the input image. The correct result is the "arm" label with confidence 0.99, but onnxruntime's result is the "arm" label with confidence 0.0008.
Screenshots I have put the model, the label file, and screenshots of the results in the following zip file.
Additional context resources.zip
Thanks.