microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Different detection output values for C++ and Python with onnxruntime #11123

Open omerwer opened 2 years ago

omerwer commented 2 years ago

Hi,

I'm trying to perform an image detection on this image: salad-test

Bug description: When I run it with the following Python code, the detection is fine:

```python
import numpy as np
import cv2
import onnxruntime

image_BGR = cv2.imread('path_of_the_image.jpg')
image_RGB = cv2.cvtColor(image_BGR, cv2.COLOR_BGR2RGB)
resized = cv2.resize(image_RGB, (320, 320), interpolation=cv2.INTER_AREA)
resized_normalized_fl = (np.array(resized, np.float32) - 127) * 0.0078125

output_nodes = ['StatefulPartitionedCall:0', 'StatefulPartitionedCall:1',
                'StatefulPartitionedCall:2', 'StatefulPartitionedCall:3']
full_model_onnx = onnxruntime.InferenceSession('salad.onnx')
inputs = {inp.name: np.expand_dims(resized_normalized_fl, axis=0).astype(np.float32)
          for inp in full_model_onnx.get_inputs()}
onnx_out = full_model_onnx.run(output_nodes, inputs)
```

The output values are: [array([25.], dtype=float32), array([[0.7073383 , 0.6937418 , 0.65383303, 0.63405895, 0.62928486, 0.5617206 , 0.5097608 , 0.42426664, 0.39106843, 0.3309952 , 0.28501803, 0.27418488, 0.22237387, 0.21934518, 0.21848157, 0.21627447, 0.21239078, 0.2072804 , 0.19909242, 0.19692048, 0.19249496, 0.1893397 , 0.18886447, 0.18866774, 0.18062428]], dtype=float32), array([[4., 4., 1., 4., 4., 4., 4., 4., 4., 4., 3., 4., 4., 4., 0., 4., 4., 4., 3., 3., 4., 4., 3., 2., 4.]], dtype=float32), array([[[ 2.6536268e-01, -5.0874799e-04, 4.8290437e-01, 2.2682324e-01], [ 1.3203146e-01, 6.4495105e-01, 2.3465361e-01, 8.0632132e-01], [ 1.0321391e-01, -1.3858557e-02, 9.8806810e-01, 1.0007272e+00], [ 6.8402600e-01, 1.2209380e-01, 9.2899990e-01, 4.6162629e-01], [ 1.1299485e-01, 1.4727369e-01, 2.8355274e-01, 3.7982914e-01], [ 6.5955776e-01, -1.2563296e-02, 8.2872158e-01, 1.9798708e-01], [ 1.2324050e-01, 3.7302625e-01, 2.8221276e-01, 5.6010902e-01], [ 6.8642479e-01, 4.0286705e-02, 8.8163620e-01, 3.0838162e-01], [ 1.3024744e-01, 1.8475701e-01, 3.1574792e-01, 5.2967453e-01], [ 7.3650724e-01, 2.0920816e-01, 9.8306054e-01, 7.8827608e-01], [ 1.0321391e-01, -1.3858557e-02, 9.8806810e-01, 1.0007272e+00], [ 1.0321391e-01, -1.3858557e-02, 9.8806810e-01, 1.0007272e+00], [ 7.9829597e-01, 4.4458807e-01, 9.5156658e-01, 7.8481364e-01], [ 8.4526575e-01, 2.7085415e-01, 9.9576116e-01, 7.3343062e-01], [ 1.0321391e-01, -1.3858557e-02, 9.8806810e-01, 1.0007272e+00], [ 2.7537832e-01, 1.0597825e-01, 3.5489050e-01, 2.2212991e-01], [ 1.9314238e-01, 5.4672018e-02, 3.7764362e-01, 3.2572681e-01], [ 6.7706102e-01, 8.8712126e-01, 8.5774964e-01, 1.0020802e+00], [ 1.3203146e-01, 6.4495105e-01, 2.3465361e-01, 8.0632132e-01], [ 1.0506896e-01, 1.4067614e-01, 2.7989385e-01, 3.6557829e-01], [ 7.7638006e-01, 2.2335689e-01, 9.6234071e-01, 5.7003754e-01], [ 7.9946682e-02, 1.1649597e-01, 3.5405028e-01, 6.5007287e-01], [ 1.2727095e-01, 3.8319668e-01, 2.7833617e-01, 5.6334996e-01], [ 9.9618524e-02, 4.2115450e-03, 
1.0073204e+00, 1.0165281e+00], [ 7.9993290e-01, 2.8592539e-01, 1.0374128e+00, 8.5811651e-01]]], dtype=float32)]
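As an aside, the `(x - 127) * 0.0078125` step maps uint8 pixel values into roughly [-1, 1]; a quick plain-Python sanity check of that mapping (no OpenCV or numpy needed):

```python
SCALE = 0.0078125  # exactly 1/128, so the multiplication is exact for these small ints

def normalize(pixel):
    """Same (x - 127) * 1/128 scaling applied to each uint8 pixel value."""
    return (pixel - 127) * SCALE

print(normalize(0))    # -0.9921875 (darkest pixel)
print(normalize(127))  # 0.0 (mid-gray maps to zero)
print(normalize(255))  # 1.0 (brightest pixel maps to exactly 1.0)
```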

But when I run the following C++ onnxruntime code:

```cpp
#include <onnxruntime_cxx_api.h> // reconstructed; the header name was swallowed by the issue formatting

#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/core/matx.hpp>
#include <opencv2/imgcodecs.hpp>

#include <cassert> // reconstructed; assert() is used below
#include <vector> // this is the vector include, for some reason it doesn't show
#include <string> // this is the string include, for some reason it doesn't show

Ort::Env env;
Ort::AllocatorWithDefaultOptions ort_alloc;
Ort::Session session{env, "salad.onnx", Ort::SessionOptions{}};
auto input_num = session.GetInputCount();
auto output_num = session.GetOutputCount();
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);

char** input_names;
char** output_names;

std::string image_path = "salad-test.jpg";
cv::Mat imageBGR = cv::imread(image_path, cv::ImreadModes::IMREAD_COLOR);

cv::Mat resizedImageBGR, resizedImageRGB, resizedImage, image;
cv::resize(imageBGR, resizedImageBGR, cv::Size(320, 320), cv::InterpolationFlags::INTER_AREA);
cv::cvtColor(resizedImageBGR, resizedImageRGB, cv::ColorConversionCodes::COLOR_BGR2RGB);
resizedImageRGB.convertTo(resizedImage, CV_32F, 1.0); // original post said CV_TYPE; CV_32F matches the float usage below

auto mean = cv::Scalar(127, 127, 127);
auto stdv = cv::Scalar(128.0, 128.0, 128.0);
image = resizedImage - mean;
image /= stdv;

std::vector<float> image_vector;
image_vector.assign((float*)image.data, (float*)image.data + image.total() * image.channels());

input_names = (char**)malloc(sizeof(char*) * input_num);
output_names = (char**)malloc(sizeof(char*) * output_num);

for (size_t i = 0; i < input_num; i++) {
    input_names[i] = session.GetInputName(i, ort_alloc);
}
for (size_t i = 0; i < output_num; i++) {
    output_names[i] = session.GetOutputName(i, ort_alloc);
}

auto shape = session.GetInputTypeInfo(0).GetTensorTypeAndShapeInfo().GetShape();
auto input_tensor = Ort::Value::CreateTensor<float>(memory_info, image_vector.data(), image_vector.size(), shape.data(), shape.size());
assert(input_tensor.IsTensor());

auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, 1, output_names, output_num);

auto scores = output_tensors[0].GetTensorMutableData<float>();
auto scores_count = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
auto boxes = output_tensors[1].GetTensorMutableData<float>();
auto boxes_count = output_tensors[1].GetTensorTypeAndShapeInfo().GetElementCount();
auto num_of_detections = output_tensors[2].GetTensorMutableData<float>();
auto classes = output_tensors[3].GetTensorMutableData<float>();
auto classes_count = output_tensors[3].GetTensorTypeAndShapeInfo().GetElementCount();
```

I get bad detections and totally different values:

Num of Detections: 25 Scores: 0.66802 0.63157 0.628214 0.625333 0.622578 0.532839 0.503109 0.401712 0.381414 0.361175 0.333994 0.288905 0.256806 0.233869 0.224066 0.218489 0.21286 0.209497 0.207297 0.206325 0.199572 0.198301 0.191611 0.189473 0.18885 Classes: 4 1 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 3 2 3 4 4 3 4 4 Boxes: 0.131621 -6.21855e-14 0.808781 0.236047 -0.0162454 0.131621 1.00127 0.808781 0.122192 0.131621 0.931003 0.808781 0.107451 0.131621 0.357981 0.236047 0.00713299 0.131621 0.476636 0.236047 -0.0151893 0.131621 0.829044 0.808781 0.123748 0.131621 0.562375 0.236047 0.0355353 0.131621 0.881831 0.808781 0.201352 0.645021 0.981813 0.808781 -0.0162454 0.131621 1.00127 0.808781 0.137133 0 0.532777 0.236047 0.00125378 0.131621 1.01202 0.808781 0.0460412 0.131621 0.379707 0.236047 0.124535 0.131621 0.424029 0.236047 0.0980706 0.131621 0.356737 0.236047 0.0229017 0.131621 0.387237 0.236047 0.436212 0.645021 0.945525 0.808781 0.103641 0.131621 0.357535 0.236047 -0.00511163 0.131621 1.01806 0.808781 0.131621 0.236047 0.808781 0.645021 0.423528 0.645021 0.972323 0.808781 0.66426 0 0.993209 0.808781 0.126204 0.131621 0.565085 0.236047 0.101247 0.131621 0.864389 0.808781 0.157368 0 0.403679 0.236047

Does anyone know if this is a bug in the C++ onnxruntime API, or is there a more specific way to perform inference in C++?

System information
- OS: Linux Ubuntu 18.04
- ONNX Runtime installed from (source or binary): https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/1.10.0
- ONNX Runtime version: 1.10
- Python version: 3.6.9
- Visual Studio version (if applicable): 1.58.2

GCC/Compiler version (if compiling from source): gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

To Reproduce

Expected behavior Same inference output values in Python and C++

Actual behavior Different inference output values in Python and C++

skottmckay commented 2 years ago

Have you checked that the float values in the input are exactly the same between the two calls?

This doesn't look equivalent either

resized_normalized_fl = np.array(resized, np.float32) - 127) * 0.0078125 vs

auto mean = cv::Scalar(127, 127, 127);
auto stdv = cv::Scalar(128.0, 128.0, 128.0);
image = resizedImage - mean;
image /= stdv;
omerwer commented 2 years ago

Have you checked that the float values in the input are exactly the same between the two calls?

This doesn't look equivalent either

resized_normalized_fl = np.array(resized, np.float32) - 127) * 0.0078125 vs

auto mean = cv::Scalar(127, 127, 127);
auto stdv = cv::Scalar(128.0, 128.0, 128.0);
image = resizedImage - mean;
image /= stdv;

When I load the image, in both cases the values are the same. The values start changing after I resize - in the Python version (after I convert to float) it looks like this: array([[[220., 241., 255.], [220., 241., 255.], [220., 241., 255.], ... [ 47., 27., 16.], [ 48., 26., 11.], [ 46., 27., 17.]], ... [[220., 241., 255.], [221., 242., 255.], [223., 243., 255.], ... [ 45., 25., 18.], [ 46., 28., 13.], [ 48., 28., 19.]], ... [[227., 245., 255.], [230., 246., 255.], [231., 244., 255.],

While in the C++ version some values change slightly - instead of [220., 241., 255.], [220., 241., 255.], [220., 241., 255.]

I get something like this (before conversion to float):

[220, 241, 255], [219, 241, 255], [219, 242, 255]

Are you sure the normalization is not equivalent? In both cases I subtract 127 and then scale by 1/128 (0.0078125).

Is there something I'm missing?
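For what it's worth, the two scalings themselves are numerically identical: 0.0078125 is exactly 1/128, a power of two, so subtracting 127 and multiplying by 0.0078125 gives bit-for-bit the same result as subtracting 127 and dividing by 128. A self-contained check over every possible uint8 pixel value:

```python
assert 0.0078125 == 1.0 / 128.0  # the scale factor is exactly a power of two

for x in range(256):  # every possible uint8 pixel value
    multiplied = (x - 127) * 0.0078125  # the Python-script formula
    divided = (x - 127) / 128.0         # the C++ mean/std formula
    assert multiplied == divided, (x, multiplied, divided)
print("identical for all 256 values")
```

So any difference in the input tensors has to come from an earlier step, such as the resize, rather than from this normalization.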

skottmckay commented 2 years ago

Whether you call from python or C++ the exact same C++ code is running to execute the model. We do not change the bytes provided by the user prior to executing the model.

Based on that, if you're getting diffs either the input is not exactly the same (either the bytes or the shape you're saying the tensor has), or the way you're processing the output is not exactly the same.
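"Exactly the same" here means bit-identical float32 buffers, not merely similar printed values. In Python, one way to check is to pack both buffers to raw bytes and compare (on the C++ side the equivalent would be a `memcmp` over the tensor data). A minimal stdlib-only sketch, with made-up sample values:

```python
import struct

def float32_bytes(values):
    # pack as little-endian float32, the in-memory layout of an ORT float tensor on x86
    return struct.pack(f"<{len(values)}f", *values)

a = [0.7578125, 0.921875, 1.0]
b = [0.7578125, 0.921875, 1.0]
c = [0.7578126, 0.921875, 1.0]  # differs only in the 7th decimal digit

print(float32_bytes(a) == float32_bytes(b))  # True: bit-identical buffers
print(float32_bytes(a) == float32_bytes(c))  # False: even tiny diffs show up
```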

vade commented 2 years ago

From our experience a lot of this has to do with the re-sizing algorithm being used, and if it is doing bilinear interpolation, or any other sort of interpolation. There was a good write up here

https://blog.zuru.tech/machine-learning/2021/08/09/the-dangers-behind-image-resizing
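A toy illustration of why the resize algorithm matters (simplified pixel averaging, not OpenCV's actual kernels): shrinking the same row with area-style averaging versus nearest-neighbor sampling already produces different values, and different libraries and flags make different choices here.

```python
row = [10, 200, 30, 220]  # a 4-pixel row, shrunk to 2 pixels

# area-style: average each 2-pixel window (roughly what INTER_AREA does at a 2x shrink)
area = [(row[0] + row[1]) / 2, (row[2] + row[3]) / 2]

# nearest-neighbor: keep one source pixel per output pixel
nearest = [row[0], row[2]]

print(area)     # [105.0, 125.0]
print(nearest)  # [10, 30]
```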

omerwer commented 2 years ago

Thank you all for your answers. First, indeed the values after the resize were different in the Python script compared to the C++ script.

So, I resized the image using ffmpeg outside of the code to be 320x320x3 and compared the Python results to the C++ ones. Before the onnxruntime inference, the data was the same in both scripts:

Python: [[[ 0.7578125 0.921875 1. ] [ 0.7421875 0.90625 1. ] [ 0.75 0.8984375 1. ] ... [-0.6171875 -0.8125 -0.8671875] [-0.5625 -0.796875 -0.8125 ] [-0.53125 -0.78125 -0.7578125]]

[[ 0.7734375 0.9140625 1. ] [ 0.765625 0.9140625 1. ] [ 0.7734375 0.90625 1. ] ... [-0.6328125 -0.765625 -0.703125 ] [-0.6171875 -0.7734375 -0.84375 ] [-0.625 -0.7890625 -0.953125 ]]

[[ 0.796875 0.9296875 1. ] [ 0.7890625 0.9140625 1. ] [ 0.7578125 0.8828125 1. ] ... [-0.515625 -0.3359375 -0.0703125] [-0.625 -0.5859375 -0.359375 ] [-0.6796875 -0.7109375 -0.53125 ]]

...

[[ 0.25 0.421875 0.7109375] [ 0.25 0.421875 0.7109375] [ 0.2578125 0.4296875 0.71875 ] ... [-0.1015625 0.1015625 0.3125 ] [-0.0859375 0.109375 0.34375 ] [-0.0859375 0.125 0.3515625]]

[[ 0.2421875 0.4140625 0.703125 ] [ 0.25 0.421875 0.7109375] [ 0.2578125 0.4296875 0.71875 ] ... [-0.1640625 0.0390625 0.25 ] [-0.1484375 0.0546875 0.265625 ] [-0.1484375 0.0625 0.2734375]]

[[ 0.2421875 0.4140625 0.703125 ] [ 0.2421875 0.4140625 0.703125 ] [ 0.25 0.421875 0.7109375] ... [-0.203125 0. 0.1953125] [-0.1953125 0.0078125 0.21875 ] [-0.1953125 0.015625 0.2265625]]]

C++: {0.7578125, 0.921875, 1, 0.7421875, 0.90625, 1, 0.75, 0.8984375, 1, 0.7734375, 0.921875, 1, 0.8125, 0.9453125, 1, 0.84375, 0.96875, 1, 0.8359375, 0.9609375, 1, 0.8359375, 0.9375, 1, 0.8359375, 0.859375, 1, 0.796875, 0.8203125, 0.96875, 0.71875, 0.7421875, 0.890625, 0.6171875, 0.640625, 0.7890625, 0.5078125, 0.53125, 0.6796875, 0.40625, 0.4296875, 0.578125, 0.3359375, 0.359375, 0.5078125, 0.2890625, 0.3125, 0.4609375, 0.1796875, 0.171875, 0.21875, 0.1640625, 0.15625, 0.203125, 0.140625, 0.1328125, 0.1796875, 0.1171875, 0.109375, 0.15625, 0.0859375, 0.078125, 0.125, 0.0625, 0.0546875, 0.1015625, 0.0390625, 0.03125, 0.078125, 0.03125, 0.0234375, 0.0703125, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.015625, -0.0234375, 0.0234375, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.03125, -0.0390625, 0.0078125, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0234375, -0.03125, 0.015625, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, -0.0078125, -0.015625, 0.03125, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 0.0546875, 0.015625, 0.0078125, 
0.0703125, 0.0234375, 0.015625, 0.078125, 0.0234375, 0.015625...}

After the inference, the data was different:

Python [array([25.], dtype=float32), array([[0.7179791 , 0.7119217 , 0.62644404, 0.61120206, 0.60349655, 0.59455496, 0.475785 , 0.45168328, 0.42625162, 0.41206586, 0.35856092, 0.33548713, 0.3174196 , 0.30479813, 0.29680106, 0.2750082 , 0.24047014, 0.23226899, 0.22655705, 0.21002227, 0.20035607, 0.19754827, 0.19366431, 0.19108632, 0.18938068]], dtype=float32), array([[4., 4., 4., 4., 4., 1., 4., 4., 4., 4., 4., 4., 4., 3., 4., 4., 3., 4., 3., 4., 2., 4., 4., 0., 4.]], dtype=float32), array([[[ 0.12975946, 0.6441527 , 0.23351136, 0.80738896], [ 0.2736489 , 0.00185148, 0.4825773 , 0.22635683], [ 0.65909886, -0.01258603, 0.83043504, 0.19587263], [ 0.10156354, 0.1404661 , 0.2706743 , 0.3601166 ], [ 0.6712812 , 0.11650395, 0.9390955 , 0.45991158], [ 0.10430276, -0.0154283 , 0.9877626 , 0.99496937], [ 0.68577504, 0.03489889, 0.88403654, 0.30629593], [ 0.1208953 , 0.3588065 , 0.27319536, 0.56323886], [ 0.75791574, 0.28043842, 0.9900408 , 0.8880708 ], [ 0.1284423 , 0.15103848, 0.31055123, 0.47694856], [ 0.7945792 , 0.44237936, 0.9571679 , 0.7761717 ], [ 0.7418754 , 0.23470417, 0.9499682 , 0.6870462 ], [ 0.10430276, -0.0154283 , 0.9877626 , 0.99496937], [ 0.10430276, -0.0154283 , 0.9877626 , 0.99496937], [ 0.27733627, 0.10644138, 0.35180125, 0.22269616], [ 0.8497372 , 0.27455845, 0.99365884, 0.74148035], [ 0.12975946, 0.6441527 , 0.23351136, 0.80738896], [ 0.6577364 , 0.87984174, 0.9097352 , 0.9911242 ], [ 0.1039575 , 0.14878443, 0.2790619 , 0.37039748], [ 0.28879404, -0.00826786, 0.60837495, 0.17147563], [ 0.10789749, 0.0133456 , 1.0126656 , 1.021101 ], [ 0.28280312, -0.00481155, 0.44011647, 0.15008521], [ 0.07400489, 0.11371401, 0.34831053, 0.6553389 ], [ 0.10430276, -0.0154283 , 0.9877626 , 0.99496937], [ 0.15573569, 0.1945535 , 0.27901036, 0.41661817]]], dtype=float32)]

C++: Num of Detections: 25 Scores: 0.710172 0.694793 0.612079 0.600294 0.598423 0.581148 0.462774 0.43772 0.411057 0.349067 0.323653 0.31031 0.284933 0.248059 0.240939 0.235866 0.213165 0.212527 0.211755 0.199356 0.19532 0.180585 0.179348 0.177948 0.174256 Classes: 4 4 1 4 4 4 4 4 4 4 3 4 4 4 4 4 4 3 0 3 2 4 4 4 3 Boxes: 0.130985 -5.59583e-22 0.804696 0.23496 -0.000445321 0.130985 0.486398 0.23496 -0.0163887 0.130985 0.996883 0.804696 -0.0141229 0.130985 0.831537 0.804696 0.113236 0.130985 0.385396 0.23496 0.121371 0.130985 0.932411 0.804696 0.126016 0.130985 0.563229 0.23496 0.0369672 0.130985 0.88424 0.804696 0.134313 0 0.530164 0.23496 0.325595 0.645794 0.979523 0.804696 -0.0163887 0.130985 0.996883 0.804696 0.238456 0.645794 0.948744 0.804696 0.679079 0 1.00041 0.804696 0.452269 0.645794 0.972619 0.804696 0.274299 0.645794 0.995574 0.804696 -0.0163887 0.130985 0.996883 0.804696 0.104204 0.130985 0.357313 0.23496 0.105327 0.130985 0.369661 0.23496 -0.0163887 0.130985 0.996883 0.804696 0.130985 0.23496 0.804696 0.645794 0.00858939 0.130985 1.02182 0.804696 0.0762396 0.130985 0.634973 0.23496 0.106924 0.130985 0.375005 0.23496 0.00952158 0.130985 0.612958 0.23496 -0.000445321 0.130985 0.486398 0.23496

I understand that Python is de facto also running C++ code, but I don't know how it's possible that the pre-inference data is the same while the post-inference data is different.

The only thing I changed in the scripts is the resize:

Python: Deleted the line resized = cv2.resize(image_RGB, (320, 320), interpolation = cv2.INTER_AREA)

C++: Deleted the line cv::resize(imageBGR, resizedImageBGR, cv::Size(320, 320), cv::InterpolationFlags::INTER_AREA);

omerwer commented 2 years ago

Whether you call from python or C++ the exact same C++ code is running to execute the model. We do not change the bytes provided by the user prior to executing the model.

Based on that, if you're getting diffs either the input is not exactly the same (either the bytes or the shape you're saying the tensor has), or the way you're processing the output is not exactly the same.

After checking and re-checking again, with a pre-resized input image, the values of the input tensor in both Python and C++ are identical. After the Python `full_model_onnx.run(output_nodes, inputs)` and the C++ `auto output_tensors = session.Run(Ort::RunOptions{nullptr}, input_names, &input_tensor, input_num, output_names, output_num);`, the output values are different.

If the same code is running in the background in both Python and C++, then for the same inputs the outputs should be the same. Since that's not the case, there are two options that I can think of:

  1. There's a difference in the ONNXRuntime "run" function between Python and C++.
  2. There's a difference in the way the data is retrieved - in Python, the return value from "run" is a simple list. In C++, the return value from "run" is an Ort::Value, and in order to get printable / mutable data we need to use `auto scores = output_tensors[0].GetTensorMutableData<float>();`.
skottmckay commented 2 years ago

Can you clarify 'different'. The formatting for printing floating point numbers can differ between numpy and C++, especially in terms of rounding.

For example: https://github.com/microsoft/onnxruntime/blob/2a90922f01e2fa9861dda0c7a769cfed09658167/onnxruntime/test/onnx/main.cc#L534-L537

The same bytes should be being returned in the output buffer for the same input, so any difference from the conversion to string should be minor.
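To illustrate the formatting point: Python/numpy prints a shortest-round-trip representation (about 8 significant digits for these values), while a default C++ `std::cout` prints 6 significant digits, so the very same double can look different in the two logs.

```python
x = 0.7073383  # first score from the Python output above

print(repr(x))     # 0.7073383  (Python's shortest round-trip repr)
print(f"{x:.6g}")  # 0.707338   (6 significant digits, like a default std::cout)
```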

omerwer commented 2 years ago

Can you clarify 'different'. The formatting for printing floating point numbers can differ between numpy and C++, especially in terms of rounding.

For example:

https://github.com/microsoft/onnxruntime/blob/2a90922f01e2fa9861dda0c7a769cfed09658167/onnxruntime/test/onnx/main.cc#L534-L537

The same bytes should be being returned in the output buffer for the same input, so any difference from the conversion to string should be minor.

If you look in one of my comments above, I pasted the output values I get from Python and C++. If, for example, we'll take the first 12 output coordinates of the detection boxes from both, these are the values we get:

Python: [0.12975946, 0.6441527 , 0.23351136, 0.80738896], [ 0.2736489 , 0.00185148, 0.4825773 , 0.22635683], [ 0.65909886, -0.01258603, 0.83043504, 0.19587263]

C++: 0.130985 -5.59583e-22 0.804696 0.23496 -0.000445321 0.130985 0.486398 0.23496 -0.0163887 0.130985 0.996883 0.804696

As you can see here, the coordinates x_min, y_min, x_max, y_max are not the same. Sometimes it's a relatively small diff, and sometimes it's not even close. Since, as I showed in the comments above, the input values are identical in Python and C++, it seems that there's an issue in how the data is processed in the ORT "run" function, or in the data retrieval from the output Ort::Value.

ZiyueWangUoB commented 2 years ago

Any updates on this? I'm facing the same issue.

skottmckay commented 2 years ago

I can't reproduce. The same code runs the model so the same input bytes should produce the same output bytes. Any differences would be from the code before or after the InferenceSession Run call.

I did the following to save the input generated by the python code and the output received by the python code in the format our onnx_test_runner test tool uses to validate models. The test tool is written in C++. It loads the bytes from the saved input/output and executes the model using the C++ API. It reports that the results match.

I believe that validates that the same bytes of input produce the same bytes of output.

import cv2
import numpy as np
import onnxruntime as ort

# do all pre-processing so we have the input we're going to provide to the model
image_BGR = cv2.imread(r'D:\temp\salad_test\salad.jpg')
image_RGB= cv2.cvtColor(image_BGR, cv2.COLOR_BGR2RGB)
resized = cv2.resize(image_RGB, (320, 320), interpolation = cv2.INTER_AREA)
resized_normalized_fl = (np.array(resized, np.float32) - 127) * 0.0078125
input = np.expand_dims(resized_normalized_fl, 0)

# save that input in a protobuf file so we can also use it from the onnx_test_runner C++ code
#
# import a helper script from ORT to write out the bytes for the input and output as protobuf files
# Can probably download this from github as it's standalone
# https://github.com/microsoft/onnxruntime/blob/master/tools/python/onnx_test_data_utils.py
ort_repo_python_tools = r'D:\onnxruntime_repo_from_github\tools\python'
import sys
sys.path.append(ort_repo_python_tools)
import onnx_test_data_utils

onnx_test_data_utils.numpy_to_pb('serving_default_images:0', input,
                                 r'D:\temp\salad_test\test_data_set_0\input_0.pb')

output_names = ['StatefulPartitionedCall:0', 'StatefulPartitionedCall:1', 'StatefulPartitionedCall:2', 'StatefulPartitionedCall:3']
session = ort.InferenceSession(r'D:\temp\salad_test\salad.onnx')
outputs = session.run(output_names, {'serving_default_images:0': input})

# save the output data to validate the output from the C API
i = 0
for o in outputs:
    print(f"Output {i}:{output_names[i]} has shape {np.shape(o)}")
    print(o)
    onnx_test_data_utils.numpy_to_pb(output_names[i], o, r'D:\temp\salad_test\test_data_set_0\output_' + str(i) + '.pb')
    i += 1

Output from python:

Output 0:StatefulPartitionedCall:0 has shape (1,)
[25.]
Output 1:StatefulPartitionedCall:1 has shape (1, 25)
[[0.7073383  0.6937418  0.65383303 0.63405895 0.62928486 0.5617206
  0.5097608  0.42426664 0.39106843 0.3309952  0.28501803 0.27418488
  0.22237387 0.21934518 0.21848157 0.21627447 0.21239078 0.2072804
  0.19909242 0.19692048 0.19249496 0.1893397  0.18886447 0.18866774
  0.18062428]]
Output 2:StatefulPartitionedCall:2 has shape (1, 25)
[[4. 4. 1. 4. 4. 4. 4. 4. 4. 4. 3. 4. 4. 4. 0. 4. 4. 4. 3. 3. 4. 4. 3. 2.
  4.]]
Output 3:StatefulPartitionedCall:3 has shape (1, 25, 4)
[[[ 2.6536268e-01 -5.0874799e-04  4.8290437e-01  2.2682324e-01]
  [ 1.3203146e-01  6.4495105e-01  2.3465361e-01  8.0632132e-01]
  [ 1.0321391e-01 -1.3858557e-02  9.8806810e-01  1.0007272e+00]
  [ 6.8402600e-01  1.2209380e-01  9.2899990e-01  4.6162629e-01]
  [ 1.1299485e-01  1.4727369e-01  2.8355274e-01  3.7982914e-01]
  [ 6.5955776e-01 -1.2563296e-02  8.2872158e-01  1.9798708e-01]
  [ 1.2324050e-01  3.7302625e-01  2.8221276e-01  5.6010902e-01]
  [ 6.8642479e-01  4.0286705e-02  8.8163620e-01  3.0838162e-01]
  [ 1.3024744e-01  1.8475701e-01  3.1574792e-01  5.2967453e-01]
  [ 7.3650724e-01  2.0920816e-01  9.8306054e-01  7.8827608e-01]
  [ 1.0321391e-01 -1.3858557e-02  9.8806810e-01  1.0007272e+00]
  [ 1.0321391e-01 -1.3858557e-02  9.8806810e-01  1.0007272e+00]
  [ 7.9829597e-01  4.4458807e-01  9.5156658e-01  7.8481364e-01]
  [ 8.4526575e-01  2.7085415e-01  9.9576116e-01  7.3343062e-01]
  [ 1.0321391e-01 -1.3858557e-02  9.8806810e-01  1.0007272e+00]
  [ 2.7537832e-01  1.0597825e-01  3.5489050e-01  2.2212991e-01]
  [ 1.9314238e-01  5.4672018e-02  3.7764362e-01  3.2572681e-01]
  [ 6.7706102e-01  8.8712126e-01  8.5774964e-01  1.0020802e+00]
  [ 1.3203146e-01  6.4495105e-01  2.3465361e-01  8.0632132e-01]
  [ 1.0506896e-01  1.4067614e-01  2.7989385e-01  3.6557829e-01]
  [ 7.7638006e-01  2.2335689e-01  9.6234071e-01  5.7003754e-01]
  [ 7.9946682e-02  1.1649597e-01  3.5405028e-01  6.5007287e-01]
  [ 1.2727095e-01  3.8319668e-01  2.7833617e-01  5.6334996e-01]
  [ 9.9618524e-02  4.2115450e-03  1.0073204e+00  1.0165281e+00]
  [ 7.9993290e-01  2.8592539e-01  1.0374128e+00  8.5811651e-01]]]

Output from onnx_test_runner:

onnx_test_runner.exe D:\temp\salad_test\
result:
        Models: 1
        Total test cases: 1
                Succeeded: 1
                Not implemented: 0
                Failed: 0

To be absolutely sure onnx_test_runner is doing what it's supposed to, I added this to the output processing to make the first entry in the 3rd output incorrect:

for o in outputs:
    if i == 3:
        o[0, 0, 0] = o[0, 0, 0] + 10

And that resulted in the expected failure from onnx_test_runner

2022-08-04 19:13:38.5316034 [E:onnxruntime:Default, dataitem_request.cc:212 onnxruntime::test::DataTaskRequestContext::RunImpl] salad_test:output=StatefulPartitionedCall:3:expected 10.2654 (41243eed), got 0.265363 (3e87dd9e), diff: 10, tol=0.0112654 idx=0. 1 of 100 differ
2022-08-04 19:13:38.5391684 [E:onnxruntime:Default, testcase_request.cc:194 onnxruntime::test::TestCaseRequestContext::CalculateAndLogStats] salad_test: result differs. Dataset:D:\temp\salad_test\test_data_set_0

result:
        Models: 1
        Total test cases: 1
                Succeeded: 0
                Not implemented: 0
                Failed: 1
                        Result differs: 1

So the python input bytes -> saved -> read by C++ code -> executed using C++ code produces output that matches the output bytes saved from python.
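Side note: the `tol=0.0112654` in the failure log above is consistent with a combined absolute-plus-relative tolerance of the form `atol + rtol * |expected|` with both set to 1e-3 (0.001 + 0.001 * 10.2654 = 0.0112654). A sketch of that comparison rule, where the 1e-3 values are inferred from the log rather than taken from any documentation:

```python
RTOL = 1e-3  # inferred from the log line above
ATOL = 1e-3

def matches(expected, actual):
    # onnx_test_runner-style per-element check: diff within atol + rtol * |expected|
    return abs(actual - expected) <= ATOL + RTOL * abs(expected)

print(ATOL + RTOL * abs(10.2654))    # ~0.0112654, matching the tol in the log
print(matches(10.2654, 10.2654 + 0.01))  # True: a small diff passes
print(matches(0.265363, 10.2654))        # False: the injected +10 diff is flagged
```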

JHC521PJJ commented 2 years ago

I have the same problem now. Anyone can help me?

omerwer commented 2 years ago

It seems that the issue comes from the way the image is loaded and handled. By loading the image like so:

```cpp
imageBGR = cv::imread(imagePath, cv::ImreadModes::IMREAD_COLOR);

cv::Mat resizedImageBGR, resizedImageRGB, resizedImage, image;
cv::resize(imageBGR, resizedImageBGR, cv::Size(HEIGHT, WIDTH), cv::InterpolationFlags::INTER_CUBIC);
cv::cvtColor(resizedImageBGR, resizedImageRGB, cv::ColorConversionCodes::COLOR_BGR2RGB);
resizedImageRGB.convertTo(resizedImage, CV_32F, 1.0 / 255);
```

By applying the correct normalization, channel switching (HWC to CHW using cv::dnn::blobFromImage()) and post-processing, I was able to get correct results on two different models (classification and detection).
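For readers hitting the same problem, the HWC-to-CHW reordering mentioned above can be visualized with a toy 2x2 RGB image in plain Python (cv::dnn::blobFromImage does this reordering, plus batching and optional scaling, in one call). Whether a given model actually wants NHWC or NCHW depends on how it was exported, so it's worth checking the model's declared input shape first.

```python
# toy 2x2 RGB image in HWC layout (how OpenCV stores a cv::Mat)
hwc = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [10, 11, 12]]]

H, W, C = 2, 2, 3

# CHW layout: one full plane per channel, which many converted models expect
chw = [[[hwc[h][w][c] for w in range(W)] for h in range(H)] for c in range(C)]

print(chw[0])  # the R plane: [[1, 4], [7, 10]]
```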