debugmenot opened 8 months ago
Just to note: the issue looks independent of the OpenVINO version - I experimented with several. I have also built everything from scratch many times on different systems, with the same results.
Update: 1.14.1 also works, but the performance is about 10-15% lower. 1.15 and higher are affected by the issue.
+@sfatimar, @preetha-intel
Any update?
Can we have access to the model? It seems there are 11 subgraphs being formed and 167 nodes are being placed on the CPU EP, but it is hard to debug without the model.
@sfatimar
dumbmodel.onnx.zip - the dumb model is attached. To illustrate the issue, here is also a small log of a test run:
Here I'm iterating over the same image. All results except the first are broken.
```
f1race@build_server_nvidia:/opt/ort_dev$ ./test --image images/test/dumb100x100text.jpg
[info] Wellcome to first 0.0.1
[info] Available provider: CUDAExecutionProvider
[info] Available provider: OpenVINOExecutionProvider
[info] Available provider: XnnpackExecutionProvider
[info] Available provider: CPUExecutionProvider
[-] Selected provider: OpenVINOExecutionProvider
Input 0 : name=input.1
Output 0 : name=1389
[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: A, CLASS: 13, CONF: -0.5359584
[info] CHAR: 4, CLASS: 7, CONF: -2.073846
[info] CHAR: 6, CLASS: 9, CONF: -2.010087
[info] CHAR: 6, CLASS: 9, CONF: -1.8180711
[info] CHAR: D, CLASS: 16, CONF: -2.448421
[info] CHAR: S, CLASS: 31, CONF: -2.7345552
[info] CHAR: , CLASS: 2, CONF: -0.009441723
[info] CHAR: , CLASS: 2, CONF: -0.05160664
[info] CHAR: , CLASS: 2, CONF: -0.097647004
[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: B, CLASS: 14, CONF: -2.1106374
[info] CHAR: , CLASS: 2, CONF: -2.3829944
[info] CHAR: , CLASS: 0, CONF: -0.31160322
[info] CHAR: , CLASS: 0, CONF: -2.2568073
[info] CHAR: , CLASS: 0, CONF: -2.5611315
[info] CHAR: , CLASS: 0, CONF: -2.2948604
[info] CHAR: , CLASS: 0, CONF: -2.2516015
[info] CHAR: , CLASS: 0, CONF: -2.5611215
[info] CHAR: , CLASS: 0, CONF: -2.294854
[-] Output tensor element count: 390
[info] CHAR: A, CLASS: 13, CONF: -0.11442014
[info] CHAR: B, CLASS: 14, CONF: -2.1106374
[info] CHAR: , CLASS: 2, CONF: -2.3829944
[info] CHAR: , CLASS: 0, CONF: -0.31160322
[info] CHAR: , CLASS: 0, CONF: -2.2568073
[info] CHAR: , CLASS: 0, CONF: -2.5611315
[info] CHAR: , CLASS: 0, CONF: -2.2948604
[info] CHAR: , CLASS: 0, CONF: -2.2516015
[info] CHAR: , CLASS: 0, CONF: -2.5611215
[info] CHAR: , CLASS: 0, CONF: -2.294854
```
Once again, this happens ONLY with the OpenVINO EP, with ONNX Runtime >= 1.15 and any version of OpenVINO.
No issues with ONNX Runtime 1.13.1 and 1.14.1 (lower versions not tested).
CPU EP, Xnnpack EP, and CUDA EP work well with this model and the same inference code on any version of ORT, including the latest one.
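For completeness, here is a rough Python sketch of the check I mean (my real app is C++; the model path and the way the input shape is guessed below are placeholders, not the exact values from my pipeline): run the same input through the OpenVINO EP several times and compare every result against the first.

```python
# Rough sketch only: repeated inference of the same input, comparing to the first run.
# The model path and the input-shape guess are placeholders; adjust to the real model.
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession(
    "dumbmodel.onnx",                        # attached model, path assumed
    providers=["OpenVINOExecutionProvider"],
)
inp = sess.get_inputs()[0]
# Replace dynamic dimensions with a fixed size just to build a test tensor.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
data = np.random.rand(*shape).astype(np.float32)

first = None
for i in range(5):
    out = sess.run(None, {inp.name: data})[0]
    if first is None:
        first = out
    print(f"run {i}: matches first run = {np.allclose(out, first)}")
```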
I'm seeing a similar issue in Python with onnxruntime-openvino version 1.16.0. I'm currently stuck on Python 3.8, so I cannot test 1.17, but below is a test script with three very simple models, showing how one of them (BrokenModel) produces different results from PyTorch when run through onnxruntime. If this behavior is different enough from this issue, I'm happy to open another issue to track it.
```python
import numpy as np
import onnxruntime as rt
import torch
from torch import nn


class BrokenModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3))


class BatchMeanModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3)), output.mean()


class FewChannelModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_1 = nn.Conv2d(3, 3, kernel_size=3, stride=1, padding=1)
        self.conv_2 = nn.Conv2d(3, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv_1(x)
        output = self.conv_2(x)
        return output.mean(dim=(1, 2, 3))


def run_model_pytorch_onnxruntime(arch, path):
    model = arch()
    model.eval()
    print("=" * 80)
    print(model)

    data = torch.ones(2, 3, 224, 224)
    data[0] *= 0

    print("Torch:")
    for _ in range(2):
        result = model(data)
        print(result)
    print()

    torch.onnx.export(
        model,
        data,
        path,
        input_names=["input"],
        output_names=["output"],
        export_params=True,
        dynamic_axes={name: {0: "batch_size"} for name in ("input", "output")},
        verbose=False,
    )

    sess_options = rt.SessionOptions()
    sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL

    print("Onnxruntime:")
    rt_sess = rt.InferenceSession(
        path, sess_options, providers=["OpenVINOExecutionProvider"], provider_options=[{"device_id": "GPU"}]
    )
    for _ in range(2):
        outputs = rt_sess.run(None, {"input": data.numpy()})
        print(outputs)
    print()


if __name__ == "__main__":
    run_model_pytorch_onnxruntime(BrokenModel, "broken_model.onnx")
    print()
    run_model_pytorch_onnxruntime(BatchMeanModel, "batch_mean_model.onnx")
    print()
    run_model_pytorch_onnxruntime(FewChannelModel, "few_channel_model.onnx")
```
You'll need to install `torch`, `onnxruntime-openvino`, and `numpy` to run this script.
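As an extra sanity check, here is a rough sketch (not part of the script above, and assuming `broken_model.onnx` has already been exported by it) that runs the same input through the default CPU provider and the OpenVINO EP and compares the outputs directly:

```python
# Rough sketch: compare CPU EP vs OpenVINO EP outputs for the exported model.
# Assumes broken_model.onnx was produced by the script above.
import numpy as np
import onnxruntime as rt

data = np.ones((2, 3, 224, 224), dtype=np.float32)
data[0] *= 0

outputs = {}
for provider in ("CPUExecutionProvider", "OpenVINOExecutionProvider"):
    sess = rt.InferenceSession("broken_model.onnx", providers=[provider])
    outputs[provider] = sess.run(None, {"input": data})[0]
    print(provider, outputs[provider])

diff = np.abs(outputs["CPUExecutionProvider"] - outputs["OpenVINOExecutionProvider"])
print("max abs difference:", diff.max())
```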
@sfatimar, Hi! Any updates? I've uploaded the model for bug investigation.
Hi @debugmenot, I have tested the script suggested by @henxing using OpenVINO Toolkit v2024.1 (w_openvino_toolkit_windows_2024.1.0.dev20240405_x86_64) and OVEP v1.18.0 (this version update is now merged and available on the latest main of the microsoft/onnxruntime repo) on a Windows machine. I ran inference for 5 iterations, and the PyTorch vs ORT OpenVINO EP results were the same for every iteration; the OVEP results matched the torch results to about 3 decimal places. Please find the run log below -
```
================================================================================
BrokenModel(
  (conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv_2): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
tensor([-0.1026, -0.0569], grad_fn=
Onnxruntime:
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
[array([-0.1026001 , -0.05670166], dtype=float32)]
================================================================================
BatchMeanModel(
  (conv_1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv_2): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
(tensor([0.1573, 0.1438], grad_fn=
Onnxruntime:
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
[array([0.1573365 , 0.14381096], dtype=float32), array(0.15057378, dtype=float32)]
================================================================================
FewChannelModel(
  (conv_1): Conv2d(3, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv_2): Conv2d(3, 1, kernel_size=(1, 1), stride=(1, 1))
)
Torch:
tensor([-0.1036, -0.1638], grad_fn=
Onnxruntime:
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
[array([-0.10357666, -0.16418457], dtype=float32)]
```
We are investigating the issues you are facing while running your model with the OpenVINO Execution Provider.
@ankitm3k Hi! Did you confirm the bug? If so, is there an ETA for a patch?
Hi @debugmenot, I have investigated the issues with your ONNX model (dumbmodel.onnx). When performing inference, the model was being split into many subgraph partitions, and as a result most of the nodes fell back to the CPU EP. This lowers performance because the model graph does not run entirely on the OpenVINO EP. The above fix enables the whole model to be supported on the OpenVINOExecutionProvider and improves performance for your model.
I recommend using the latest OpenVINO Toolkit v2024.1 along with the above patch. I have also checked the tensor outputs across multiple inference iterations over the same input data, and in my build they were consistent with and as accurate as the first inference results.
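As a quick way to confirm which onnxruntime build and execution providers are actually being picked up (a generic check, not tied to any particular patch), something like the following can be run in the target environment:

```python
# Quick check of the onnxruntime build and the execution providers it exposes.
import onnxruntime as rt

print("onnxruntime version:", rt.__version__)
print("available providers:", rt.get_available_providers())
```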
@ankitm3k Hi. Update: the issue is still not fixed... Just checked. Performance is better now... but:

Onnxruntime 1.14.1 + OV:

```
[02:42:09.361] [I] [74706] [4] [car] HOMEP: T454BE199
[02:42:11.675] [I] [74706] [6] [car] HOMEP: X212EX197
[02:42:14.785] [I] [74706] [13] [car] HOMEP: O353XM199
[02:42:16.420] [I] [74706] [16] [car] HOMEP: H002XC199
[02:42:17.709] [I] [74706] [18] [car] HOMEP: P346AB197
[02:42:18.525] [I] [74706] [20] [car] HOMEP: A001OT197
[02:42:19.709] [I] [74706] [21] [car] HOMEP: E072MK199
[02:42:21.144] [I] [74706] [23] [car] HOMEP: B797HK197
[02:42:22.028] [I] [74706] [25] [car] HOMEP: O369CX177
[02:42:24.947] [I] [74706] [30] [car] HOMEP: B410KA17
[02:42:25.968] [I] [74706] [33] [car] HOMEP: K558AT197
[02:42:36.141] [I] [74706] [52] [car] HOMEP: C159XT199
[02:42:41.442] [I] [74706] [60] [car] HOMEP: O905OT190
[02:42:43.093] [I] [74706] [63] [car] HOMEP: Y902OA190
[02:42:46.568] [I] [74706] [68] [car] HOMEP: E159YY150
[02:42:47.770] [I] [74706] [71] [car] HOMEP: M181YA197
```

Onnxruntime 1.18.1 + OpenVinoEP 2024.3 + your GRU OP Patch:

```
[01:58:34.495] [I] [7342] [6] [car] HOMEP: T454BE199
[01:58:36.900] [I] [7342] [10] [car] HOMEP: X2XXX22X22
[01:58:39.927] [I] [7342] [20] [car] HOMEP: O333O33O33
[01:58:41.637] [I] [7342] [23] [car] HOMEP: H000H00H00
[01:58:42.832] [I] [7342] [27] [car] HOMEP: P333P33P33
[01:58:43.725] [I] [7342] [29] [car] HOMEP: A000A00A00
[01:58:44.849] [I] [7342] [30] [car] HOMEP: E000E00E00
[01:58:46.330] [I] [7342] [34] [car] HOMEP: B777B77B77
[01:58:51.137] [I] [7342] [51] [car] HOMEP: K555K55K55
[01:59:01.337] [I] [7342] [63] [car] HOMEP: C1CCC11C11
[01:59:06.587] [I] [7342] [70] [car] HOMEP: O999O99O99
[01:59:08.301] [I] [7342] [74] [car] HOMEP: Y999Y99Y99
[01:59:11.775] [I] [7342] [80] [car] HOMEP: E1EEE11E11
[01:59:13.006] [I] [7342] [85] [car] HOMEP: M111M11M11
```
I can prepare a test project (source + model + image) for you. Can you share your email, please?
With the patch the behaviour is slightly different: the results after the first one differ a little from the results without the patch, but they look roughly the same (still incorrect)... Is a dirty fix possible, e.g. changing the supported ops in data_ops.cc back to the 1.14.1 version, or something like that? How do I do this properly? I can't use legacy ORT versions in the new build of our software because of API incompatibility.
@ankitm3k I've finally found the issue, or at least WHERE exactly it is. If
//{"Unsqueeze", V_2020_4, {"CPU", "GPU"}}, // is commented out in data_ops.cc
everything works as expected :) The issue needs investigation.
It's strange, because Unsqueeze is defined in exactly the same way as in the 1.14.1 and 1.13.1 versions...
Describe the issue
When running an inference session ONLY with the OpenVINO EP and ORT > 1.13.1, all results except the first are incorrect. There are no issues with ORT == 1.13.1, or with CPU/CUDA/XNNPACK on any ORT version.
I'm getting this issue only with one model (Attention OCR) - the model structure can be found at the bottom; other models work fine. It seems some layers/functions in it broke after the 1.13.1 build...
Description:
Ubuntu 22.04, ONNX Runtime 1.17.1, OpenVINO 2023.3, C++. Model: a sort of Attention Decoder OCR, converted to ONNX from PyTorch.
Issue: I'm running inference on the same image (also tried a sequence of different images during the inference session). Only the FIRST result is correct. The second result and onwards looks like a partially "cropped" first result, no matter whether the next input data is new... For example, inferencing a sequence of images with the text "1234567890", "ABCDEFGHJK", "7777777777" yields: "1234567890", "1200120012", "1200120012"...
Downgrading to ORT 1.13.1 solved the issue, so it seems something broke after the 1.13.1 build. All other EPs (CPU, CUDA, XNNPACK) work well with the same code.
Found one reference to a similar issue on the OpenVINO GitHub: https://github.com/openvinotoolkit/openvino/issues/12966
Enabled verbose mode and found that the node placements differ between the 1.17.1 (incorrect) and 1.13.1 (correct) inference sessions; maybe that matters, but it doesn't explain why the first result is always correct:
Correct inference session node placements (1.13.1):
Incorrect inference session node placements (1.17.1):
As you can see, the difference is only in the last 8 lines (the MatMul token ids differ). Hope it helps...
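For reference, the node-placement dumps come from verbose session logging; a rough Python equivalent of turning that on (my app does the same thing through the C++ SessionOptions API, and the model path below is just a placeholder) looks like this:

```python
# Sketch: verbose ORT logging prints, among other things, per-node EP assignments.
import onnxruntime as rt

so = rt.SessionOptions()
so.log_severity_level = 0      # 0 = VERBOSE
so.log_verbosity_level = 1

sess = rt.InferenceSession(
    "model.onnx",              # placeholder path; use the Attention OCR model here
    so,
    providers=["OpenVINOExecutionProvider"],
)
```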
To reproduce
See the description above.
Urgency
Urgent
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1 release
ONNX Runtime API
C++
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
2023.3