justinchuby opened 11 months ago
Note that this is not necessarily a correct model. I just expect it fails gracefully.
The model produced by google.protobuf.text_format
has bad initializers with extra trailing bytes for some values.
Specifically, the initializer _val_4
has a trailing byte, resulting in a 5-byte float which is refused by ORT.
If we take the ONNX model and convert it back to text, the files differ in their initializer values.
I am using:
Name: protobuf Version: 4.25.1
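The 5-byte float is consistent with the model text having been round-tripped through a UTF-8 encode at some point: every raw byte >= 0x80 expands into a two-byte sequence. A minimal stdlib sketch of this suspected mechanism (my own illustration, not the actual export path):

```python
# float32 -2.0 is the 4 bytes 00 00 00 C0. If the non-ASCII byte 0xC0 is
# written to a text file as a literal character and the file is saved as
# UTF-8, it expands to the two bytes C3 80, so reading the text back yields
# 5 raw bytes -- exactly the "\000\000\000\303\200" seen for _val_4 below.
import struct

original = struct.pack("<f", -2.0)                   # b'\x00\x00\x00\xc0'
corrupted = original.decode("latin-1").encode("utf-8")

print(len(original), len(corrupted))                 # 4 5
print(corrupted)                                     # b'\x00\x00\x00\xc3\x80'
```

The same expansion explains why the int64 initializer _val_6 below is 16 bytes instead of 8: every 0xFF byte of -3 became the pair C3 BF.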
ir_version: 8
producer_name: "pytorch"
producer_version: "2.2.0"
graph {
node {
output: "_val_0"
name: "Constant_5"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\000\000\000\000\000\000\000\000"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_0"
output: "_val_1"
name: "Cast_6"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_2"
name: "Constant_7"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\001\000\000\000\000\000\000\000"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_2"
output: "_val_3"
name: "Cast_8"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_4"
name: "Constant_9"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 1
raw_data: "\000\000\000\303\200"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_4"
output: "_val_5"
name: "Cast_10"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_6"
name: "Constant_11"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\303\275\303\277\303\277\303\277\303\277\303\277\303\277\303\277"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_6"
output: "_val_7"
name: "Cast_12"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_8"
name: "Constant_13"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\001\000\000\000\000\000\000\000"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_8"
output: "_val_9"
name: "Cast_14"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
input: "_val_1"
input: "_val_9"
input: "_val_3"
output: "_val_10"
name: "Range_15"
op_type: "Range"
doc_string: ""
}
node {
input: "_val_5"
input: "_val_7"
output: "_val_11"
name: "CastLike_16"
op_type: "CastLike"
doc_string: ""
}
node {
input: "_val_7"
input: "_val_11"
output: "_val_12"
name: "Sub_17"
op_type: "Sub"
doc_string: ""
}
node {
input: "_val_9"
input: "_val_3"
output: "_val_13"
name: "Sub_18"
op_type: "Sub"
doc_string: ""
}
node {
input: "_val_12"
input: "_val_13"
output: "_val_14"
name: "Div_19"
op_type: "Div"
doc_string: ""
}
node {
input: "_val_10"
input: "_val_14"
output: "_val_15"
name: "Mul_20"
op_type: "Mul"
doc_string: ""
}
node {
input: "_val_15"
input: "_val_11"
output: "_val_16"
name: "Add_21"
op_type: "Add"
doc_string: ""
}
name: "main_graph"
output {
name: "_val_16"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_value: 1
}
}
}
}
}
value_info {
name: "_val_16"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_value: 1
}
}
}
}
}
value_info {
name: "_val_0"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_1"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_2"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_3"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_4"
type {
tensor_type {
elem_type: 1
shape {
}
}
}
}
value_info {
name: "_val_5"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_6"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_7"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_8"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_9"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_10"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_param: "unk__0"
}
}
}
}
}
value_info {
name: "_val_11"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_12"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_13"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_14"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_15"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_param: "unk__0"
}
}
}
}
}
}
opset_import {
domain: ""
version: 18
}
opset_import {
domain: "pkg.onnxscript.torch_lib.common"
version: 1
}
functions {
name: "Rank"
input: "input"
output: "return_val"
node {
input: "input"
output: "tmp"
name: "n0"
op_type: "Shape"
domain: ""
}
node {
input: "tmp"
output: "return_val"
name: "n1"
op_type: "Size"
domain: ""
}
doc_string: "Take the rank of the input tensor."
opset_import {
domain: ""
version: 18
}
domain: "pkg.onnxscript.torch_lib.common"
}
functions {
name: "IsScalar"
input: "input"
output: "return_val"
node {
input: "input"
output: "tmp"
name: "n0"
op_type: "Shape"
domain: ""
}
node {
input: "tmp"
output: "tmp_0"
name: "n1"
op_type: "Size"
domain: ""
}
node {
output: "tmp_1"
name: "n2"
op_type: "Constant"
attribute {
name: "value_int"
i: 0
type: INT
}
domain: ""
}
node {
input: "tmp_0"
input: "tmp_1"
output: "return_val"
name: "n3"
op_type: "Equal"
domain: ""
}
doc_string: "Return whether the input has rank 0, or is a scalar."
opset_import {
domain: ""
version: 18
}
domain: "pkg.onnxscript.torch_lib.common"
}
Note that this is not necessarily a correct model. I just expect it fails gracefully.
This is what I am seeing as a failure; please let me know if this is graceful enough.
D:\memory>python .\gh_repro.py
Traceback (most recent call last):
  File "D:\memory\gh_repro.py", line 488, in <module>
    session = ort.InferenceSession(onnx_model.SerializeToString(), session_options, providers=("CPUExecutionProvider",))
  File "C:\Users\dmitrism\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\dmitrism\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor _val_6 failed.source and destination buffer size mismatch
There was a segfault. I will adjust the repro script to recreate the error.
OK, so far I can only improve the error message. Otherwise, it errors out because the model is not valid.
import google.protobuf.text_format
import numpy as np
from numpy import array, float16, float32, float64, int32, int64
import onnx
import onnxruntime as ort
# Run n times
N = 1
onnx_model_text = """
ir_version: 8
producer_name: "pytorch"
producer_version: "2.2.0"
graph {
node {
output: "_val_0"
name: "Constant_5"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\000\000\000\000\000\000\000\000"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_0"
output: "_val_1"
name: "Cast_6"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_2"
name: "Constant_7"
op_type: "Constant"
attribute {
name: "value"
t {
data_type: 7
raw_data: "\001\000\000\000\000\000\000\000"
}
type: TENSOR
}
doc_string: ""
}
node {
input: "_val_2"
output: "_val_3"
name: "Cast_8"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_4"
name: "Constant_9"
op_type: "Constant"
attribute {
name: "value_float"
f: -2.0
type: FLOAT
}
doc_string: ""
}
node {
input: "_val_4"
output: "_val_5"
name: "Cast_10"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_6"
name: "Constant_11"
op_type: "Constant"
attribute {
name: "value_int"
i: -3
type: INT
}
doc_string: ""
}
node {
input: "_val_6"
output: "_val_7"
name: "Cast_12"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
output: "_val_8"
name: "Constant_13"
op_type: "Constant"
attribute {
name: "value_int"
i: 1
type: INT
}
doc_string: ""
}
node {
input: "_val_8"
output: "_val_9"
name: "Cast_14"
op_type: "Cast"
attribute {
name: "to"
i: 6
type: INT
}
doc_string: ""
}
node {
input: "_val_1"
input: "_val_9"
input: "_val_3"
output: "_val_10"
name: "Range_15"
op_type: "Range"
doc_string: ""
}
node {
input: "_val_5"
input: "_val_7"
output: "_val_11"
name: "CastLike_16"
op_type: "CastLike"
doc_string: ""
}
node {
input: "_val_7"
input: "_val_11"
output: "_val_12"
name: "Sub_17"
op_type: "Sub"
doc_string: ""
}
node {
input: "_val_9"
input: "_val_3"
output: "_val_13"
name: "Sub_18"
op_type: "Sub"
doc_string: ""
}
node {
input: "_val_12"
input: "_val_13"
output: "_val_14"
name: "Div_19"
op_type: "Div"
doc_string: ""
}
node {
input: "_val_10"
input: "_val_14"
output: "_val_15"
name: "Mul_20"
op_type: "Mul"
doc_string: ""
}
node {
input: "_val_15"
input: "_val_11"
output: "_val_16"
name: "Add_21"
op_type: "Add"
doc_string: ""
}
name: "main_graph"
output {
name: "_val_16"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_value: 1
}
}
}
}
}
value_info {
name: "_val_16"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_value: 1
}
}
}
}
}
value_info {
name: "_val_0"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_1"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_2"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_3"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_4"
type {
tensor_type {
elem_type: 1
shape {
}
}
}
}
value_info {
name: "_val_5"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_6"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_7"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_8"
type {
tensor_type {
elem_type: 7
shape {
}
}
}
}
value_info {
name: "_val_9"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_10"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_param: "unk__0"
}
}
}
}
}
value_info {
name: "_val_11"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_12"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_13"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_14"
type {
tensor_type {
elem_type: 6
shape {
}
}
}
}
value_info {
name: "_val_15"
type {
tensor_type {
elem_type: 6
shape {
dim {
dim_param: "unk__0"
}
}
}
}
}
}
opset_import {
domain: ""
version: 18
}
opset_import {
domain: "pkg.onnxscript.torch_lib.common"
version: 1
}
functions {
name: "Rank"
input: "input"
output: "return_val"
node {
input: "input"
output: "tmp"
name: "n0"
op_type: "Shape"
domain: ""
}
node {
input: "tmp"
output: "return_val"
name: "n1"
op_type: "Size"
domain: ""
}
doc_string: "Take the rank of the input tensor."
opset_import {
domain: ""
version: 18
}
domain: "pkg.onnxscript.torch_lib.common"
}
functions {
name: "IsScalar"
input: "input"
output: "return_val"
node {
input: "input"
output: "tmp"
name: "n0"
op_type: "Shape"
domain: ""
}
node {
input: "tmp"
output: "tmp_0"
name: "n1"
op_type: "Size"
domain: ""
}
node {
output: "tmp_1"
name: "n2"
op_type: "Constant"
attribute {
name: "value_int"
i: 0
type: INT
}
domain: ""
}
node {
input: "tmp_0"
input: "tmp_1"
output: "return_val"
name: "n3"
op_type: "Equal"
domain: ""
}
doc_string: "Return whether the input has rank 0, or is a scalar."
opset_import {
domain: ""
version: 18
}
domain: "pkg.onnxscript.torch_lib.common"
}
"""
ort_inputs = {}
# Set up the inference session
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
onnx_model = onnx.ModelProto()
google.protobuf.text_format.Parse(onnx_model_text, onnx_model)
# Uncomment this line to save the model to a file for examination
# onnx.save_model(onnx_model, "test_output_match_opinfo__linspace_cpu_int32.onnx")
onnx.checker.check_model(onnx_model)
session = ort.InferenceSession(onnx_model.SerializeToString(), session_options, providers=("CPUExecutionProvider",))
# Run the model
for _ in range(N):
    ort_outputs = session.run(None, ort_inputs)
[1] 65692 floating point exception (core dumped) /home/justinchu/anaconda3/envs/onnx/bin/python
Well, during optimization (constant folding) the root of the problem becomes clear: the constant inputs are arranged so that Div_19
divides by zero, the zero being fed in as the result of Sub_18.
We are not detecting it because we use Eigen
for this in a tight broadcast loop, and catching the signal on Linux or an SEH on Windows is out of the question (it is not a C++ exception).
The bottom line is that the model is not valid. I will consider the options, but checking tensors for zeros is also a perf penalty.
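Hand-tracing the constants in the repro model makes the failure concrete: Sub_18 computes _val_9 - _val_3 = 1 - 1 = 0, and Div_19 then divides by that zero. A plain-Python sketch of the folded values (Python raises ZeroDivisionError here, whereas ORT's C++ kernel performs an untrapped integer division, hence the SIGFPE):

```python
# Folded constant values, taken from the repro model above
val_3 = 1      # Cast_8(Constant_7: int64 1)   -> int32 1
val_9 = 1      # Cast_14(Constant_13: int64 1) -> int32 1
val_7 = -3     # Cast_12(Constant_11: int -3)  -> int32 -3
val_11 = -2    # CastLike_16(Cast_10(-2.0) = -2, like _val_7) -> int32 -2

val_12 = val_7 - val_11   # Sub_17: -3 - (-2) = -1
val_13 = val_9 - val_3    # Sub_18: 1 - 1 = 0
try:
    val_12 // val_13      # Div_19: integer division by zero
except ZeroDivisionError:
    print("Div_19 divides by zero")   # prints: Div_19 divides by zero
```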
Describe the issue
Summary
ONNX Runtime raises a memory error when executing the test
ops_test.TestOutputConsistencyFullGraphCPU.test_output_match_opinfo__linspace_cpu_int32
in ONNX Script TorchLib. To recreate this report, use:
To reproduce
Full error stack
The ONNX model text for visualization
Environment
To reproduce
Above
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.0.dev20231129002
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response