microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

ORT memory error with the graph from linspace #18648

Open justinchuby opened 11 months ago

justinchuby commented 11 months ago

Describe the issue

[!NOTE] This is not necessarily a correct model. I just expect ORT to fail gracefully.

Summary

ONNX Runtime raises a memory error when executing the test ops_test.TestOutputConsistencyFullGraphCPU.test_output_match_opinfo__linspace_cpu_int32 in ONNX Script TorchLib.

To recreate this report, use

CREATE_REPRODUCTION_REPORT=1 python -m pytest onnxscript/tests/function_libs/torch_lib/ops_test.py -k test_output_match_opinfo__linspace_cpu_int32

To reproduce

import google.protobuf.text_format
import numpy as np
from numpy import array, float16, float32, float64, int32, int64
import onnx
import onnxruntime as ort

# Run n times
N = 1

onnx_model_text = """
ir_version: 8
producer_name: "pytorch"
producer_version: "2.2.0"
graph {
  node {
    output: "_val_0"
    name: "Constant_5"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\000\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_0"
    output: "_val_1"
    name: "Cast_6"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_2"
    name: "Constant_7"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\001\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_2"
    output: "_val_3"
    name: "Cast_8"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_4"
    name: "Constant_9"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 1
        raw_data: "\000\000\000\300"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_4"
    output: "_val_5"
    name: "Cast_10"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_6"
    name: "Constant_11"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\375\377\377\377\377\377\377\377"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_6"
    output: "_val_7"
    name: "Cast_12"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_8"
    name: "Constant_13"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\001\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_8"
    output: "_val_9"
    name: "Cast_14"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    input: "_val_1"
    input: "_val_9"
    input: "_val_3"
    output: "_val_10"
    name: "Range_15"
    op_type: "Range"
    doc_string: ""
  }
  node {
    input: "_val_5"
    input: "_val_7"
    output: "_val_11"
    name: "CastLike_16"
    op_type: "CastLike"
    doc_string: ""
  }
  node {
    input: "_val_7"
    input: "_val_11"
    output: "_val_12"
    name: "Sub_17"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_9"
    input: "_val_3"
    output: "_val_13"
    name: "Sub_18"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_12"
    input: "_val_13"
    output: "_val_14"
    name: "Div_19"
    op_type: "Div"
    doc_string: ""
  }
  node {
    input: "_val_10"
    input: "_val_14"
    output: "_val_15"
    name: "Mul_20"
    op_type: "Mul"
    doc_string: ""
  }
  node {
    input: "_val_15"
    input: "_val_11"
    output: "_val_16"
    name: "Add_21"
    op_type: "Add"
    doc_string: ""
  }
  name: "main_graph"
  output {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_0"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_1"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_2"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_3"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_4"
    type {
      tensor_type {
        elem_type: 1
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_5"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_6"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_7"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_8"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_9"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_10"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
  value_info {
    name: "_val_11"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_12"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_13"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_14"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_15"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
}
opset_import {
  domain: ""
  version: 18
}
opset_import {
  domain: "pkg.onnxscript.torch_lib.common"
  version: 1
}
functions {
  name: "Rank"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "return_val"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  doc_string: "Take the rank of the input tensor."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}
functions {
  name: "IsScalar"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "tmp_0"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  node {
    output: "tmp_1"
    name: "n2"
    op_type: "Constant"
    attribute {
      name: "value_int"
      i: 0
      type: INT
    }
    domain: ""
  }
  node {
    input: "tmp_0"
    input: "tmp_1"
    output: "return_val"
    name: "n3"
    op_type: "Equal"
    domain: ""
  }
  doc_string: "Return whether the input has rank 0, or is a scalar."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}

"""

ort_inputs = {}

# Set up the inference session
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
onnx_model = onnx.ModelProto()
google.protobuf.text_format.Parse(onnx_model_text, onnx_model)

# Uncomment this line to save the model to a file for examination
# onnx.save_model(onnx_model, "test_output_match_opinfo__linspace_cpu_int32.onnx")

onnx.checker.check_model(onnx_model)
session = ort.InferenceSession(onnx_model.SerializeToString(), session_options, providers=("CPUExecutionProvider",))

# Run the model
for _ in range(N):
    ort_outputs = session.run(None, ort_inputs)

Full error stack


  File "/home/justinchu/dev/onnx-script/onnxscript/tests/function_libs/torch_lib/ops_test_common.py", line 538, in _capture_graph_and_evaluate_torch_script_evaluator
    return _safe_ort_session_run(onnx_model.SerializeToString(), ort_inputs)
  File "/home/justinchu/dev/onnx-script/onnxscript/tests/function_libs/torch_lib/ops_test_common.py", line 351, in _safe_ort_session_run
    raise OrtAbortedError()
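The OrtAbortedError above is raised by the test harness's _safe_ort_session_run after the ORT process dies. One way such a guard can work is to run the session in a child process and map a hard crash to an exception; a minimal sketch (run_isolated is a hypothetical helper, not the harness's actual implementation):

```python
import subprocess
import sys

def run_isolated(code: str) -> int:
    """Execute `code` in a child Python process so that a hard crash
    (SIGSEGV, SIGFPE, abort) only kills the child, not the caller.
    Returns the exit code; on POSIX a negative value is the number of
    the signal that killed the child."""
    return subprocess.run([sys.executable, "-c", code]).returncode

# A child that dies of SIGSEGV reports a nonzero (negative) return code,
# which the harness can translate into an OrtAbortedError-style exception:
rc = run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)")
print(rc)
```

The ORT session creation and run would go in the child's code string, so neither a segfault nor a SIGFPE can take down the test runner.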

The ONNX model text for visualization

<
   ir_version: 8,
   opset_import: ["" : 18, "pkg.onnxscript.torch_lib.common" : 1],
   producer_name: "pytorch",
   producer_version: "2.2.0"
>
main_graph () => (int32[1] _val_16) 
   <int32[1] _val_16, int64 _val_0, int32 _val_1, int64 _val_2, int32 _val_3, float _val_4, int32 _val_5, int64 _val_6, int32 _val_7, int64 _val_8, int32 _val_9, int32[unk__0] _val_10, int32 _val_11, int32 _val_12, int32 _val_13, int32 _val_14, int32[unk__0] _val_15>
{
   _val_0 = Constant <value: tensor = int64 {0}> ()
   _val_1 = Cast <to: int = 6> (_val_0)
   _val_2 = Constant <value: tensor = int64 {1}> ()
   _val_3 = Cast <to: int = 6> (_val_2)
   _val_4 = Constant <value: tensor = float {-2}> ()
   _val_5 = Cast <to: int = 6> (_val_4)
   _val_6 = Constant <value: tensor = int64 {-3}> ()
   _val_7 = Cast <to: int = 6> (_val_6)
   _val_8 = Constant <value: tensor = int64 {1}> ()
   _val_9 = Cast <to: int = 6> (_val_8)
   _val_10 = Range (_val_1, _val_9, _val_3)
   _val_11 = CastLike (_val_5, _val_7)
   _val_12 = Sub (_val_7, _val_11)
   _val_13 = Sub (_val_9, _val_3)
   _val_14 = Div (_val_12, _val_13)
   _val_15 = Mul (_val_10, _val_14)
   _val_16 = Add (_val_15, _val_11)
}
<
  domain: "pkg.onnxscript.torch_lib.common",
  opset_import: ["" : 18]
>
Rank (input) => (return_val)
{
   tmp = Shape (input)
   return_val = Size (tmp)
}
<
  domain: "pkg.onnxscript.torch_lib.common",
  opset_import: ["" : 18]
>
IsScalar (input) => (return_val)
{
   tmp = Shape (input)
   tmp_0 = Size (tmp)
   tmp_1 = Constant <value_int: int = 0> ()
   return_val = Equal (tmp_0, tmp_1)
}

Environment

OS: Linux-6.2.0-1017-azure-x86_64-with-glibc2.35
Python version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
onnx==1.16.0.dev20231106
onnxruntime==1.17.0
numpy==1.25.1
torch==2.2.0.dev20231128+cpu

To reproduce

Above

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0.dev20231129002

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

justinchuby commented 11 months ago

Note that this is not necessarily a correct model; I just expect it to fail gracefully.

yuslepukhin commented 11 months ago

The model produced by google.protobuf.text_format has bad initializers with extra trailing bytes for some values. Specifically, the initializer _val_4 carries a trailing byte, resulting in a 5-byte float, which ORT refuses.

If we take the ONNX model and convert it back to text, the files differ in their initializer values.

I am using

Name: protobuf Version: 4.25.1
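The extra bytes are consistent with a UTF-8 round trip of the raw bytes: 0xC0, the last byte of the little-endian float32 encoding of -2.0 in _val_4, is not valid ASCII, and re-encoding it as UTF-8 expands it into the two bytes 0xC3 0x80 (octal \303\200) seen in the dump below. A minimal sketch of the effect (assuming a latin-1/UTF-8 round trip is what happened):

```python
import struct

# A float32 initializer's raw_data must be exactly 4 bytes.
good = b"\x00\x00\x00\xc0"      # little-endian float32 for -2.0 (_val_4)
assert struct.unpack("<f", good) == (-2.0,)

# If the byte 0xC0 takes a round trip through UTF-8 text encoding,
# it expands into two bytes (0xC3 0x80), leaving a 5-byte buffer
# that ORT refuses with a buffer size mismatch:
bad = good.decode("latin-1").encode("utf-8")
print(len(bad))  # 5
```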

ir_version: 8
producer_name: "pytorch"
producer_version: "2.2.0"
graph {
  node {
    output: "_val_0"
    name: "Constant_5"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\000\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_0"
    output: "_val_1"
    name: "Cast_6"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_2"
    name: "Constant_7"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\001\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_2"
    output: "_val_3"
    name: "Cast_8"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_4"
    name: "Constant_9"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 1
        raw_data: "\000\000\000\303\200"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_4"
    output: "_val_5"
    name: "Cast_10"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_6"
    name: "Constant_11"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\303\275\303\277\303\277\303\277\303\277\303\277\303\277\303\277"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_6"
    output: "_val_7"
    name: "Cast_12"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_8"
    name: "Constant_13"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\001\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_8"
    output: "_val_9"
    name: "Cast_14"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    input: "_val_1"
    input: "_val_9"
    input: "_val_3"
    output: "_val_10"
    name: "Range_15"
    op_type: "Range"
    doc_string: ""
  }
  node {
    input: "_val_5"
    input: "_val_7"
    output: "_val_11"
    name: "CastLike_16"
    op_type: "CastLike"
    doc_string: ""
  }
  node {
    input: "_val_7"
    input: "_val_11"
    output: "_val_12"
    name: "Sub_17"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_9"
    input: "_val_3"
    output: "_val_13"
    name: "Sub_18"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_12"
    input: "_val_13"
    output: "_val_14"
    name: "Div_19"
    op_type: "Div"
    doc_string: ""
  }
  node {
    input: "_val_10"
    input: "_val_14"
    output: "_val_15"
    name: "Mul_20"
    op_type: "Mul"
    doc_string: ""
  }
  node {
    input: "_val_15"
    input: "_val_11"
    output: "_val_16"
    name: "Add_21"
    op_type: "Add"
    doc_string: ""
  }
  name: "main_graph"
  output {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_0"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_1"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_2"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_3"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_4"
    type {
      tensor_type {
        elem_type: 1
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_5"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_6"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_7"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_8"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_9"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_10"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
  value_info {
    name: "_val_11"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_12"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_13"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_14"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_15"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
}
opset_import {
  domain: ""
  version: 18
}
opset_import {
  domain: "pkg.onnxscript.torch_lib.common"
  version: 1
}
functions {
  name: "Rank"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "return_val"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  doc_string: "Take the rank of the input tensor."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}
functions {
  name: "IsScalar"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "tmp_0"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  node {
    output: "tmp_1"
    name: "n2"
    op_type: "Constant"
    attribute {
      name: "value_int"
      i: 0
      type: INT
    }
    domain: ""
  }
  node {
    input: "tmp_0"
    input: "tmp_1"
    output: "return_val"
    name: "n3"
    op_type: "Equal"
    domain: ""
  }
  doc_string: "Return whether the input has rank 0, or is a scalar."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}
yuslepukhin commented 11 months ago

Note that this is not necessarily a correct model; I just expect it to fail gracefully.

This is the failure I am seeing; please let me know whether this is graceful enough.

D:\memory>python .\gh_repro.py
Traceback (most recent call last):
  File "D:\memory\gh_repro.py", line 488, in <module>
    session = ort.InferenceSession(onnx_model.SerializeToString(), session_options, providers=("CPUExecutionProvider",))
  File "C:\Users\dmitrism\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\dmitrism\AppData\Local\Programs\Python\Python39\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Deserialize tensor _val_6 failed.source and destination buffer size mismatch

justinchuby commented 11 months ago

There was a segfault. I will adjust the repro script to reproduce that error.

yuslepukhin commented 11 months ago

Ok, so far I can only improve the error message. Otherwise, it errors out because the model is not valid.

justinchuby commented 11 months ago

Updated

import google.protobuf.text_format
import numpy as np
from numpy import array, float16, float32, float64, int32, int64
import onnx
import onnxruntime as ort

# Run n times
N = 1

onnx_model_text = """
ir_version: 8
producer_name: "pytorch"
producer_version: "2.2.0"
graph {
  node {
    output: "_val_0"
    name: "Constant_5"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\000\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_0"
    output: "_val_1"
    name: "Cast_6"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_2"
    name: "Constant_7"
    op_type: "Constant"
    attribute {
      name: "value"
      t {
        data_type: 7
        raw_data: "\001\000\000\000\000\000\000\000"
      }
      type: TENSOR
    }
    doc_string: ""
  }
  node {
    input: "_val_2"
    output: "_val_3"
    name: "Cast_8"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_4"
    name: "Constant_9"
    op_type: "Constant"
    attribute {
      name: "value_float"
      f: -2.0
      type: FLOAT
    }
    doc_string: ""
  }
  node {
    input: "_val_4"
    output: "_val_5"
    name: "Cast_10"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_6"
    name: "Constant_11"
    op_type: "Constant"
    attribute {
      name: "value_int"
      i: -3
      type: INT
    }
    doc_string: ""
  }
  node {
    input: "_val_6"
    output: "_val_7"
    name: "Cast_12"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    output: "_val_8"
    name: "Constant_13"
    op_type: "Constant"
    attribute {
      name: "value_int"
      i: 1
      type: INT
    }
    doc_string: ""
  }
  node {
    input: "_val_8"
    output: "_val_9"
    name: "Cast_14"
    op_type: "Cast"
    attribute {
      name: "to"
      i: 6
      type: INT
    }
    doc_string: ""
  }
  node {
    input: "_val_1"
    input: "_val_9"
    input: "_val_3"
    output: "_val_10"
    name: "Range_15"
    op_type: "Range"
    doc_string: ""
  }
  node {
    input: "_val_5"
    input: "_val_7"
    output: "_val_11"
    name: "CastLike_16"
    op_type: "CastLike"
    doc_string: ""
  }
  node {
    input: "_val_7"
    input: "_val_11"
    output: "_val_12"
    name: "Sub_17"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_9"
    input: "_val_3"
    output: "_val_13"
    name: "Sub_18"
    op_type: "Sub"
    doc_string: ""
  }
  node {
    input: "_val_12"
    input: "_val_13"
    output: "_val_14"
    name: "Div_19"
    op_type: "Div"
    doc_string: ""
  }
  node {
    input: "_val_10"
    input: "_val_14"
    output: "_val_15"
    name: "Mul_20"
    op_type: "Mul"
    doc_string: ""
  }
  node {
    input: "_val_15"
    input: "_val_11"
    output: "_val_16"
    name: "Add_21"
    op_type: "Add"
    doc_string: ""
  }
  name: "main_graph"
  output {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_16"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_value: 1
          }
        }
      }
    }
  }
  value_info {
    name: "_val_0"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_1"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_2"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_3"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_4"
    type {
      tensor_type {
        elem_type: 1
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_5"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_6"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_7"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_8"
    type {
      tensor_type {
        elem_type: 7
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_9"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_10"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
  value_info {
    name: "_val_11"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_12"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_13"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_14"
    type {
      tensor_type {
        elem_type: 6
        shape {
        }
      }
    }
  }
  value_info {
    name: "_val_15"
    type {
      tensor_type {
        elem_type: 6
        shape {
          dim {
            dim_param: "unk__0"
          }
        }
      }
    }
  }
}
opset_import {
  domain: ""
  version: 18
}
opset_import {
  domain: "pkg.onnxscript.torch_lib.common"
  version: 1
}
functions {
  name: "Rank"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "return_val"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  doc_string: "Take the rank of the input tensor."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}
functions {
  name: "IsScalar"
  input: "input"
  output: "return_val"
  node {
    input: "input"
    output: "tmp"
    name: "n0"
    op_type: "Shape"
    domain: ""
  }
  node {
    input: "tmp"
    output: "tmp_0"
    name: "n1"
    op_type: "Size"
    domain: ""
  }
  node {
    output: "tmp_1"
    name: "n2"
    op_type: "Constant"
    attribute {
      name: "value_int"
      i: 0
      type: INT
    }
    domain: ""
  }
  node {
    input: "tmp_0"
    input: "tmp_1"
    output: "return_val"
    name: "n3"
    op_type: "Equal"
    domain: ""
  }
  doc_string: "Return whether the input has rank 0, or is a scalar."
  opset_import {
    domain: ""
    version: 18
  }
  domain: "pkg.onnxscript.torch_lib.common"
}

"""

ort_inputs = {}

# Set up the inference session
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
onnx_model = onnx.ModelProto()
google.protobuf.text_format.Parse(onnx_model_text, onnx_model)

# Uncomment this line to save the model to a file for examination
# onnx.save_model(onnx_model, "test_output_match_opinfo__linspace_cpu_int32.onnx")

onnx.checker.check_model(onnx_model)
session = ort.InferenceSession(onnx_model.SerializeToString(), session_options, providers=("CPUExecutionProvider",))

# Run the model
for _ in range(N):
    ort_outputs = session.run(None, ort_inputs)

Output

[1]    65692 floating point exception (core dumped)  /home/justinchu/anaconda3/envs/onnx/bin/python 
yuslepukhin commented 11 months ago

Well, during optimization (constant folding) the root of the problem becomes clear: the constants are arranged so that Div_19 divides by zero, its divisor being the output of Sub_18.

We do not detect it because the division runs through Eigen in a tight broadcast loop. Catching the signal on Linux, or using SEH on Windows, is out of the question (it is not a C++ exception).

The bottom line is that the model is not valid. I will consider the options, but checking tensors for zeros also carries a performance penalty.
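Folding the constant chain by hand shows where the zero divisor comes from (a sketch; the names mirror the `_val_*` outputs in the graph, and every constant has been Cast to int32):

```python
import numpy as np

val_3 = np.int32(1)    # Constant_7 (1) cast to int32
val_9 = np.int32(1)    # Constant_13 (1) cast to int32, the Range limit
val_7 = np.int32(-3)   # Constant_11 (-3) cast to int32
val_11 = np.int32(-2)  # CastLike(_val_5, _val_7): float -2.0 as int32

val_12 = val_7 - val_11   # Sub_17 -> -1
val_13 = val_9 - val_3    # Sub_18 -> 0

print(int(val_12), int(val_13))  # -1 0
# Div_19 then evaluates _val_12 / _val_13 = -1 / 0: a hardware integer
# division by zero, which raises SIGFPE inside the Eigen-backed kernel
# rather than a catchable C++ exception.
```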