microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Got segmentation fault error when using 'InferenceSession' API #11964

Open baoachun opened 2 years ago

baoachun commented 2 years ago

Describe the bug: I'm using the onnxruntime Python API for inference, but I get a segmentation fault when calling 'InferenceSession'.

Urgency: emergency

System information

To Reproduce

import onnx
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.alexnet()
model.eval()
input_names = ['input']
output_names = ['output']
x = torch.randn(1,3,224,224, requires_grad=False)
torch.onnx.export(model, x, 'alexnet.onnx', input_names=input_names, output_names=output_names, verbose=True, opset_version=12)

model_onnx = onnx.load('alexnet.onnx')
onnx.checker.check_model(model_onnx)
session = ort.InferenceSession('alexnet.onnx')

Expected behavior: the InferenceSession is created and the model loads without crashing.

Screenshots: gdb backtrace (attached as an image; not reproduced here)


faxu commented 2 years ago

CC @pranavsharma

TTrapper commented 2 years ago

Any updates? I am experiencing the same problem on onnxruntime==1.12.0. When using onnxruntime==1.11.0 it just hangs as described here:

https://github.com/microsoft/onnxruntime/issues/10166

pranavsharma commented 2 years ago

I cannot repro the issue. I ran the exact same Python script you pasted in this issue and get no segfault.

(mypython3) [pranav@pranav-dev-centos79 ~]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
(mypython3) [pranav@pranav-dev-centos79 ~]$ python -V
Python 3.8.13
(mypython3) [pranav@pranav-dev-centos79 ~]$ pip list | grep onnx
onnx        1.12.0
onnxruntime 1.12.0

TTrapper commented 2 years ago

CentOS Linux release 7.6.1810 (Core)
Python 3.8.1
onnx 1.12.0
onnxruntime 1.12.0

Here is the full output I am getting from the above script. No segfault here, but it does crash:

Exported graph: graph(%input : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %features.0.weight : Float(64, 3, 11, 11, strides=[363, 121, 11, 1], requires_grad=1, device=cpu),
      %features.0.bias : Float(64, strides=[1], requires_grad=1, device=cpu),
      %features.3.weight : Float(192, 64, 5, 5, strides=[1600, 25, 5, 1], requires_grad=1, device=cpu),
      %features.3.bias : Float(192, strides=[1], requires_grad=1, device=cpu),
      %features.6.weight : Float(384, 192, 3, 3, strides=[1728, 9, 3, 1], requires_grad=1, device=cpu),
      %features.6.bias : Float(384, strides=[1], requires_grad=1, device=cpu),
      %features.8.weight : Float(256, 384, 3, 3, strides=[3456, 9, 3, 1], requires_grad=1, device=cpu),
      %features.8.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %features.10.weight : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=1, device=cpu),
      %features.10.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %classifier.1.weight : Float(4096, 9216, strides=[9216, 1], requires_grad=1, device=cpu),
      %classifier.1.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.4.weight : Float(4096, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.4.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.6.weight : Float(1000, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.6.bias : Float(1000, strides=[1], requires_grad=1, device=cpu)):
  %input.1 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4], onnx_name="Conv_0"](%input, %features.0.weight, %features.0.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_18 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_1"](%input.1) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.4 : Float(1, 64, 27, 27, strides=[46656, 729, 27, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_2"](%onnx::MaxPool_18) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.8 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[2, 2, 2, 2], strides=[1, 1], onnx_name="Conv_3"](%input.4, %features.3.weight, %features.3.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_21 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_4"](%input.8) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.12 : Float(1, 192, 13, 13, strides=[32448, 169, 13, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_5"](%onnx::MaxPool_21) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.16 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_6"](%input.12, %features.6.weight, %features.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_24 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_7"](%input.16) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.20 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_8"](%onnx::Conv_24, %features.8.weight, %features.8.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_26 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_9"](%input.20) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.24 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_10"](%onnx::Conv_26, %features.10.weight, %features.10.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_28 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_11"](%input.24) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.28 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_12"](%onnx::MaxPool_28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %onnx::Flatten_30 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::AveragePool[kernel_shape=[1, 1], strides=[1, 1], onnx_name="AveragePool_13"](%input.28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1214:0
  %input.32 : Float(1, 9216, strides=[9216, 1], requires_grad=1, device=cpu) = onnx::Flatten[axis=1, onnx_name="Flatten_14"](%onnx::Flatten_30) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torchvision/models/alexnet.py:50:0
  %input.36 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_15"](%input.32, %classifier.1.weight, %classifier.1.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_33 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_16"](%input.36) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.40 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_17"](%onnx::Gemm_33, %classifier.4.weight, %classifier.4.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_35 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_18"](%input.40) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %output : Float(1, 1000, strides=[1000, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_19"](%onnx::Gemm_35, %classifier.6.weight, %classifier.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  return (%output)

Traceback (most recent call last):
  File "example_github.py", line 15, in <module>
    session = ort.InferenceSession('alexnet.onnx')
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:

Alexander-Mark commented 3 months ago

Still encountering this issue, and I can reproduce it just by changing the numpy version (I pin onnxruntime = "==1.16.3" for CentOS 7 compatibility).

This produces seg fault:

numpy = "==2.0.0"
onnxruntime = "==1.16.3"

This does not:

numpy = "==1.26.4"
onnxruntime = "==1.16.3"

The relevant trace is:

Fatal Python error: Segmentation fault

Current thread 0x000078c3488bb000 (most recent call first):
File ".../python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220 in run

I trigger this error with the following:

import onnxruntime
from transformers import AutoTokenizer

session = onnxruntime.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("models/")
# 'texts' is the list of input strings (defined elsewhere).
inputs = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)
preds = session.run(None, dict(inputs))[0]
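The pins above line up with the numpy 2.0 ABI break: onnxruntime 1.16.3 wheels were built against the numpy 1.x C API, and running them under numpy >= 2.0 can crash outright instead of raising a clean error. A hypothetical guard (the `numpy_abi_ok` helper is mine, not part of either library) makes the constraint explicit:

```python
def numpy_abi_ok(version: str) -> bool:
    """True if this numpy version predates the 2.0 ABI break."""
    return int(version.split('.')[0]) < 2

# The two pins from the comment above:
print(numpy_abi_ok("2.0.0"))   # segfaulting combination -> False
print(numpy_abi_ok("1.26.4"))  # working combination -> True
```

In practice the simpler fix is exactly the pin shown above: numpy = "<2" alongside onnxruntime = "==1.16.3".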

Sorry I don't have time to dig into this issue further for you.