Open baoachun opened 2 years ago
CC @pranavsharma
Any updates? I am experiencing the same problem on onnxruntime==1.12.0. When using onnxruntime==1.11.0 it just hangs as described here:
I cannot repro the issue. I ran the exact same Python script you pasted in this issue and get no segfault.
```
(mypython3) [pranav@pranav-dev-centos79 ~]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
(mypython3) [pranav@pranav-dev-centos79 ~]$ python -V
Python 3.8.13
(mypython3) [pranav@pranav-dev-centos79 ~]$ pip list | grep onnx
onnx        1.12.0
onnxruntime 1.12.0
```
```
CentOS Linux release 7.6.1810 (Core)
Python 3.8.1
onnx 1.12.0
onnxruntime 1.12.0
```
Here is the full output I am getting from the above script. No segfault here, but it does crash:
```
Exported graph: graph(%input : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %features.0.weight : Float(64, 3, 11, 11, strides=[363, 121, 11, 1], requires_grad=1, device=cpu),
      %features.0.bias : Float(64, strides=[1], requires_grad=1, device=cpu),
      %features.3.weight : Float(192, 64, 5, 5, strides=[1600, 25, 5, 1], requires_grad=1, device=cpu),
      %features.3.bias : Float(192, strides=[1], requires_grad=1, device=cpu),
      %features.6.weight : Float(384, 192, 3, 3, strides=[1728, 9, 3, 1], requires_grad=1, device=cpu),
      %features.6.bias : Float(384, strides=[1], requires_grad=1, device=cpu),
      %features.8.weight : Float(256, 384, 3, 3, strides=[3456, 9, 3, 1], requires_grad=1, device=cpu),
      %features.8.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %features.10.weight : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=1, device=cpu),
      %features.10.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %classifier.1.weight : Float(4096, 9216, strides=[9216, 1], requires_grad=1, device=cpu),
      %classifier.1.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.4.weight : Float(4096, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.4.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.6.weight : Float(1000, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.6.bias : Float(1000, strides=[1], requires_grad=1, device=cpu)):
  %input.1 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4], onnx_name="Conv_0"](%input, %features.0.weight, %features.0.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_18 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_1"](%input.1) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.4 : Float(1, 64, 27, 27, strides=[46656, 729, 27, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_2"](%onnx::MaxPool_18) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.8 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[2, 2, 2, 2], strides=[1, 1], onnx_name="Conv_3"](%input.4, %features.3.weight, %features.3.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_21 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_4"](%input.8) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.12 : Float(1, 192, 13, 13, strides=[32448, 169, 13, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_5"](%onnx::MaxPool_21) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.16 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_6"](%input.12, %features.6.weight, %features.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_24 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_7"](%input.16) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.20 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_8"](%onnx::Conv_24, %features.8.weight, %features.8.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_26 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_9"](%input.20) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.24 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_10"](%onnx::Conv_26, %features.10.weight, %features.10.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_28 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_11"](%input.24) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.28 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_12"](%onnx::MaxPool_28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %onnx::Flatten_30 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::AveragePool[kernel_shape=[1, 1], strides=[1, 1], onnx_name="AveragePool_13"](%input.28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1214:0
  %input.32 : Float(1, 9216, strides=[9216, 1], requires_grad=1, device=cpu) = onnx::Flatten[axis=1, onnx_name="Flatten_14"](%onnx::Flatten_30) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torchvision/models/alexnet.py:50:0
  %input.36 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_15"](%input.32, %classifier.1.weight, %classifier.1.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_33 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_16"](%input.36) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.40 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_17"](%onnx::Gemm_33, %classifier.4.weight, %classifier.4.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_35 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_18"](%input.40) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %output : Float(1, 1000, strides=[1000, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_19"](%onnx::Gemm_35, %classifier.6.weight, %classifier.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  return (%output)

Traceback (most recent call last):
  File "example_github.py", line 15, in <module>
    session = ort.InferenceSession('alexnet.onnx')
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:
```
Still encountering this issue, and I can recreate it with different numpy versions (I pin onnxruntime = "==1.16.3" for CentOS 7 compatibility).
This produces a segfault:

```
numpy = "==2.0.0"
onnxruntime = "==1.16.3"
```

This does not:

```
numpy = "==1.26.4"
onnxruntime = "==1.16.3"
```
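The version split above is consistent with the numpy 2.0 ABI break: onnxruntime 1.16.x wheels were built against the numpy 1.x C ABI, and importing them under numpy 2.0 can crash at runtime. A defensive check (my own sketch; the function name is illustrative) turns the silent segfault into a readable message:

```python
import numpy as np

def compatible_with_ort_1_16(numpy_version: str) -> bool:
    """onnxruntime 1.16.x wheels predate numpy 2.0, so require numpy 1.x."""
    major = int(numpy_version.split(".")[0])
    return major < 2

# Warn up front instead of segfaulting later inside session.run().
if not compatible_with_ort_1_16(np.__version__):
    print("warning: numpy >= 2 detected; pin numpy < 2 (e.g. 1.26.4) for onnxruntime 1.16.3")
```

Pinning `numpy = "==1.26.4"` as shown above remains the actual fix; the check only fails fast.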
The relevant trace is:
```
Fatal Python error: Segmentation fault

Current thread 0x000078c3488bb000 (most recent call first):
  File ".../python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220 in run
```
I trigger this error with the following (imports added for completeness; `texts` is a list of input strings):

```python
import onnxruntime
from transformers import AutoTokenizer

session = onnxruntime.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("models/")

inputs = tokenizer(
    texts,  # list of input strings
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)
preds = session.run(None, dict(inputs))[0]
```
Sorry, I don't have time to dig into this issue further for you.
**Describe the bug**
I'm using the onnxruntime Python API for inference, but I get a segmentation fault when creating an InferenceSession.

**Urgency**
Emergency.

**System information**

**To Reproduce**

**Expected behavior**
The session should be created and run inference without crashing.

**Screenshots**
gdb message

**Additional context**