microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Segfault when using IO binding to CUDA tensor with CPU execution provider #21865

Open adamreeve opened 1 month ago

adamreeve commented 1 month ago

Describe the issue

When running an inference session that uses IO binding to CUDA (GPU) memory while only the CPU execution provider is registered, onnxruntime segfaults. I would expect a helpful error message instead.

To reproduce

import onnxruntime as ort
import torch
import torch.nn as nn
import numpy as np

# A trivial single-layer model to export to ONNX
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc1(x)

torch_model = MyModel()
torch_input = torch.randn(1, 4)
onnx_model = torch.onnx.dynamo_export(torch_model, torch_input)

onnx_model.save("model.onnx")

# Create a session with only the CPU execution provider registered
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
binding = sess.io_binding()

# Bind inputs to CPU memory
for n in sess.get_inputs():
    binding.bind_ortvalue_input(
            n.name,
            ort.OrtValue.ortvalue_from_numpy(
                np.zeros(shape=n.shape, dtype=np.float32)))

# Bind outputs to CUDA (GPU) memory; this combination with the
# CPU-only session is what triggers the segfault
for n in sess.get_outputs():
    binding.bind_ortvalue_output(
            n.name,
            ort.OrtValue.ortvalue_from_numpy(
                np.zeros(shape=n.shape, dtype=np.float32),
                device_type='cuda'))

sess.run_with_iobinding(binding)

This exits with code 139 (128 + SIGSEGV) and prints:

Segmentation fault (core dumped)
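
As a debugging aid (not part of the original report), enabling Python's built-in faulthandler before running the repro makes the interpreter dump a traceback on SIGSEGV, which attributes the crash to run_with_iobinding rather than the bind calls:

import faulthandler

# Dump a Python traceback when the process receives a fatal signal
# such as SIGSEGV, so the faulting call is visible before the core dump.
faulthandler.enable()

Running the script with python -X faulthandler has the same effect without modifying the code.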

If device_type is changed to 'cpu', the script runs correctly.
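
Until onnxruntime raises a proper error here, a caller-side guard is one way to fail fast. This is a minimal sketch, assuming the hypothetical helper name bind_output_checked and that checking sess.get_providers() is a sufficient precondition; it only uses the documented OrtValue.device_name(), InferenceSession.get_providers(), and IOBinding.bind_ortvalue_output() methods:

def bind_output_checked(sess, binding, name, ort_value):
    # Hypothetical helper, not part of onnxruntime: refuse to bind
    # device-resident memory unless a matching execution provider is
    # registered, instead of segfaulting later in run_with_iobinding.
    if ort_value.device_name() == 'cuda' and \
            'CUDAExecutionProvider' not in sess.get_providers():
        raise ValueError(
            f"Output '{name}' is bound to CUDA memory but the session "
            "has no CUDAExecutionProvider")
    binding.bind_ortvalue_output(name, ort_value)

With this guard, the repro above would raise a ValueError at bind time instead of crashing inside run_with_iobinding.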

Urgency

Not urgent.

Platform

Linux

OS Version

Fedora 40

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.