microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[C#] Enable copying of GPU OrtValue to CPU #21244

Open guigzzz opened 2 weeks ago

guigzzz commented 2 weeks ago

Describe the issue

Hey guys,

I can't figure out an easy way to copy an OrtValue that was allocated on the GPU back to the CPU.

OrtValue has a really convenient GetTensorDataAsSpan API, but it seems to just wrap the raw pointer in a span, which obviously won't work when the pointer refers to GPU memory.

The Python API has a nice copy_outputs_to_cpu method, which is exactly what I need.

Can we have the same thing added to the .NET API? Either GetTensorDataAsSpan could be updated to do the copy automatically, or a new CopyOutputsToCpu method could be added to the IOBinding class, similar to Python.

To reproduce

using Microsoft.ML.OnnxRuntime;

var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider());

var binding = session.CreateIoBinding();

var alloc = new OrtMemoryInfo(OrtMemoryInfo.allocatorCUDA,
    OrtAllocatorType.DeviceAllocator, 0, OrtMemType.Default);
// var alloc = OrtMemoryInfo.DefaultInstance;

binding.BindOutputToDevice("output", alloc);

var input = new float[4];
var inputValue = OrtValue.CreateTensorValueFromMemory(
    OrtMemoryInfo.DefaultInstance, input.AsMemory(), new long[] { 4 });
binding.BindInput("input", inputValue);

session.RunWithBinding(new RunOptions(), binding);

var output = binding.GetOutputValues().ToArray().First();

var outputSpan = output.GetTensorDataAsSpan<float>();
Console.Out.WriteLine($"Got {outputSpan[0]}");

Fails with:

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Program.<Main>$(System.String[])

Urgency

Not urgent, feature request.

Platform

Linux

OS Version

Ubuntu 20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.2

yuslepukhin commented 2 weeks ago

IOBinding is deprecated.

Unless otherwise instructed, the output OrtValues are created and copied to CPU memory at the end of inferencing.

https://onnxruntime.ai/docs/tutorials/csharp/basic_csharp.html

guigzzz commented 1 week ago

Can you elaborate on 'IOBinding is deprecated'? This is news to me. How else are we supposed to efficiently reuse output OrtValues? If it truly is deprecated, the documentation should be updated to reflect that.

The 'unless otherwise instructed' part is the crucial bit here. My output tensors are allocated on the GPU, and I only sometimes want to copy them back to the host. This is a time series model whose outputs feed back into the input, so it is more efficient to keep everything on the GPU. Currently I have no way to do that occasional copy.