microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[CUDA] Acquiring a CUDA allocator without loading a session. #19420

Status: Open. gedoensmax opened this issue 8 months ago.

gedoensmax commented 8 months ago

Describe the issue

I am aware that I can create and register an allocator with the active environment so that my session does not create its own allocator but instead uses the one already attached to the environment.

#include <vector>
#include <onnxruntime_cxx_api.h>

static auto ort_env = Ort::Env(ORT_LOGGING_LEVEL_WARNING);
static auto ort_api = Ort::GetApi();

// Presumed definition of memory_info_cuda (not shown in the original snippet):
const Ort::MemoryInfo memory_info_cuda("Cuda", OrtArenaAllocator, /*device id*/ 0, OrtMemTypeDefault);

// Default arena settings, no provider options.
const Ort::ArenaCfg arena_cfg(0, -1, -1, -1);
std::vector<const char *> dummy;

// Register a shared CUDA allocator with the environment and check the returned status.
Ort::ThrowOnError(ort_api.CreateAndRegisterAllocatorV2(
    ort_env, "cuda", memory_info_cuda, arena_cfg, dummy.data(), dummy.data(), 0));


After registering the allocator with the code above, I should be able to reuse it across multiple sessions. (Note: how would I go about doing this with the C++ API? I cannot find any documentation on it.) My actual problem is that I would like to use this allocator to allocate tensors before loading an ONNX file. As far as I can tell, there is no option to do this, right?
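For reference, this is roughly how I assume a session would opt into the environment-registered allocator; the config key "session.use_env_allocators" and the model path below are my assumptions, not something I found spelled out in the docs:

// Sketch (assumption): let the session reuse allocators registered with the environment.
Ort::SessionOptions session_options;
OrtCUDAProviderOptions cuda_options{};
session_options.AppendExecutionProvider_CUDA(cuda_options);
session_options.AddConfigEntry("session.use_env_allocators", "1");  // kOrtSessionOptionsConfigUseEnvAllocators
Ort::Session session(ort_env, ORT_TSTR("model.onnx"), session_options);  // hypothetical model path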

To reproduce

Usage of the C++ API.

Urgency

No response

Platform

Windows

OS Version

Windows and Linux

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.17

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU, CUDA, TensorRT

Execution Provider Library Version

No response

tianleiwu commented 8 months ago

There is no explicit API to get the CUDA allocator from the Env; the current allocator API is associated with a session.

You can take a look at OrtValue or Ort::Value, which does not need to be associated with a session and can be used to create a tensor. You can then bind those tensors to the inputs/outputs of one or multiple sessions.

https://onnxruntime.ai/docs/api/c/struct_ort_1_1_value.html
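For example, something along these lines (a rough sketch only; the input/output names, the shape, and the use of cudaMalloc for the device buffer are placeholders and assumptions, not a prescribed pattern):

#include <array>
#include <cuda_runtime_api.h>
#include <onnxruntime_cxx_api.h>

// Rough sketch: wrap a raw CUDA buffer in an Ort::Value (no session is required to create it),
// then bind it to a session with IoBinding.
void run_with_device_tensor(Ort::Session& session) {
  Ort::MemoryInfo cuda_mem_info("Cuda", OrtDeviceAllocator, /*device id*/ 0, OrtMemTypeDefault);

  const std::array<int64_t, 2> shape{1, 3};       // placeholder shape
  float* device_buffer = nullptr;
  cudaMalloc(&device_buffer, 3 * sizeof(float));  // backend-specific allocation, not through an ORT allocator

  Ort::Value input = Ort::Value::CreateTensor<float>(
      cuda_mem_info, device_buffer, 3, shape.data(), shape.size());

  Ort::IoBinding binding(session);
  binding.BindInput("input", input);              // placeholder input name
  binding.BindOutput("output", cuda_mem_info);    // placeholder output name; ORT allocates it on device
  session.Run(Ort::RunOptions{}, binding);

  cudaFree(device_buffer);
}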

gedoensmax commented 3 months ago

@tianleiwu this would require using backend-specific APIs, which ORT tries to abstract away.