microsoft / onnxruntime-inference-examples

Examples for using ONNX Runtime for machine learning inferencing.

Ort::MemoryInfo with CUDA in C++? #94

Open · RandomPrototypes opened this issue 2 years ago

RandomPrototypes commented 2 years ago

Hello, I'm trying to bind some output values to CUDA memory to avoid copying them back to the CPU.

I want to do the C++ equivalent of

io.bind_output(name, 'cuda')

and then, in the following frames, bind the input to the output value from the previous frame.

I guess I should do it with the BindOutput function and configure memoryInfo to use CUDA memory.

io_binding.BindOutput(name, memoryInfo);

But I'm not sure how to configure memoryInfo to use CUDA memory, because I couldn't find any example that does it.

Searching other issues, I saw some code saying it's not possible (unless it was added recently):

// as of 4th Feb 2022, ONNX only supports allocation on the CPU
Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu( 
    OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);

but I also saw some code that seems to do it:

Ort::MemoryInfo memoryInfo("Cuda", OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemType::OrtMemTypeDefault);

Due to the lack of documentation, I can't be sure whether this second snippet really sets up CUDA memory without a CPU copy. Does the name parameter ("Cuda") correspond to the type of memory, or is it just a label given to the memoryInfo object with no effect? Does OrtDeviceAllocator mean GPU/CUDA memory? Is OrtMemTypeDefault the right value?
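
In other words, is a sketch like the following the right way to do it? (The "input_name" / "output_name" tensor names are placeholders for my model, and I'm assuming the session was created with the CUDA execution provider.)

#include <onnxruntime_cxx_api.h>
#include <vector>

// Sketch only: bind the output to CUDA device memory so it is not copied
// back to the CPU, then re-bind it as the next frame's input.
void run_two_frames(Ort::Session& session, Ort::Value& first_input) {
    // "Cuda" is meant to select CUDA device memory; 0 is the device id.
    Ort::MemoryInfo cuda_mem_info("Cuda", OrtDeviceAllocator, /*device id*/ 0,
                                  OrtMemTypeDefault);

    Ort::IoBinding binding(session);
    binding.BindInput("input_name", first_input);      // placeholder name
    // Binding an output by name + MemoryInfo (no pre-made tensor) should let
    // ONNX Runtime allocate the output tensor directly on the GPU.
    binding.BindOutput("output_name", cuda_mem_info);  // placeholder name
    session.Run(Ort::RunOptions{nullptr}, binding);

    // If this works, the fetched Ort::Value still lives in CUDA memory and
    // can be bound as the next frame's input without a device-to-host copy.
    std::vector<Ort::Value> outputs = binding.GetOutputValues();
    binding.BindInput("input_name", outputs[0]);
    session.Run(Ort::RunOptions{nullptr}, binding);
}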

Can anyone confirm it?

Many thanks

smsver2 commented 1 year ago

@RandomPrototypes, same issue here. Have you found any solution to this?

ashwin-999 commented 1 month ago

In a similar boat. I have a class that loads the model once and runs inference by calling session->Run. However, in this process I call CreateTensor each time before Run. Any help with an example of how to bind IO for non-dynamic and dynamic inputs/outputs would be greatly appreciated; a rough sketch of what I mean is below.
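
For the fixed-shape (non-dynamic) case, this is roughly what I'm hoping IoBinding lets me do instead of calling CreateTensor every frame. It's only a sketch: the "input" / "output" names and the shape are placeholders for my model.

#include <onnxruntime_cxx_api.h>
#include <vector>

// Sketch only: create the input tensor once over a persistent buffer, bind
// inputs/outputs once, and just refill the buffer before each Run.
void run_per_frame(Ort::Session& session) {
    std::vector<int64_t> shape{1, 3, 224, 224};        // placeholder shape
    std::vector<float> input_data(1 * 3 * 224 * 224);  // reusable CPU buffer

    Ort::MemoryInfo cpu_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        cpu_info, input_data.data(), input_data.size(),
        shape.data(), shape.size());

    Ort::IoBinding binding(session);
    binding.BindInput("input", input_tensor);          // placeholder name
    // For a dynamically shaped output, bind a MemoryInfo instead of a
    // pre-made tensor so ONNX Runtime allocates the right size on each Run.
    binding.BindOutput("output", cpu_info);            // placeholder name

    while (true) {                                     // per-frame loop
        // ... overwrite input_data with the new frame here ...
        session.Run(Ort::RunOptions{nullptr}, binding);
        std::vector<Ort::Value> outputs = binding.GetOutputValues();
        // ... consume outputs[0] ...
        break;  // placeholder exit; real code would keep looping over frames
    }
}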