microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] #14796

Open nullhd2 opened 1 year ago

nullhd2 commented 1 year ago

Describe the feature request

I am using onnxruntime 1.14.0 from C#. When I call env.CreateAndRegisterAllocator(memInfo, arenaCfg), I get the following error: [ErrorCode:InvalidArgument] Only CPU devices are supported for now.

using (var memInfo = new OrtMemoryInfo(OrtMemoryInfo.allocatorCUDA, OrtAllocatorType.ArenaAllocator, 0, OrtMemType.Default))
using (var arenaCfg = new OrtArenaCfg(1024 * 1024 * 1024, 0, 512 * 1024 * 1024, 0))
{
    var env = OrtEnv.Instance();
    env.CreateAndRegisterAllocator(memInfo, arenaCfg);
}

I would like to request support for CUDA allocator configuration in the onnxruntime C# API. Alternatively, a way to free the GPU memory used by a run, without unloading the model itself, would also solve my problem.

Describe scenario use case

My scenario is that I need to run many models on the GPU, and there is not enough memory for all of them. I want to free the computation cache after each model run, but to keep overall time down I do not want to unload the models themselves. I have already read https://github.com/microsoft/onnxruntime/issues/9509#issuecomment-951546580 and https://github.com/microsoft/onnxruntime/issues/13936; after upgrading to 1.14.0 I can already call runOp.AddRunConfigEntry("memory.enable_memory_arena_shrinkage", "gpu:0"), but I cannot fully implement the approach in C# because the feature above is not supported.

It's also possible that my method is incorrect, so I sincerely request help. 😔🤞
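To make the setup concrete, here is a minimal C# sketch of the part that already works for me, with the arena-shrinkage entry attached to the RunOptions (the model path and the tensor names "input"/"output" are placeholders, not from a real model):

```csharp
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

var sessionOptions = new SessionOptions();
sessionOptions.AppendExecutionProvider_CUDA(0);   // run on GPU 0

// The session (and the model weights) stay alive across runs.
using var session = new InferenceSession("model.onnx", sessionOptions);

using var runOp = new RunOptions();
// Ask ORT to shrink the GPU memory arena after this Run completes.
runOp.AddRunConfigEntry("memory.enable_memory_arena_shrinkage", "gpu:0");

var input = NamedOnnxValue.CreateFromTensor("input",
    new DenseTensor<float>(new[] { 1, 3, 224, 224 }));
using var results = session.Run(new[] { input }, new[] { "output" }, runOp);
```

What I cannot do from C# is create and register the CUDA arena allocator itself, as shown in the snippet above.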

nullhd2 commented 1 year ago

Dear @skottmckay , I was hoping you could assist me with a technical issue I am facing with onnxruntime. I am using C# and onnxruntime version 1.14.0 and I am encountering an error when using env.CreateAndRegisterAllocator with CUDA. I would greatly appreciate your help in resolving this issue. Thank you!

pranavsharma commented 1 year ago

Each session can be associated with one device id only + there is a 1-1 relationship b/w a model and a session. Sharing the allocator b/w multiple sessions might not even work here since the memory would need to be allocated from the device on which the session is running which becomes equivalent to a separate per-device allocator that is already achievable via separate sessions. I'm interested in learning more.

nullhd2 commented 1 year ago

> Each session can be associated with one device id only + there is a 1-1 relationship b/w a model and a session. Sharing the allocator b/w multiple sessions might not even work here since the memory would need to be allocated from the device on which the session is running which becomes equivalent to a separate per-device allocator that is already achievable via separate sessions. I'm interested in learning more.

Thank you for your reply! I may not have explained myself clearly. I need to control a GPU device and free the computation cache after Run(), without unloading the loaded model. My implementation follows https://github.com/microsoft/onnxruntime/issues/9509#issuecomment-951546580, but I ran into difficulties in C#, and creating custom user-defined operators to control the cache is not effective for CUDA.

pranavsharma commented 1 year ago

Yes, https://github.com/microsoft/onnxruntime/issues/9509#issuecomment-951546580 is the right idea for your use case. What difficulties did you encounter in C#?

nullhd2 commented 1 year ago

> Yes, #9509 (comment) is the right idea for your use case. What difficulties did you encounter in C#?

In C++ this is done with:

OrtArenaCfg* arena_cfg = nullptr;
ASSERT_TRUE(api.CreateArenaCfgV2(keys, values, 5, &arena_cfg) == nullptr);

As shown in my original request, the only C# equivalent I could find is env.CreateAndRegisterAllocator(memInfo, arenaCfg), and it fails with: [ErrorCode:InvalidArgument] Only CPU devices are supported for now.
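For comparison, the C++ path expands to roughly the following sketch (the five arena keys are from the C API; the values here are illustrative, not the ones the test uses):

```cpp
#include <onnxruntime_cxx_api.h>

const OrtApi& api = Ort::GetApi();

// Illustrative arena settings; real code would tune these.
const char* keys[] = {"max_mem", "arena_extend_strategy",
                      "initial_chunk_size_bytes", "max_dead_bytes_per_chunk",
                      "initial_growth_chunk_size_bytes"};
const size_t values[] = {0 /*unlimited*/, 0 /*kNextPowerOfTwo*/, 1024, 0, 256};

OrtArenaCfg* arena_cfg = nullptr;
Ort::ThrowOnError(api.CreateArenaCfgV2(keys, values, 5, &arena_cfg));
// ... pass arena_cfg to CreateAndRegisterAllocator, then release it:
api.ReleaseArenaCfg(arena_cfg);
```

There is no C# wrapper that exposes CreateArenaCfgV2 with a CUDA OrtMemoryInfo, which is exactly the gap this feature request is about.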

hosea7456 commented 1 year ago

> Each session can be associated with one device id only + there is a 1-1 relationship b/w a model and a session. Sharing the allocator b/w multiple sessions might not even work here since the memory would need to be allocated from the device on which the session is running which becomes equivalent to a separate per-device allocator that is already achievable via separate sessions. I'm interested in learning more.

I am facing the same problem. I created an Ort::Session and ran inference once; it used 1.6 GB of GPU memory (0.9 GB for inference, 0.7 GB for model loading and other overhead). I want to free only the GPU memory used for inference, because the session needs to keep waiting in the background. I tried cleaning the memory with Ort::OrtRelease(session.release()), and that did free the 0.9 GB, but afterwards I could not run the Ort::Session anymore. What can I do to solve this?
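If the per-run shrinkage option discussed earlier in this thread also fits this case, a possible alternative to releasing the session is to request arena shrinkage on each Run, so scratch memory is returned while the session stays usable. A sketch, assuming `session` and `input_tensor` come from the existing setup and the tensor names are placeholders:

```cpp
#include <onnxruntime_cxx_api.h>

// session and input_tensor are assumed to exist already.
Ort::RunOptions run_options;
run_options.AddConfigEntry("memory.enable_memory_arena_shrinkage", "gpu:0");

const char* input_names[]  = {"input"};
const char* output_names[] = {"output"};
auto outputs = session.Run(run_options, input_names, &input_tensor, 1,
                           output_names, 1);
// The GPU arena is shrunk after Run; the loaded model is untouched.
```

This avoids destroying the session, so subsequent Run() calls should still work.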