I'm not sure whether my issue is related to issue 446, but here is what I experienced. The first time I load an ONNXRuntime-genai model into GPU memory (CUDA), the memory is not freed even after I call OgaDestroyModel(). After that, I have to work around the occupied memory if I need to load another model during the same run. The problem does not occur after the first load; memory from subsequent loads is freed correctly. I don't understand why only the first load occupies GPU memory permanently.
Here is C code that reproduces the issue; it is inspired by the following example.
After compiling, run the executable with the config path of the genai model as an argument. You must link against the onnxruntime-genai and CUDA libraries.
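A minimal repro sketch along these lines, assuming the `OgaCreateModel`/`OgaDestroyModel`/`OgaResultGetError` names from the onnxruntime-genai C API header (`ort_genai_c.h`) and CUDA's `cudaMemGetInfo` to observe free VRAM; the exact header name and link flags may differ in your setup:

```c
/* Repro sketch: load/destroy a genai model twice and print free VRAM
 * at each step. Requires a CUDA-capable GPU plus the onnxruntime-genai
 * and CUDA runtime libraries (e.g. -lonnxruntime-genai -lcudart). */
#include <stdio.h>
#include <cuda_runtime.h>
#include "ort_genai_c.h"  /* assumed header name for the genai C API */

static void print_free_vram(const char* tag) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    printf("%s: %zu MiB free of %zu MiB\n", tag, free_b >> 20, total_b >> 20);
}

static int load_and_destroy(const char* config_path, const char* label) {
    OgaModel* model = NULL;
    OgaResult* result = OgaCreateModel(config_path, &model);
    if (result) {
        fprintf(stderr, "load failed: %s\n", OgaResultGetError(result));
        OgaDestroyResult(result);
        return 1;
    }
    print_free_vram(label);

    OgaDestroyModel(model);       /* expected to release the GPU memory */
    cudaDeviceSynchronize();
    return 0;
}

int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model-config-path>\n", argv[0]);
        return 1;
    }

    print_free_vram("before first load");
    if (load_and_destroy(argv[1], "after first load")) return 1;
    print_free_vram("after first destroy");   /* memory is NOT freed here */

    if (load_and_destroy(argv[1], "after second load")) return 1;
    print_free_vram("after second destroy");  /* memory freed correctly */
    return 0;
}
```

On the affected versions, the "after first destroy" reading stays close to "after first load", while the second destroy returns the memory as expected.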
Language : C/C++
ONNXRuntime version : 1.18.0
ONNXRuntime-genai version : 0.4.0-dev
Execution provider : CUDA (v12.3)