Open · tianleiwu opened this issue 1 year ago
I was thinking we already had an `update_in_place` that took an `OrtValue`, but apparently not. Another idea is to add an `OrtValue` creation interface that takes a buffer pointer, shape, element type, and device info, plus an `update_in_place` that takes an `OrtValue`. With `OrtValue.update_inplace_from_buffer(source_ptr, bytes)`, we would need to pass in the source device info (metadata that is already available in an `OrtValue` instance), and it seems incomplete to not have an interface that supports creating an `OrtValue` from a raw buffer in the first place (something the C/C++ API already does).
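As a rough illustration of the shape such an interface could take, here is a hypothetical sketch: `ortvalue_from_buffer` and `update_in_place` do not exist in the Python API today, and their names and signatures are assumptions for discussion only; the torch tensor just stands in for an arbitrary device buffer.

```python
import numpy as np
import torch
import onnxruntime as ort

# A GPU buffer owned by another framework, standing in for "a raw buffer".
src_tensor = torch.ones(2, 3, dtype=torch.float32, device="cuda")

# Hypothetical factory (does not exist today): wrap an existing device buffer
# in an OrtValue, mirroring what the C API's CreateTensorWithDataAsOrtValue allows.
src = ort.OrtValue.ortvalue_from_buffer(
    src_tensor.data_ptr(), [2, 3], np.float32, "cuda", 0
)

# Existing API: a destination OrtValue with a fixed device address.
dst = ort.OrtValue.ortvalue_from_shape_and_type([2, 3], np.float32, "cuda", 0)

# Hypothetical in-place update that takes an OrtValue, so the source device
# info travels with the source and does not need to be passed separately.
dst.update_in_place(src)
```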
Describe the feature request
For CUDA Graph, the graph inputs must live at fixed memory addresses.
Currently, there is a Python API, `OrtValue.update_inplace(np_arr)`, which accepts a numpy ndarray as the source. That means the source must be on CPU. When the source data (like an encoder output) is already on GPU, we have to use an external API to copy memory from device to device, which is not convenient.
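For illustration, a minimal sketch of the existing path, assuming onnxruntime-gpu is installed; the key point is that `update_inplace` only takes a host-side numpy array:

```python
import numpy as np
import onnxruntime as ort

# Allocate a CUDA OrtValue; its device address stays fixed, which is
# what CUDA Graph capture/replay needs for graph inputs.
gpu_input = ort.OrtValue.ortvalue_from_shape_and_type([2, 3], np.float32, "cuda", 0)

# The existing in-place update only accepts a numpy array, i.e. a CPU source.
# A GPU-resident source (e.g. an encoder output) cannot be passed here directly.
gpu_input.update_inplace(np.ones((2, 3), dtype=np.float32))
```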
There are two ways to improve that:
(1) Add a Python API like `OrtValue.update_inplace_from_buffer(source_ptr, bytes)` (see the sketch after this list).
(2) Do the memory copy internally in ONNX Runtime: when users bind an input that lives on device and CUDA Graph is enabled, ORT would copy the input to the fixed address before launching the CUDA graph.
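A sketch of how option (1) could look from user code; `update_inplace_from_buffer` is the proposed API and does not exist yet, and the torch tensor is only a stand-in for any GPU-resident source buffer:

```python
import numpy as np
import torch
import onnxruntime as ort

# Fixed-address CUDA OrtValue used as a CUDA Graph input (existing API).
gpu_input = ort.OrtValue.ortvalue_from_shape_and_type([2, 3], np.float32, "cuda", 0)

# GPU-resident source, e.g. an encoder output produced by another model.
encoder_output = torch.ones(2, 3, dtype=torch.float32, device="cuda")

# Proposed (hypothetical) API: copy device-to-device from a raw pointer into
# the OrtValue's fixed buffer, without going through host memory.
gpu_input.update_inplace_from_buffer(
    encoder_output.data_ptr(),                                   # source device pointer
    encoder_output.element_size() * encoder_output.nelement(),   # byte count
)
```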
Describe scenario use case
In Stable Diffusion, there are multiple models. When we use CUDA Graph, the inputs for these models are on GPU, and we need to copy the inputs to the same fixed memory addresses to launch the CUDA graph.
The current solution is to install cuda-python and use the cudaMemcpy API. However, that requires installing a large library. It would be better if this were supported internally in ORT.
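For reference, a minimal sketch of the current workaround described above, assuming onnxruntime-gpu, torch, and cuda-python are installed; the shapes and the torch source tensor are illustrative:

```python
import numpy as np
import torch
from cuda import cudart
import onnxruntime as ort

# Fixed-address CUDA OrtValue that will be (re)used as a CUDA Graph input.
gpu_input = ort.OrtValue.ortvalue_from_shape_and_type([2, 3], np.float32, "cuda", 0)

# GPU-resident source data, e.g. the output of a previous model.
encoder_output = torch.ones(2, 3, dtype=torch.float32, device="cuda")

# Device-to-device copy into the OrtValue's fixed buffer via cuda-python.
# This extra dependency is what the issue would like to avoid.
err, = cudart.cudaMemcpy(
    gpu_input.data_ptr(),                                        # destination
    encoder_output.data_ptr(),                                   # source
    encoder_output.element_size() * encoder_output.nelement(),   # byte count
    cudart.cudaMemcpyKind.cudaMemcpyDeviceToDevice,
)
assert err == cudart.cudaError_t.cudaSuccess
```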