Open balenamiaa opened 5 months ago
If we expose direct creation of GPU tensors we'd have to have a data copy somewhere. I'm currently not sure where the best place for that copy is, or whether GPU tensors should be distinguished in the type system somehow to make it clear that they don't allow direct access to the memory.
Is the use case in a pipeline of models all resident on the GPU?
Yup, the models are on the GPU. In my case it's not direct creation of a GPU tensor, but rather wrapping a raw resource: a D3D11 texture converted into a CUDA resource with CUDA's Direct3D 11 Interoperability. CreateTensorWithDataAsOrtValue is used for this. I don't think the C# API makes any distinction between an OrtValue backed by a GPU buffer and one backed by a CPU buffer. In fact, calling GetTensorMutableRawData on a GPU-backed OrtValue throws a System.AccessViolationException. I personally think having a distinction in the type system would be nice.
Ok, so if you've made the tensor elsewhere via JNI what kind of type would it be to allow wrapping in an OrtValue? Is it a bare pointer/long?
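To illustrate the "bare pointer/long" question: a hypothetical Java API could accept the device pointer as a `long` and make GPU-residency explicit in the type system, so host-side reads fail cleanly instead of crashing. None of these names exist in the ONNX Runtime Java bindings; this is only a sketch of the shape such an API could take.

```java
// Hypothetical API sketch -- these types and methods are NOT part of the
// ONNX Runtime Java API; they only illustrate one possible design.
public final class GpuTensorSketch {

    /** Marker type for a tensor whose backing buffer lives in device memory. */
    public interface DeviceTensor extends AutoCloseable {
        long devicePointer();   // raw device pointer, passed around as a long
        long[] shape();

        // Reading device memory from the JVM is invalid; the type system
        // can surface that as an exception instead of an access violation
        // (cf. GetTensorMutableRawData on a GPU-backed OrtValue in C#).
        default java.nio.ByteBuffer hostBuffer() {
            throw new UnsupportedOperationException("tensor is GPU-resident");
        }
    }

    /**
     * Wraps an existing device allocation without copying. The caller keeps
     * ownership of the device memory and must free it after close().
     */
    public static DeviceTensor wrapDevicePointer(long ptr, long byteLen, long[] shape) {
        return new DeviceTensor() {
            @Override public long devicePointer() { return ptr; }
            @Override public long[] shape() { return shape.clone(); }
            @Override public void close() { /* release the native OrtValue here */ }
        };
    }
}
```

Making `close()` explicit also addresses the point below about disposing native resources.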
Do not forget about disposing native resources.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
The C# API allows creating a tensor from a raw GPU pointer, avoiding a copy of the data to the CPU, which is important in cases where even 1 extra millisecond of latency matters. No such API is exposed on the JVM side. I can currently work around it by using Project Panama to call the ONNX Runtime C API via FFI, but this should probably be exposed through JNI with a proper Java API.
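The Panama workaround starts from a function descriptor mirroring the C signature of CreateTensorWithDataAsOrtValue. A minimal sketch is below; note that in the real C API this function is a pointer inside the OrtApi struct (obtained via OrtGetApiBase()->GetApi(n)), not a directly exported symbol, and resolving that struct plus the "Cuda" OrtMemoryInfo is omitted here.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

// Sketch of a Project Panama (java.lang.foreign, JDK 22+) binding for the
// ONNX Runtime C API function CreateTensorWithDataAsOrtValue. Only the
// descriptor is shown; linking requires fetching the function pointer from
// the OrtApi struct at runtime.
public final class OrtPanamaSketch {

    // C signature:
    // OrtStatus* CreateTensorWithDataAsOrtValue(
    //     const OrtMemoryInfo* info, void* p_data, size_t p_data_len,
    //     const int64_t* shape, size_t shape_len,
    //     ONNXTensorElementDataType type, OrtValue** out);
    public static FunctionDescriptor createTensorWithDataDescriptor() {
        return FunctionDescriptor.of(
            ValueLayout.ADDRESS,    // returns OrtStatus*
            ValueLayout.ADDRESS,    // const OrtMemoryInfo* (GPU memory info)
            ValueLayout.ADDRESS,    // void* p_data -- the raw GPU pointer
            ValueLayout.JAVA_LONG,  // size_t p_data_len
            ValueLayout.ADDRESS,    // const int64_t* shape
            ValueLayout.JAVA_LONG,  // size_t shape_len
            ValueLayout.JAVA_INT,   // ONNXTensorElementDataType
            ValueLayout.ADDRESS);   // OrtValue** out
    }
}
```

Passing the GPU-side `void* p_data` through a descriptor like this is what avoids the device-to-host copy.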
To reproduce
~
Urgency
No response
Platform
Windows
OS Version
~
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
~
ONNX Runtime API
Java
Architecture
X64
Execution Provider
CUDA, TensorRT
Execution Provider Library Version
No response