Closed pmj110119 closed 11 months ago
@agirault @grlee77 @jjomier Sorry to ping you directly — thanks so much if anyone can help!!
Hi @pmj110119, this is unfortunately not easy to do from the C++ API at the moment, but it is definitely possible. I can provide some example guidance later today.
If you need to get an existing image from the host to the device, you can use standard CUDA runtime APIs like `cudaMalloc` to allocate device memory and then `cudaMemcpy` to transfer data from the host to the device. Once you have a pointer to the device memory, it is possible to wrap it as a Tensor without making a copy. It is not currently very obvious or well documented how to do that, so let me find a concrete example to help.
One case that is currently fairly easy from C++ is when you are working with data that is already in a third-party library that supports exporting a DLPack `DLManagedTensor*`. An example of such a library is NVIDIA's MatX. In that case you can call a method that exports the pointer and pass it directly to the Tensor constructor, as in this public Holoscan SDK example:
https://github.com/nvidia-holoscan/holohub/blob/main/applications/multiai_endoscopy/cpp/post-proc-matx-gpu/multi_ai.cu#L131C69-L131C84
An example using `cudaMalloc`, `cudaMemcpy`, `cudaFree` and the underlying NVIDIA GXF library APIs is the following `compute` method, which generates synthetic data, copies it to the device, and emits a device tensor:
```cpp
void SendTensorTxOp::compute(InputContext&, OutputContext& op_output, ExecutionContext& context) {
  // Define the dimensions for the CUDA memory (768 x 1024 x 3, uint8).
  int rows = 768;
  int columns = 1024;
  int channels = 3;
  // Available element types are:
  //   kInt8, kUnsigned8, kInt16, kUnsigned16, kInt32, kUnsigned32,
  //   kInt64, kUnsigned64, kFloat32, kFloat64, kComplex64, kComplex128
  nvidia::gxf::PrimitiveType element_type = nvidia::gxf::PrimitiveType::kUnsigned8;
  int element_size = nvidia::gxf::PrimitiveTypeSize(element_type);
  // The shape does not have to be 3D; it could be 1D, 2D, etc. instead.
  nvidia::gxf::Shape shape = nvidia::gxf::Shape{rows, columns, channels};
  size_t nbytes = rows * columns * channels * element_size;

  // Create a shared pointer for the CUDA memory with a custom deleter that will
  // free the device memory via cudaFree when done. The wrapped pointer is
  // initialized to nullptr so the deleter is safe even if allocation fails.
  auto pointer = std::shared_ptr<void*>(new void*(nullptr), [](void** pointer) {
    if (pointer != nullptr) {
      if (*pointer != nullptr) { CUDA_TRY(cudaFree(*pointer)); }
      delete pointer;
    }
  });

  // Allocate the CUDA memory.
  CUDA_TRY(cudaMalloc(pointer.get(), nbytes));

  // Replace this initialization of synthetic host `data` with however your
  // application gets data into host memory.
  std::vector<uint8_t> data(nbytes);
  for (size_t index = 0; index < data.size(); ++index) {
    data[index] = (index_ + index) % 256;
  }

  // Copy the data from host to device.
  CUDA_TRY(cudaMemcpy(*pointer, data.data(), nbytes, cudaMemcpyKind::cudaMemcpyHostToDevice));

  // Holoscan Tensor doesn't support direct memory allocation.
  // Thus, create an Entity and use a GXF tensor to wrap the CUDA memory.
  auto out_message = nvidia::gxf::Entity::New(context.context());
  auto gxf_tensor = out_message.value().add<nvidia::gxf::Tensor>("out_tensor");
  gxf_tensor.value()->wrapMemory(shape,
                                 element_type,
                                 element_size,
                                 nvidia::gxf::ComputeTrivialStrides(shape, element_size),
                                 // change to nvidia::gxf::MemoryStorageType::kCPU if using CPU memory
                                 nvidia::gxf::MemoryStorageType::kDevice,
                                 *pointer,
                                 [orig_pointer = pointer](void*) mutable {
                                   orig_pointer.reset();  // decrement the reference count
                                   return nvidia::gxf::Success;
                                 });

  // Emit the tensor.
  op_output.emit(out_message.value(), "out");
}
```
where you would need to include at least the following up top to use the CUDA runtime APIs and the underlying `nvidia::gxf::Tensor` API:
```cpp
#include <cuda_runtime.h>  // probably also automatically pulled in by holoscan/holoscan.hpp
#include <holoscan/holoscan.hpp>
// #include "gxf/std/tensor.hpp"  // pulled in automatically by #include <holoscan/holoscan.hpp>

#define CUDA_TRY(stmt)                                                                    \
  ({                                                                                      \
    cudaError_t _holoscan_cuda_err = stmt;                                                \
    if (cudaSuccess != _holoscan_cuda_err) {                                              \
      HOLOSCAN_LOG_ERROR("CUDA Runtime call {} in line {} of file {} failed with '{}' ({}).", \
                         #stmt,                                                           \
                         __LINE__,                                                        \
                         __FILE__,                                                        \
                         cudaGetErrorString(_holoscan_cuda_err),                          \
                         _holoscan_cuda_err);                                             \
    }                                                                                     \
    _holoscan_cuda_err;                                                                   \
  })
```
It works, thanks!!!!!!
How to manually load an image into CUDA using C++ and display it with HolovizOp?
My main doubt is that I don't know which class I should use and how to store image data into it. The Python version can easily be displayed using a `numpy.ndarray`, but the C++ version is very difficult to learn. All the examples only show how to get the image directly from the `VideoStreamReplayerOp` operator. I've run into obstacles. Please help.