nlesc-recruit / cudawrappers

C++ wrapper for the Nvidia C libraries (e.g. CUDA driver, nvrtc, cuFFT etc.)
https://cudawrappers.readthedocs.io/en/latest/
Apache License 2.0

Added support for 2D operations and C2R/R2C FFT #305

Closed: wvbbreu closed this pull request 3 weeks ago

wvbbreu commented 1 month ago

Description

This pull request adds support for several 2D operations (memset2D, memcpyHtoD2DAsync, and memcpyDtoH2DAsync) and introduces the FFT1DRealToComplex and FFT1DComplexToReal classes for real-to-complex and complex-to-real FFTs. It also adds the cu::Device::getOrdinal() method, which retrieves the current device ID. Tests for all of these changes are included in this pull request.
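For reviewers less familiar with 2D transfers, here is a minimal host-only sketch of the row and pitch arithmetic that 2D copies in the CUDA driver API are built on (width in bytes, height, and a separate pitch for source and destination). It does not call CUDA or cudawrappers; `copy2d` and `pack_example` are hypothetical names used only for illustration, and the actual signatures of the new 2D methods are not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only model of a pitched 2D copy; NOT the cudawrappers API.
// Copies `height` rows of `width_bytes` bytes each. Consecutive rows start
// `src_pitch` / `dst_pitch` bytes apart, so each pitch must be at least
// `width_bytes`.
inline void copy2d(void* dst, std::size_t dst_pitch, const void* src,
                   std::size_t src_pitch, std::size_t width_bytes,
                   std::size_t height) {
  assert(dst_pitch >= width_bytes && src_pitch >= width_bytes);
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(d + row * dst_pitch, s + row * src_pitch, width_bytes);
  }
}

// Hypothetical example: pack a 2x3 float image whose rows are padded to
// 4 floats into a tightly packed 2x3 buffer.
inline std::vector<float> pack_example() {
  const std::vector<float> padded = {1.0f, 2.0f, 3.0f, -1.0f,   // row 0 + pad
                                     4.0f, 5.0f, 6.0f, -1.0f};  // row 1 + pad
  std::vector<float> tight(6, 0.0f);
  copy2d(tight.data(), 3 * sizeof(float),   // destination, tight pitch
         padded.data(), 4 * sizeof(float),  // source, padded pitch
         3 * sizeof(float), 2);             // row width in bytes, row count
  return tight;
}
```

In this model the padding values are skipped because only `width_bytes` of every row are copied, while the pitch decides where the next row begins.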

Related issues:

None

Instructions to review the pull request

wvbbreu commented 3 weeks ago

I came up with a solution that relaxes the requirements of cu::DeviceMemory::operator*. Instead of allowing dereferencing only on managed memory, it now also allows access to non-managed (device) memory (see the example below). This change has no impact on the existing cudawrappers API. Access to unallocated memory is still caught by a sanity check (checkPointerAccess).

Before

cu::DeviceMemory dev_mem(1024);
// Go through the raw CUdeviceptr, then cast to float* to offset by 2 elements.
cu::DeviceMemory(reinterpret_cast<CUdeviceptr>(reinterpret_cast<float*>(static_cast<CUdeviceptr>(dev_mem)) + 2));

After

cu::DeviceMemory dev_mem(1024);
// Cast the wrapper directly to float* and offset by 2 elements.
cu::DeviceMemory(reinterpret_cast<CUdeviceptr>(static_cast<float*>(dev_mem) + 2));
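To make the difference between the two forms concrete, here is a rough host-only sketch of the idea. `FakeDeviceMemory`, `offset_before`, and `offset_after` are hypothetical stand-ins and not the cudawrappers implementation; the sketch assumes the wrapper exposes a templated pointer conversion guarded by a check for unallocated memory, and it uses a host buffer address in place of a real CUdeviceptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a device-memory wrapper; NOT cu::DeviceMemory.
// The address is stored as a plain integer, mirroring how CUdeviceptr is an
// integer type in the CUDA driver API.
class FakeDeviceMemory {
 public:
  FakeDeviceMemory(std::uintptr_t address, std::size_t size)
      : address_(address), size_(size) {}

  std::size_t size() const { return size_; }

  // Raw integer address (plays the role of the CUdeviceptr conversion).
  explicit operator std::uintptr_t() const { return address_; }

  // Typed pointer view, allowed for any allocated memory. The null check is
  // an assumed analogue of the sanity check mentioned above.
  template <typename T>
  explicit operator T*() const {
    if (address_ == 0) {
      throw std::runtime_error("access to unallocated memory");
    }
    return reinterpret_cast<T*>(address_);
  }

 private:
  std::uintptr_t address_;
  std::size_t size_;
};

// "Before" style: fetch the raw address, cast it to float*, then offset.
inline std::uintptr_t offset_before(const FakeDeviceMemory& mem,
                                    std::size_t n) {
  assert(n * sizeof(float) <= mem.size());
  return reinterpret_cast<std::uintptr_t>(
      reinterpret_cast<float*>(static_cast<std::uintptr_t>(mem)) + n);
}

// "After" style: cast the wrapper straight to float*, then offset.
inline std::uintptr_t offset_after(const FakeDeviceMemory& mem,
                                   std::size_t n) {
  assert(n * sizeof(float) <= mem.size());
  return reinterpret_cast<std::uintptr_t>(static_cast<float*>(mem) + n);
}
```

Both helpers yield the same address, `n * sizeof(float)` bytes past the start of the buffer; the second form simply removes one cast while keeping the unallocated-memory check in a single place.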