vincefn / pyvkfft

Python interface to VkFFT
MIT License
51 stars 6 forks source link

Cupy interop #7

Closed kkotyk closed 2 years ago

kkotyk commented 3 years ago

I found this suggestion works to be able to pass a Cupy array to pyvkfft: https://gist.github.com/gmarkall/09e4cbfe6fda4f7a35be07c8dd926f36

I found that some of my inputs are causing cupy.cuda.runtime.deviceSynchronize() to throw illegal memory access exceptions, but is this a helpful place to start @vincefn?

vincefn commented 3 years ago

Sounds promising indeed. In the end what vkfft needs is the pointer to the data, and optionally the pointer to the stream. For in-place R2C transforms the ability to create a view of an array with a different dtype would also be needed.

For the memory access exceptions you should check with cuda-memcheck.

leofang commented 3 years ago

For the simple purpose of interoperability, pyvkfft just needs to support either __cuda_array_interface__ (for CUDA only) or the newer __dlpack__ (for CUDA/HIP and maybe OpenCL/SYCL later) protocol.

But this would be cool IMHO: Integrate vkFFT as an alternative FFT backend with CuPy...

Not that I have ample time to give it a shot. I've been simply interested in vkFFT for a while and was taking a curious look earlier. The first silly question I have is that it seems to be a header-only implementation?! I couldn't immediately recognize what APIs it exposes to end users so I wasn't able to conceive a wrapper design. Maybe @vincefn could give me a few pointers? 🙂

Disclaimer: I'm a CuPy core contributor.

vincefn commented 3 years ago

For the simple purpose of interoperability, pyvkfft just needs to support either __cuda_array_interface__ (for CUDA only) or the newer __dlpack__ (for CUDA/HIP and maybe OpenCL/SYCL later) protocol.

Thanks for the pointers. I'd like to have this available, but even if it's probably easy I don't have much time for that. PR welcome !

But this would be cool IMHO: Integrate vkFFT as an alternative FFT backend with CuPy...

Not that I have ample time to give it a shot. I've been simply interested in vkFFT for a while and was taking a curious look earlier. The first silly question I have is that it seems to be a header-only implementation?! I couldn't immediately recognize what APIs it exposes to end users so I wasn't able to conceive a wrapper design. Maybe @vincefn could give me a few pointers? 🙂

Yep, it's juste a single -very long- header, C99. Hard to read but there are enough comments to follow how it works. Once you get beyond the fright of the huge header without a modern (doxygen-style) documentation, the meta-programming approach is really elegant and the performance amazing.

As for pointers to include this in CuPy you can look at vkfft-cuda-minimal.cu to start - that's the first thing I wrote to understand how the library works (there may be errors there, it was just a quick shot to get started).

But I think you'd better look directly at the actual compiled module in vkfft_cuda.cu, it's also pretty simple.

Key structures are VkFFTConfiguration, VkFFTApplication, and functions to use initializeVkFFT and VkFFTAppend.

kkotyk commented 3 years ago

@leofang I fully support the idea of adding vkfft to cupy. I think that would be a huge feature to include!

vincefn commented 3 years ago

Hi @kkotyk @leofang , the devel branch now supports CuPY arrays as well as PyCUDA ones, using __cuda_array_interface__.

You'll need an up-to-date vkfft.h as the devel branch also supports dct and Bluestein transforms.

I have only made quick tests (see the cupy/pycuda notebook) so let me know if you see any issue (CuPY is not used in the automated tests yet, and I have not tried supplying a CuPY stream either though the code to support this is there)

vincefn commented 3 years ago

I've also updated the main CUDA test notebook to work either with PyCUDA or CuPY, and everything seems to work fine.

kkotyk commented 2 years ago

@leofang I just wanted to follow up and say that I've now worked with this and it works great. And vkfft is faster and uses way less memory than cupyx.fft.fft. This would be a great addition if the team could integrate this into cupy.

leofang commented 2 years ago

Thank you @vincefn @kkotyk. It's very nice to know. Unfortunately I've moved to NVIDIA and there'd be potential conflict of interest even if I initiate this conversation off hour as an individual open source contributor. I'd suggest you to open an issue in the CuPy issue tracker to discuss with the team directly 🙂