cphyc closed 1 year ago
@apontzen any chance of getting this merged? With a student of mine (@AnatoleStorck), we've found a significant improvement for large FFT computations.
The speedup is roughly a factor of 2 when using A40 GPUs, compared to performing the FFT in parallel on the CPUs.
Sorry, yes. I guess there is nothing to test, really, since cuFFTW is supposed to be a drop-in replacement?
Yes. As long as one is able to compile -- which might not be straightforward -- the outputs should be exactly the same.
This adds the minimal changes to compile using cuFFT instead of FFTW3 as a backend for performing the Fourier transforms.
This may help when the FFT dominates the cost; otherwise it offers no speedup (but it does not slow the code down either).
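
To make the "only helps when the FFT dominates" point concrete, here is a rough Amdahl's-law sketch. The default factor of 2 is the FFT speedup reported in this thread for A40 GPUs; the FFT fractions in the loop and the function name `overall_speedup` are purely illustrative assumptions, not measurements from this code.

```python
def overall_speedup(fft_fraction, fft_speedup=2.0):
    """Amdahl's-law estimate of the whole-run speedup.

    fft_fraction: share of the original runtime spent in FFTs (0 to 1).
    fft_speedup:  how much faster the FFT part becomes (2.0 is the
                  figure quoted in this thread; treat it as an assumption).
    """
    if not 0.0 <= fft_fraction <= 1.0:
        raise ValueError("fft_fraction must be between 0 and 1")
    if fft_speedup <= 0.0:
        raise ValueError("fft_speedup must be positive")
    # The non-FFT part is unchanged; only the FFT part shrinks.
    return 1.0 / ((1.0 - fft_fraction) + fft_fraction / fft_speedup)


if __name__ == "__main__":
    # Hypothetical FFT fractions, for illustration only.
    for fraction in (0.1, 0.5, 0.9):
        print(f"FFT fraction {fraction:.0%}: "
              f"overall speedup {overall_speedup(fraction):.2f}x")
```

Under these assumptions, a run that spends only a small share of its time in FFTs sees almost no overall gain, while the gain approaches the full FFT speedup as that share approaches 100%.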