jipolanco / PencilFFTs.jl

Fast Fourier transforms of MPI-distributed Julia arrays
https://jipolanco.github.io/PencilFFTs.jl/dev/
MIT License

Make transforms work on CUDA arrays #48

Closed jipolanco closed 2 years ago

jipolanco commented 2 years ago

This PR allows performing FFTs on CuArrays.

Concretely, it's now possible to create and apply PencilFFTPlans on CUDA arrays.

Here is a short example:

using PencilFFTs
using PencilArrays
using CUDA
using MPI
using FFTW
using Random

MPI.Init()
comm = MPI.COMM_WORLD
dims = (16, 32, 12)

pen = Pencil(CuArray, dims, comm)
u = PencilArray{Float32}(undef, pen)
@assert parent(u) isa CuArray

plan = PencilFFTPlan(u, Transforms.RFFT())
randn!(u)

uhat = plan * u  # forward transform
@assert parent(uhat) isa CuArray

v = plan \ uhat  # backward transform
@assert parent(v) isa CuArray

# Compare with serial FFTW transforms on the CPU
U = gather(u)
Uhat = gather(uhat)
V = gather(v)
if U !== nothing
    @assert U isa Array
    uhat_serial = rfft(U)
    v_serial = irfft(uhat_serial, size(U, 1))
    @show uhat_serial ≈ Uhat
    @show v_serial ≈ V
end

So far this has only been tested with a few CPU processes and on a single GPU using CUDA. I'm not yet sure whether it actually gives correct results, but it's a start. Correct results also require https://github.com/jipolanco/PencilArrays.jl/pull/65, which will be included in PencilArrays v0.17.5.

It would be great if this could be tested on multi-GPU configurations.

Note that an explicit dependency on CUDA.jl is not needed, since functions like plan_rfft(::CuArray, args...) automatically dispatch to CuFFT functions. The only minor issue is that those functions don't support keyword arguments (such as flags = FFTW.MEASURE, which may be passed to FFTW plans), which is one of the reasons why creating plans on CuArrays used to fail before this PR.
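To illustrate the keyword-argument issue, here is a minimal sketch of one way to forward FFTW-specific keywords only when the input is a plain CPU `Array`. The helpers `fftw_kwargs` and `make_rfft_plan` are hypothetical names made up for this example; they are not part of PencilFFTs and are not necessarily how this PR handles it. The sketch only needs FFTW to run, so the non-`Array` branch is exercised with a view instead of a CuArray.

```julia
using FFTW

# Hypothetical helper (not part of PencilFFTs): keep FFTW-specific keyword
# arguments for plain CPU Arrays and drop them for any other array type,
# whose plan_* methods may not accept them.
fftw_kwargs(::Array; kws...) = (; kws...)
fftw_kwargs(::AbstractArray; kws...) = NamedTuple()

# Hypothetical wrapper: create a real-to-complex plan along `dims`,
# passing keyword arguments through only where they are supported.
function make_rfft_plan(A::AbstractArray, dims; kws...)
    plan_rfft(A, dims; fftw_kwargs(A; kws...)...)
end

A = rand(Float32, 16, 32)
p = make_rfft_plan(A, 1; flags = FFTW.ESTIMATE)  # flags reach FFTW here
Ahat = p * A
@assert size(Ahat) == (9, 32)  # 16 real inputs give 16 ÷ 2 + 1 = 9 outputs

# For a non-Array type (a view stands in for a CuArray), the flags are dropped.
@assert isempty(fftw_kwargs(view(A, :, 1:2); flags = FFTW.ESTIMATE))
```

Dispatching on the array type keeps the call site identical for CPU and GPU arrays, which fits the goal of not depending on CUDA.jl explicitly.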

This should hopefully close #3.

codecov[bot] commented 2 years ago

Codecov Report

Merging #48 (439dca3) into master (cbe89dc) will decrease coverage by 0.19%. The diff coverage is 95.23%.

@@            Coverage Diff             @@
##           master      #48      +/-   ##
==========================================
- Coverage   98.32%   98.13%   -0.20%     
==========================================
  Files           9        9              
  Lines         419      428       +9     
==========================================
+ Hits          412      420       +8     
- Misses          7        8       +1     

Impacted Files                  Coverage Δ
src/Transforms/Transforms.jl    94.44% <ø> (ø)
src/plans.jl                    96.59% <91.66%> (-0.51%)
src/Transforms/c2c.jl           100.00% <100.00%> (ø)
src/Transforms/r2c.jl           92.85% <100.00%> (ø)
src/Transforms/r2r.jl           100.00% <100.00%> (ø)
src/operations.jl               100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cbe89dc...439dca3.

Lightup1 commented 2 years ago

Hi @jipolanco, does this need CUDA-aware MPI, or is regular MPI enough?

jipolanco commented 2 years ago

Hi, you definitely need CUDA-aware MPI if you're using CuArrays.
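
As a rough way to check this before running on CuArrays, MPI.jl provides `MPI.has_cuda()`. The sketch below assumes an MPI.jl version that exports this function. Detection is only reliable for some MPI implementations (Open MPI, for example), so a `false` result may be a false negative.

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD

# Ask MPI.jl whether the underlying MPI library is known to be CUDA-aware.
# Assumption: a `false` here can be a false negative with MPI
# implementations that offer no way of querying CUDA support.
cuda_aware = MPI.has_cuda()

if MPI.Comm_rank(comm) == 0
    if cuda_aware
        println("MPI reports CUDA support: CuArrays can be passed to MPI calls.")
    else
        println("MPI does not report CUDA support: transforms on CuArrays will likely fail.")
    end
end
```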