jipolanco / PencilFFTs.jl

Fast Fourier transforms of MPI-distributed Julia arrays
https://jipolanco.github.io/PencilFFTs.jl/dev/
MIT License

Combining cpu and gpu #50

Closed Lightup1 closed 2 years ago

Lightup1 commented 2 years ago

Since the package is now compatible with CUDA, is it possible to combine the CPU and GPU to get the best possible performance? There is a similar project in Python, implemented for quantum simulation using the Trotter expansion: https://github.com/trotter-suzuki-mpi/trotter-suzuki-mpi

Hope it can be done in Julia!

jipolanco commented 2 years ago

I'm not sure I understand. Do you mean partitioning a domain such that some subdomains are on CPUs and others on GPUs?

I guess it shouldn't be too hard to do. The only thing is that, not being that familiar with CUDA-aware MPI, I'm not sure how MPI handles communications between CPUs and GPUs. I know I had some issues when sending GPU arrays and receiving into CPU arrays (in PencilArrays.gather). And I guess these communications would be quite costly, so I'd need to be sure that it's worth the effort...

Lightup1 commented 2 years ago

Sorry about that. I may have misunderstood how trotter-suzuki-mpi works. Since communication between the GPU and CPU is quite costly, it may not benefit from a hybrid kernel.

(two images)

This clearly shows that a hybrid kernel is slower. But I'm not sure whether the hybrid kernel here means distributing the FFT between the CPU and GPU or distributing the Trotter steps between the two.

ref: Calderaro, Luca. "Large-scale Classical Simulation of Quantum Systems Using the Trotter-Suzuki Decomposition."

jipolanco commented 2 years ago

That looks interesting, thanks! I couldn't find any information on hybrid decompositions in their documentation, but I'll take a look at the thesis.