hemelb-codes / hemelb-gpu

GPU-accelerated version of the HemeLB lattice Boltzmann code for large scale fluid flow in complex geometries.
GNU Lesser General Public License v3.0

Optimize GPU boundary exchanges via NVSHMEM #1

Open bentsherman opened 3 years ago

bentsherman commented 3 years ago

NVSHMEM is an implementation of OpenSHMEM for Nvidia GPUs:

https://developer.nvidia.com/nvshmem
https://docs.nvidia.com/hpc-sdk/nvshmem/api/docs/index.html

It is essentially an alternative to MPI that lets the GPUs communicate directly over the interconnect, instead of staging transfers through the CPU as with MPI. The API is very similar to MPI's but uses slightly different terminology (init, finalize, PEs, teams, put/get, collective ops). The memory model also differs: communication buffers must come from a symmetric heap (allocated with nvshmem_malloc) so that every PE can access them remotely with one-sided put/get operations.

This would be a great way to optimize the boundary exchanges, which currently account for the majority of communication overhead in the multi-GPU scenario. A big downside is that you probably can't have MPI and NVSHMEM in the same program. You might be able to write a wrapper library that defers to either MPI or NVSHMEM depending on whether GPUs are enabled, but more likely you will need separate binaries for CPU and GPU.

bentsherman commented 3 years ago

This repo has code examples for several different ways to implement a multi-GPU Jacobi solver:

https://github.com/NVIDIA/multi-gpu-programming-models