In the existing MPI implementation, the grid is split between MPI tasks evenly along the longest axis. This is the optimal solution for 1D problems, and acceptable in 2D and 3D when the number of MPI tasks is small. However, for larger numbers of MPI tasks, this results in strips (2D) or slabs (3D) that have a large surface area relative to the enclosed volume.
The optimal solution is to divide the domain into subdomains that are as close to squares (2D) or cubes (3D) as possible, which minimises the surface area that must be communicated for a given enclosed volume. In this new implementation, this is achieved by finding the prime factors of the number of available MPI tasks, and successively dividing the grid along the longest axis by these factors.
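The procedure described above can be sketched as follows. This is a minimal illustration and not the code from the implementation itself: the function names `prime_factors` and `decompose` are hypothetical, and the sketch makes two assumptions the text does not state, namely that the largest prime factors are applied first, and that "longest axis" means the axis with the largest remaining per-task extent (ties go to the lowest axis index).

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with multiplicity."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def decompose(grid_shape, n_tasks):
    """Return the number of MPI tasks assigned along each grid axis.

    grid_shape: number of cells along each axis of the global grid.
    n_tasks:    total number of MPI tasks; all of them are used.
    """
    tasks_per_axis = [1] * len(grid_shape)
    # Per-task extent along each axis, updated after every split.
    local_extent = [float(n) for n in grid_shape]

    # Assumption: apply the largest prime factors first.
    for factor in sorted(prime_factors(n_tasks), reverse=True):
        # Assumption: split whichever axis is currently longest per task.
        axis = local_extent.index(max(local_extent))
        tasks_per_axis[axis] *= factor
        local_extent[axis] /= factor

    return tasks_per_axis


if __name__ == "__main__":
    print(decompose((512, 512), 16))       # 2D grid on 16 tasks
    print(decompose((256, 256, 256), 12))  # 3D grid on 12 tasks
    print(decompose((100, 100), 7))        # prime task count gives strips
```

Under these assumptions, a square 2D grid on 16 tasks is split into a 4 x 4 arrangement of square subdomains, while a prime number of tasks can only be placed along a single axis, which reproduces the strip decomposition of the existing implementation.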
This implementation will use all available MPI tasks. For unusual choices of the number of MPI tasks, such as prime numbers, this can still result in an inefficient domain decomposition. However, in practice, the number of MPI tasks is usually chosen to match the number of CPU cores, which is commonly a multiple of 2 (and sometimes 3).
During setup, the code prints the efficiency of the domain decomposition. This is the fraction of all stored cells (real plus ghost) that are real cells, where ghost cells are the extra cells used for communication between MPI tasks. The efficiency is defined as:
efficiency = number of real cells / (number of real cells + number of ghost cells)
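As a rough illustration of this formula, the sketch below evaluates it for a single subdomain. The function name `decomposition_efficiency` and the parameter `n_ghost` are hypothetical, and the sketch assumes a ghost layer of uniform width on every face of the subdomain (including edges and corners); the actual code may count ghost cells differently, for example at physical boundaries.

```python
from math import prod


def decomposition_efficiency(local_shape, n_ghost=1):
    """Efficiency = real cells / (real cells + ghost cells) for one subdomain.

    local_shape: number of real cells along each axis of the subdomain.
    n_ghost:     assumed ghost-layer width on each face (hypothetical parameter).
    """
    real_cells = prod(local_shape)
    # Real plus ghost cells: the subdomain padded by n_ghost on both sides
    # of every axis (an assumption of this sketch).
    total_cells = prod(n + 2 * n_ghost for n in local_shape)
    return real_cells / total_cells


if __name__ == "__main__":
    # The same 512 x 512 grid on 16 tasks, decomposed two ways.
    square = decomposition_efficiency((128, 128))  # 4 x 4 arrangement
    strip = decomposition_efficiency((512, 32))    # 1 x 16 arrangement
    print(f"square subdomains: {square:.4f}")
    print(f"strip subdomains:  {strip:.4f}")
```

Both subdomains hold the same number of real cells, but the strip carries more ghost cells and therefore has a lower efficiency under these assumptions.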