icl-utk-edu / heffte


MPICH + CUDA #10

Open mkstoyanov opened 1 year ago

mkstoyanov commented 1 year ago

Some tests, e.g., the long long variants, fail when using MPICH with CUDA-aware MPI.

mkstoyanov commented 1 year ago

Some issues were resolved in #11, but alltoall (no-v) still fails when using empty boxes.

The test is disabled since it is a fringe use case (a subcomm implies few ranks, so p2p should work better).

Testing should now pass under MPICH + CUDA-aware MPI, but the alltoall path still needs further investigation.
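For reference, the pattern that triggers the failure looks roughly like the sketch below. This is only an illustration of the padded-message scheme, not heFFTe's internal code, and it assumes CUDA-aware MPI with plain `MPI_Alltoall`; the block size and the "empty box" rank are made up.

```cpp
// Sketch of the padded alltoall pattern: every rank exchanges blocks of the
// same size, so a rank that owns an empty box sends/receives pure padding.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int block_size = 256; // messages are padded to the same size on all ranks

    double *send = nullptr, *recv = nullptr;
    cudaMalloc(&send, sizeof(double) * block_size * nranks);
    cudaMalloc(&recv, sizeof(double) * block_size * nranks);
    // A rank that owns an empty box contributes nothing real, so its whole
    // send buffer is padding (fake data); zero it for the illustration.
    cudaMemset(send, 0, sizeof(double) * block_size * nranks);

    // With CUDA-aware MPI the device pointers are handed directly to MPI.
    MPI_Alltoall(send, block_size, MPI_DOUBLE,
                 recv, block_size, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0) std::printf("alltoall with padded blocks completed\n");
    cudaFree(send);
    cudaFree(recv);
    MPI_Finalize();
    return 0;
}
```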

ax3l commented 3 weeks ago

Thanks for testing this. We (WarpX & ImpactX) use GPU-aware MPI heavily on DOE Exascale machines, which are currently all HPE/Cray and thus MPICH. With the current releases, is there anything we should look out for?

We do R2C forward and C2R backward FFTs in 1D to 3D.
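For context, our transforms look roughly like the sketch below; it assumes heFFTe's `fft3d_r2c` class with the cufft backend, and the boxes and the r2c direction are placeholders (each rank would pass its own local input/output box).

```cpp
// Rough sketch of an R2C forward / C2R backward round trip with heFFTe,
// assuming the cufft backend; boxes and sizes here are illustrative.
#include "heffte.h"
#include <cuda_runtime.h>
#include <complex>

void r2c_roundtrip(MPI_Comm comm, heffte::box3d<> const& inbox, heffte::box3d<> const& outbox) {
    // real-to-complex transform, shrinking dimension 0 of the spectral output
    heffte::fft3d_r2c<heffte::backend::cufft> fft(inbox, outbox, 0, comm);

    double* real_data = nullptr;               // device buffers, sized by heFFTe
    std::complex<double>* freq_data = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&real_data), fft.size_inbox() * sizeof(double));
    cudaMalloc(reinterpret_cast<void**>(&freq_data), fft.size_outbox() * sizeof(std::complex<double>));

    fft.forward(real_data, freq_data);                        // R2C forward
    fft.backward(freq_data, real_data, heffte::scale::full);  // C2R backward, scaled

    cudaFree(real_data);
    cudaFree(freq_data);
}
```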

mkstoyanov commented 3 weeks ago

This should not affect you. The problem happens when we use alltoall (no-v), which means we pad all MPI messages to the same size. There appears to be an MPI-specific issue when the boxes are empty and the message consists only of padding (i.e., we push around fake data). I doubt it will affect you, and it may be a non-issue on newer MPICH installations; we found this with the version installed from apt on Ubuntu 22.04.
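If you want to be explicit and avoid the (no-v) alltoall path entirely, the reshape algorithm can be selected through the plan options. A minimal sketch, assuming heFFTe 2.x's `plan_options` struct with its `algorithm` (`reshape_algorithm`) and `use_gpu_aware` fields:

```cpp
// Sketch: explicitly pick the point-to-point reshape instead of alltoall,
// assuming heFFTe 2.x's plan_options / reshape_algorithm interface.
#include "heffte.h"

void run_with_p2p(heffte::box3d<> const& inbox, heffte::box3d<> const& outbox, MPI_Comm comm) {
    heffte::plan_options options = heffte::default_options<heffte::backend::cufft>();
    options.algorithm = heffte::reshape_algorithm::p2p;  // avoid the (no-v) alltoall path
    options.use_gpu_aware = true;                        // hand device buffers directly to MPI
    heffte::fft3d<heffte::backend::cufft> fft(inbox, outbox, comm, options);
    // ... use fft.forward() / fft.backward() as usual ...
}
```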

Other than that, check the Cray documentation on GPU-aware MPI. ROCm machines require special environment variables and compiler flags to enable it, sometimes at both compile time and runtime.
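For example, on HPE/Cray MPICH the runtime switch is the `MPICH_GPU_SUPPORT_ENABLED` environment variable. A small startup guard along these lines can catch a missing setting early; note the variable name is Cray MPICH specific, and the check is only a sanity guard, not proof that the library was built with GPU support.

```cpp
// Startup guard: warn if Cray MPICH's GPU-aware switch is not set.
// MPICH_GPU_SUPPORT_ENABLED is specific to HPE/Cray MPICH; other MPI stacks
// use different mechanisms (compile-time flags, module settings, etc.).
#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char* gpu_support = std::getenv("MPICH_GPU_SUPPORT_ENABLED");
    if (rank == 0 && (gpu_support == nullptr || std::strcmp(gpu_support, "1") != 0)) {
        std::fprintf(stderr,
            "warning: MPICH_GPU_SUPPORT_ENABLED is not set to 1; "
            "passing device pointers to MPI will likely fail\n");
    }

    // ... set up heFFTe / run FFTs ...
    MPI_Finalize();
    return 0;
}
```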

ax3l commented 3 weeks ago

Thank you for the summary!

> Other than that, check the Cray documentation on GPU-aware MPI. ROCm machines require special environment variables and compiler flags to enable it, sometimes at both compile time and runtime.

Yes, that's correct. For Cray/HPE machines, we control/request it at compile time so we can activate it at runtime.