paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0

Legacy Comms Merge #285

Open goracle opened 4 years ago

goracle commented 4 years ago

Hi,

I've been maintaining a separate Grid version for a long time (and periodically merging in the develop branch). The primary reason is that we need 4 processes per node on the KNL, since the A2A contractions in CPS are optimized for 4 ppn (on the KNL). When it was introduced, MPI-3 (now just MPI) did not, I believe, allow us to allocate a huge-page comms buffer that was separate for each process. That is, the comms setup before the move to MPI-3 had us copy into a simple, independent huge-page buffer, perform the MPI call, then copy back.
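For concreteness, here is a minimal sketch of that copy-in / communicate / copy-out pattern (my own illustration, not code from Grid or CPS; the staging buffers are assumed to be per-process huge-page allocations, e.g. along the lines of the mmap sketch further down):

#include <mpi.h>
#include <cstring>
#include <cstddef>

// sbuf/rbuf: per-process staging buffers, independent of any MPI-3 shared window.
// Data is copied into the staging buffer, sent/received, then copied back out.
void staged_exchange(const double *send, double *recv, size_t n,
                     void *sbuf, void *rbuf,
                     int to, int from, MPI_Comm comm) {
  size_t bytes = n * sizeof(double);
  std::memcpy(sbuf, send, bytes);                       // copy into the independent comms buffer
  MPI_Sendrecv(sbuf, (int)bytes, MPI_BYTE, to,   0,
               rbuf, (int)bytes, MPI_BYTE, from, 0,
               comm, MPI_STATUS_IGNORE);
  std::memcpy(recv, rbuf, bytes);                       // copy back into the lattice field
}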

From my (perhaps flawed) recollection: at one point I discussed with @paboyle implementing something that would let us keep doing this. In response, he added the map-anon (MAP_ANONYMOUS) option for the mmap call, but, for reasons I don't remember, this did not work. I therefore had to add some hacks to avoid using the new comms system.
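My reading of the "map anon" route is an anonymous mmap per rank, roughly like the following (a sketch under my own assumptions, not the actual SharedMemoryMPI.cc code):

#include <sys/mman.h>
#include <cstddef>

// Anonymous mapping: not backed by a hugetlbfs file and private to the calling
// rank. MAP_HUGETLB explicitly requests huge pages (Linux-specific); the
// fallback to ordinary pages is kept for illustration.
void *map_anon_commbuf(size_t bytes) {
  void *p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  if (p == MAP_FAILED)
    p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return (p == MAP_FAILED) ? nullptr : p;
}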

Perhaps disturbing the status quo is not a wise idea, given that merges with develop are going smoothly and maintaining this separate version is by now fairly painless. I'm also still fairly confident that all of this was necessary, though given how long it's been since I looked at it I'm becoming less so (and I don't really feel that winning such an argument would be very productive). Anyway, for what it's worth, I'm open to discussing any and all solutions, including leaving things the way they are. This is certainly not an urgent problem, so please respond as your convenience/interest dictates.

Thanks.

paboyle commented 4 years ago

AC_ARG_ENABLE([shm],[AC_HELP_STRING([--enable-shm=shmopen|shmget|hugetlbfs|shmnone],
              [Select SHM allocation technique])],[ac_SHM=${enable_shm}],[ac_SHM=shmopen])

case ${ac_SHM} in

 shmopen)
 AC_DEFINE([GRID_MPI3_SHMOPEN],[1],[GRID_MPI3_SHMOPEN] )
 ;;

 shmget)
 AC_DEFINE([GRID_MPI3_SHMGET],[1],[GRID_MPI3_SHMGET] )
 ;;

 shmnone)
 AC_DEFINE([GRID_MPI3_SHM_NONE],[1],[GRID_MPI3_SHM_NONE] )
 ;;

 hugetlbfs)
 AC_DEFINE([GRID_MPI3_SHMMMAP],[1],[GRID_MPI3_SHMMMAP] )
 ;;

 *)
 AC_MSG_ERROR([${ac_SHM} unsupported --enable-shm option]);
 ;;

esac

You have a number of choices for how to get the shared memory region. These enable different code paths in

Grid/communicator/SharedMemoryMPI.cc
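Schematically, the selection works like this (a sketch of the mechanism only, not the actual contents of that file):

#include <cstddef>

// Each configure-time macro gates one allocation path for the shared memory region.
void *allocate_shm_region(size_t bytes) {
  (void)bytes; // unused in this schematic
#if defined(GRID_MPI3_SHMMMAP)
  // hugetlbfs: mmap a file created under the huge-page mount point
#elif defined(GRID_MPI3_SHMOPEN)
  // POSIX: shm_open() a named object, size it, mmap() it
#elif defined(GRID_MPI3_SHMGET)
  // SysV: shmget() a segment and shmat() it
#elif defined(GRID_MPI3_SHM_NONE)
  // no cross-rank sharing: each rank allocates an independent buffer
#endif
  return nullptr; // placeholder; each branch above would return its mapping
}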

goracle commented 4 years ago

Yes, I know about these. I added an option to that list, which is the legacy option I'm using.

paboyle commented 4 years ago

Which option?

goracle commented 4 years ago

Never mind, I was mistaken: I added the option here:

AC_ARG_ENABLE([comms],[AC_HELP_STRING([--enable-comms=none|mpi|mpi-auto],
              [Select communications])],[ac_COMMS=${enable_comms}],[ac_COMMS=none])

case ${ac_COMMS} in
     none)
        AC_DEFINE([GRID_COMMS_NONE],[1],[GRID_COMMS_NONE] )
        comms_type='none'
     ;;
     mpi3*)
        AC_DEFINE([GRID_COMMS_MPI3],[1],[GRID_COMMS_MPI3] )
        comms_type='mpi3'
     ;;
     mpi*)
        AC_DEFINE([GRID_COMMS_MPI],[1],[GRID_COMMS_MPI] )
        comms_type='mpi'
     ;;
     *)
        AC_MSG_ERROR([${ac_COMMS} unsupported --enable-comms option]);
     ;;
esac

In your list I use shmnone.
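With the upstream option names, my selection would look roughly like this at configure time (only the two options discussed above are shown; the legacy comms value from my fork is omitted):

./configure --enable-comms=mpi --enable-shm=shmnone ...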

paboyle commented 4 years ago

MPI and MPI3 are the same now. There is no difference.

goracle commented 4 years ago

Yes, I heard this happened a while ago. The option name is just something I chose to distinguish the legacy comms code from the current comms options. I should also mention another requirement (which the legacy system satisfies) that I forgot: there is a whole set of MPI calls performed within the A2A code in CPS (associated with meson-field memory management), so at some point after the CG I need to take the per-MPI-process huge-page comms buffer from Grid and hand it to CPS. In my Grid version, I pass this buffer via Grid_finalize.
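A rough sketch of the hand-off I have in mind is below. Both signatures are hypothetical stand-ins (upstream Grid_finalize takes no arguments, and the CPS entry point is just a placeholder name), so this illustrates the shape of the interface rather than any real API:

#include <cstddef>

// Hypothetical fork-local interfaces, shown only to illustrate the hand-off.
void Grid_finalize(void **commbuf, size_t *commbytes);   // fork-local variant returning the buffer
void cps_a2a_set_workspace(void *buf, size_t bytes);     // placeholder for the CPS side

void finish_grid_section(void) {
  void  *comms_buf   = nullptr;
  size_t comms_bytes = 0;
  // Tear down Grid but keep the per-rank huge-page comms buffer alive,
  // then hand it to CPS for meson-field memory management in the A2A code.
  Grid_finalize(&comms_buf, &comms_bytes);
  cps_a2a_set_workspace(comms_buf, comms_bytes);
}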