lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

MPI communication should use persistent communicators #84

Closed maddyscientist closed 11 years ago

maddyscientist commented 11 years ago

Currently, MPI communicators are created on demand. This will not play nicely with peer-to-peer MPI between devices, since the cost of setting up the handshaking is significant. Thus all MPI communication should be done with persistent communicators that are set up when the FaceBuffer is created. This also aligns better with the QMP model.

maddyscientist commented 11 years ago

Having spoken to Rolf vanderVaart (OpenMPI maintainer) about this, I found out that these are essential if we want to be able to use GPUDirect and actually reduce latency. We could conceivably leave a fall-back option if this is needed.

maddyscientist commented 11 years ago

This issue was closed with 50eef47ae631cb6339af587cc7066ae53278e010.

drossetti commented 11 years ago

Do you guys think now is a good moment to plug in APEnet P2P+RDMA support?

maddyscientist commented 11 years ago

Almost. That's why I have been working on cleaning up the comms.

I wonder if the best way is to provide an APEnet comm backend (like the qmp, mpi and single backends that already exist) or whether to provide a qmp-APEnet interface? See include/comm_quda.h.

What are your thoughts Davide?


drossetti commented 11 years ago

I like comm_declare_receive_relative() and its twin. Those are easily ported to an RDMA GPU-aware API. If the frame communication were implemented in terms of those two functions, it'd be perfect.

maddyscientist commented 11 years ago

Let's go with this then. Davide, can you provide a comm_apenet backend that implements this (and the other functions in comm_quda.h)? You only need to implement the functions that comm_qmp and comm_single implement; eventually all of the additional ones that comm_mpi implements will be deprecated, and the code that uses them will be converted to use persistent message handlers like QMP does.
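
For readers following the thread, the pattern under discussion (declare a message handle once, then start and wait on it many times, behind a swappable backend) can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual contents of comm_quda.h: the names `msg_handle`, `comm_backend`, `declare_copy`, `single_backend` and `exchange_repeatedly` are all hypothetical, and the "single" backend here is a stand-in that performs a local copy instead of any network transfer. In plain MPI the corresponding standard calls would be `MPI_Send_init`/`MPI_Recv_init` for the one-time setup and `MPI_Start`/`MPI_Wait` for each exchange.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical persistent message handle: the buffers and size are fixed
 * once at declaration time and reused for every later exchange. */
typedef struct {
    void       *recv_buf;
    const void *send_buf;
    size_t      nbytes;
    int         in_flight;  /* 1 between start and wait, 0 otherwise */
} msg_handle;

/* Hypothetical backend interface: each backend (single, mpi, qmp, ...)
 * would fill in one table of function pointers like this. */
typedef struct {
    const char *name;
    msg_handle *(*declare_copy)(void *dst, const void *src, size_t nbytes);
    void (*start)(msg_handle *h);
    void (*wait)(msg_handle *h);
    void (*free_handle)(msg_handle *h);
} comm_backend;

/* "single" stand-in backend: the expensive setup happens here, once. */
static msg_handle *single_declare(void *dst, const void *src, size_t nbytes)
{
    msg_handle *h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    h->recv_buf = dst;
    h->send_buf = src;
    h->nbytes = nbytes;
    h->in_flight = 0;
    return h;
}

/* Starting a transfer is cheap: no new setup, just move the data. */
static void single_start(msg_handle *h)
{
    memcpy(h->recv_buf, h->send_buf, h->nbytes);
    h->in_flight = 1;
}

/* The local copy already completed in start, so wait only clears the flag. */
static void single_wait(msg_handle *h)
{
    h->in_flight = 0;
}

static void single_free(msg_handle *h)
{
    free(h);
}

const comm_backend single_backend = {
    "single", single_declare, single_start, single_wait, single_free
};

/* Declare one handle, reuse it for `iters` exchanges, and return the last
 * value received (or -1 if nothing was exchanged). */
int exchange_repeatedly(const comm_backend *be, int iters)
{
    int send = 0, recv = -1;
    msg_handle *h = be->declare_copy(&recv, &send, sizeof send);
    if (h == NULL) return -1;
    for (int i = 0; i < iters; i++) {
        send = i * 10;   /* new payload, same persistent handle */
        be->start(h);
        be->wait(h);
    }
    be->free_handle(h);
    return recv;
}
```

Calling `exchange_repeatedly(&single_backend, 3)` reuses the one handle for three exchanges. A different backend would only need to supply its own table with the same four entries, which is the appeal of expressing the communication in terms of a small set of declare/start/wait functions.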