upperwal / EntangledMPI

Fault Tolerance framework for High Performance Computing [Supports ULFM, replication and checkpointing]
MIT License

MPI_Wait hangs forever when using comm dup in MPI call #31

Open upperwal opened 6 years ago

upperwal commented 6 years ago

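The following code hangs in MPI_Wait because MPI_Irecv is posted on a duplicated communicator (rep_comm) while MPI_Send uses MPI_COMM_WORLD, and a message sent on one communicator is never matched by a receive posted on another. The same problem occurs in the MPI_Send, MPI_Recv, MPI_Isend and MPI_Irecv implementations.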

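MPI_Comm rep_comm;
MPI_Comm_dup(MPI_COMM_WORLD, &rep_comm);

/* The receive is posted on the duplicate, but the send goes out on MPI_COMM_WORLD. */
MPI_Irecv(recv, 100, MPI_INT, 1 - rank, 0, rep_comm, req);
MPI_Send(send, 100, MPI_INT, 1 - rank, 0, MPI_COMM_WORLD);

MPI_Status stat;
MPI_Wait(req, &stat);  /* never returns: no message ever arrives on rep_comm */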
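
To illustrate why the receive above can never complete, here is a small toy model of message matching in plain C. It is only a sketch and uses no MPI calls: the `envelope` type and the `envelope_matches` and `dup_context` helpers are hypothetical names invented for this illustration. It assumes the usual MPI rule that a receive matches a message only when communicator, source and tag all agree, and that MPI_Comm_dup yields a communicator with the same group but a separate communication context. Wildcards such as MPI_ANY_SOURCE are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a message envelope (hypothetical, not an MPI type). */
typedef struct {
    int context; /* models the communicator's private context */
    int source;  /* rank of the sender */
    int tag;     /* message tag */
} envelope;

/* A posted receive matches an incoming message only if all fields agree. */
bool envelope_matches(envelope posted_recv, envelope incoming) {
    return posted_recv.context == incoming.context
        && posted_recv.source == incoming.source
        && posted_recv.tag == incoming.tag;
}

/* Models MPI_Comm_dup: the group is unchanged, the context is new. */
int dup_context(int *next_context) {
    assert(next_context != NULL);
    return (*next_context)++;
}
```

In the reproducer, the receive carries the context of rep_comm and the send carries the context of MPI_COMM_WORLD, so in this model `envelope_matches` returns false even though source and tag are identical. Sending and receiving on the same communicator makes the envelopes match.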

Possible solution: use node.rep_mpi_comm_world directly instead of duplicating it:

MPI_Comm *comm_to_use;

//PMPI_Comm_dup(node.rep_mpi_comm_world, &comm_to_use);
comm_to_use = &(node.rep_mpi_comm_world);

PMPI_Irecv(buf, ... , *comm_to_use, req);