mpickpt / mana

MANA for MPI

[MANA-279] Return a virtual request ID in Irecv #288

Closed: dahongli closed this pull request 1 year ago

dahongli commented 1 year ago

An application may call MPI_Waitany after it calls MPI_Irecv. Waitany depends on the request IDs to decide which message has completed. This change creates a virtual request ID when an Irecv is serviced from buffered packets, instead of returning MPI_REQUEST_NULL.
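A minimal sketch of the idea, assuming a small fixed-size table of virtual request handles. Every name here (`vreq_t`, `VREQ_NULL`, `vreq_create_completed`, `vreq_lookup`, `vreq_free`) is hypothetical and is not taken from the MANA source. The point it illustrates is that an Irecv satisfied from a buffered packet gets a live handle that is already marked complete, rather than a null handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch only: names and layout are illustrative, not MANA's. */
#define VREQ_NULL 0   /* stands in for MPI_REQUEST_NULL */
#define VREQ_MAX  64

typedef int vreq_t;   /* virtual request ID handed back to the application */

typedef struct {
    bool in_use;
    bool complete;    /* true if the message was already delivered from a buffer */
    int  source;      /* illustrative status fields */
    int  tag;
} vreq_entry_t;

/* Index 0 is reserved so that VREQ_NULL never names a live entry.
 * Static storage is zero-initialized, so every slot starts unused. */
static vreq_entry_t vreq_table[VREQ_MAX + 1];

/* Called when a replayed Irecv is satisfied from a packet buffered at checkpoint.
 * Instead of returning VREQ_NULL, allocate a virtual request that is already complete. */
vreq_t vreq_create_completed(int source, int tag) {
    for (vreq_t id = 1; id <= VREQ_MAX; id++) {
        if (!vreq_table[id].in_use) {
            vreq_table[id] = (vreq_entry_t){
                .in_use = true, .complete = true, .source = source, .tag = tag};
            return id;
        }
    }
    return VREQ_NULL; /* table full; a real implementation would grow the table */
}

/* Look up a live entry; asserts that the ID is valid and allocated. */
vreq_entry_t *vreq_lookup(vreq_t id) {
    assert(id > VREQ_NULL && id <= VREQ_MAX && vreq_table[id].in_use);
    return &vreq_table[id];
}

/* Release a virtual request once Wait/Test has reported it, and null the
 * caller's handle, mirroring how MPI nulls a completed request. */
void vreq_free(vreq_t *id) {
    vreq_lookup(*id)->in_use = false;
    *id = VREQ_NULL;
}
```

A wrapper's Wait/Waitany path would then see a non-null, completed handle and can report its index, whereas a null handle is simply skipped.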

The situation that triggers the Waitany problem is:
[1] Nimrod posts several MPI_Irecv calls and keeps the returned requests.
[2] At checkpoint, MANA drains the pending Irecvs.
[3] On restart, MANA replays the Irecvs. When a replayed Irecv consumes a message that was buffered during the checkpoint in step [2], it sets the request to MPI_REQUEST_NULL.
[4] Nimrod calls MPI_Waitany. Because of step [3], all of its requests are now MPI_REQUEST_NULL, so Waitany returns MPI_UNDEFINED as the index. This is wrong: it should return the index of the completed Irecv.
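The failure in step [4] follows from the MPI rule that MPI_Waitany skips null handles and returns MPI_UNDEFINED when no active handle remains. The mock below models only that rule, without blocking or real MPI calls. `mock_request_t`, `mock_waitany`, and the constant values are hypothetical stand-ins (the real MPI_UNDEFINED value is implementation-defined). Unlike real Waitany, which may pick any completed request, this mock picks the first one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; real MPI constants have implementation-defined values. */
#define MOCK_REQUEST_NULL 0
#define MOCK_UNDEFINED    (-1)  /* plays the role of MPI_UNDEFINED */
#define MOCK_WOULD_BLOCK  (-2)  /* real Waitany would block here instead */

typedef struct {
    int  id;        /* MOCK_REQUEST_NULL means the slot is null/inactive */
    bool complete;  /* whether the underlying receive has finished */
} mock_request_t;

/* Simplified, non-blocking model of MPI_Waitany:
 * - null handles are ignored;
 * - if every handle is null, return MOCK_UNDEFINED;
 * - otherwise return the index of the first completed request and null it. */
int mock_waitany(mock_request_t reqs[], int count) {
    bool any_active = false;
    for (int i = 0; i < count; i++) {
        if (reqs[i].id == MOCK_REQUEST_NULL) {
            continue;
        }
        any_active = true;
        if (reqs[i].complete) {
            reqs[i].id = MOCK_REQUEST_NULL; /* completed requests become null */
            return i;
        }
    }
    return any_active ? MOCK_WOULD_BLOCK : MOCK_UNDEFINED;
}
```

Before the change, the replayed Irecvs leave every slot null, so the mock returns `MOCK_UNDEFINED`. With a completed virtual request in slot 1, it returns index 1, which is what the application expects.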

dahongli commented 1 year ago

Merged the four commits into one and updated the code comments.