huttered40 / critter

Critical path analysis of MPI parallel programs
BSD 2-Clause "Simplified" License

Bug in MPI_Waitall #21

Closed huttered40 closed 5 years ago

huttered40 commented 5 years ago

Found the bug in MPI_Waitall/Waitany using CTF's examples/matmul.cxx:

In critter, MPI_Waitall calls MPI_Waitany inside a loop, and there was a serious bug in critter's MPI_Waitany:

The MPI standard says the following: "If the request was allocated by a nonblocking communication operation, then it is deallocated and the request handle is set to MPI_REQUEST_NULL."

Once the PMPI_Waitany call returns, req[*idx] no longer holds the original request handle: the request has been deallocated and the array entry overwritten with MPI_REQUEST_NULL. There is no seg fault, because the deallocation only changes the handle value stored in that slot to MPI_REQUEST_NULL. We need the original handle value to index into the request map so that we can call critter's stop() method on it.

I'm fixing the bug by saving the request handles into a separate array before the call, and then using the index returned by PMPI_Waitany (indicating which request has completed) to look up the saved handle in the MPI_Request map.
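A minimal sketch of that fix, written so it runs without an MPI installation. Everything here is a hypothetical stand-in and not critter's actual code: `Request` and `REQUEST_NULL` play the role of MPI_Request and MPI_REQUEST_NULL, `fake_waitany` only mimics the side effect of PMPI_Waitany described above, and `stopped`, `stop` and `waitall_sketch` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins so the sketch runs without MPI.
using Request = int;                  // plays the role of MPI_Request
constexpr Request REQUEST_NULL = -1;  // plays the role of MPI_REQUEST_NULL

// Mimics the relevant side effect of PMPI_Waitany: completes one active
// request, reports its index, and overwrites that slot with the null handle.
// Returns false once no active request remains.
bool fake_waitany(std::vector<Request>& reqs, int* idx) {
    for (int i = 0; i < static_cast<int>(reqs.size()); ++i) {
        if (reqs[i] != REQUEST_NULL) {
            reqs[i] = REQUEST_NULL;
            *idx = i;
            return true;
        }
    }
    return false;
}

// Hypothetical request map: original handle -> whether stop() was called.
std::map<Request, bool> stopped;

// Throws std::out_of_range if the handle is not in the map, which is what
// a lookup with REQUEST_NULL would do here.
void stop(Request r) { stopped.at(r) = true; }

// Waitall built from repeated Waitany calls, with the handles saved up front.
int waitall_sketch(std::vector<Request>& reqs) {
    const std::vector<Request> saved = reqs;  // copy before any slot is nulled
    int completed = 0;
    int idx = 0;
    while (fake_waitany(reqs, &idx)) {
        // reqs[idx] is REQUEST_NULL by now; saved[idx] is still the original.
        stop(saved[idx]);
        ++completed;
    }
    return completed;
}
```

Looking up `reqs[idx]` instead of `saved[idx]` inside the loop would use the null handle as the key, which is the failure described above.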

@solomonik