Found the bug in MPI_Waitall/Waitany using CTF's examples/matmul.cxx:
MPI_Waitall actually calls MPI_Waitany inside a loop. But there was a massive bug in MPI_Waitany:
The MPI standard says the following: "If the request was allocated by a nonblocking communication operation, then it is deallocated and the request handle is set to MPI_REQUEST_NULL."
Essentially, once the PMPI_Waitany call returned, I could no longer recover the original request via req[*idx], because the request had already been deallocated and the handle overwritten with MPI_REQUEST_NULL (no seg fault, since deallocation just changes the handle's value to MPI_REQUEST_NULL). We need that original request value to index into the request map so that we can call critter's stop() method on it.
I'm fixing the bug by saving the request values into a separate array before the call, and then using the index returned by PMPI_Waitany (which identifies the completed request) to look up the saved value in the MPI_Request map.
@solomonik