huttered40 closed 3 years ago
I have a potential fix: simply use blocking collectives during the blocking stage, i.e. when MPI_Wait is intercepted. Nonblocking collectives would be needless there, because the MPI_Request has to be completed before returning to the user program anyway.
I realized this could work because, according to the MPI standard, collectives are blocking, not synchronous. This means that the root of a broadcast does not need to handshake with each process it's broadcasting to; that would be ridiculous. But unlike blocking p2p, there is no possibility of deadlock. This does mean, however, that our propagation strategy for blocking collectives is synchronous, even if the internal communication in the propagation strategy is only blocking.
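To make the idea concrete, here is a minimal sketch in plain C of what a synchronous propagation step at MPI_Wait could look like. It makes no MPI calls: the ranks are simulated as array slots, and `allreduce_max` and `propagate_at_wait` are hypothetical names, not functions from this codebase. It also assumes that the propagated quantity is a single critical-path cost per rank combined with a max, which is a simplification of whatever the tool actually carries.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a blocking max-reduction across ranks. In a real wrapper this
 * might be an MPI_Allreduce with MPI_MAX issued once the intercepted wait has
 * completed; here the "ranks" are just array slots (an assumption). */
static double allreduce_max(const double *per_rank, size_t nranks)
{
    assert(nranks > 0);
    double longest = per_rank[0];
    for (size_t i = 1; i < nranks; ++i) {
        if (per_rank[i] > longest) {
            longest = per_rank[i];
        }
    }
    return longest;
}

/* Hypothetical propagation step run when MPI_Wait is intercepted: every rank
 * contributes its current critical-path cost and every rank leaves with the
 * largest one, so the strategy behaves synchronously. */
void propagate_at_wait(double *critical_path, size_t nranks)
{
    double longest = allreduce_max(critical_path, nranks);
    for (size_t i = 0; i < nranks; ++i) {
        critical_path[i] = longest;
    }
}
```

After the call, every simulated rank holds the same cost, namely the largest one any rank had on entry. That is the sense in which the propagation is synchronous even though the collective underneath is only blocking.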
I would want to test this out on CTF first, to make sure all is well with the nonblocking collectives they use.
Unfortunately, because symbol-tracking requires information from multiple communication steps with different datatypes, I cannot simply reuse the trick I used for synchronous collective support, which relied on a complicated custom MPI_Op.
Currently, we silently do not propagate critical path information for schedule dependencies among processes involved in nonblocking collective routines.