Closed abouteiller closed 6 years ago
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Solved
Still a problem
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
Fixed issues with turning off the failure detector.
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
The issue is demoted to minor, as it now only impacts networks that are not flushed correctly upon reaching finalize.
Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
The issue is now resolved with ESS modifications that consider all processes from the same node to be faulty.
Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).
MPI_FINALIZE deadlocks because pmix_fence still counts the dead processes as participants.
It is simple enough to replace pmix_fence at the MPI level with an ompi_agree, but the orted still deadlocks (for the same reason).
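
To illustrate the failure mode in the report, here is a toy model of a fence whose participant list still includes a dead process. It is only a sketch: `fence_completes`, the rank sets, and the variable names are hypothetical and do not correspond to the PMIx, ORTE, or Open MPI APIs.

```python
def fence_completes(participants, arrived):
    """Toy fence: completes only once every expected participant has arrived."""
    return set(participants) <= set(arrived)


# Hypothetical job with four ranks, one of which has failed.
all_ranks = {0, 1, 2, 3}
dead = {2}

# Only the surviving ranks can ever reach the fence.
arrived = all_ranks - dead

# Fence over the original membership: rank 2 never arrives, so it never completes.
stuck = not fence_completes(all_ranks, arrived)

# Fault-aware variant: failed ranks are removed from the participant set first.
ok = fence_completes(all_ranks - dead, arrived)

print("original fence stuck:", stuck)
print("fault-aware fence completes:", ok)
```

The same reasoning applies at both levels mentioned in the report: any collective that waits on a fixed participant list has to stop counting the failed processes, whether it runs in the MPI layer or in the daemons.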