ulfm-devel / ompi

Open MPI main development repository
https://www.open-mpi.org

pmix_fence deadlock in MPI_FINALIZE #6

Closed abouteiller closed 6 years ago

abouteiller commented 8 years ago

Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


MPI_FINALIZE deadlocks because pmix_fence still counts the dead processes as participants.

It is simple enough to replace pmix_fence at the MPI level with an ompi_agree, but the orted daemons still deadlock (for the same reason).
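
To illustrate the failure mode, here is a minimal, self-contained sketch of the two completion rules. It is not Open MPI or PMIx code: `proc_state_t`, `fence_complete_naive`, and `fence_complete_ft` are hypothetical names invented for this example. It only models the idea that a fence which waits on every process in the job can never complete once one of them has died, whereas a failure-aware rule waits on live processes only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state for this sketch; not an Open MPI or PMIx type. */
typedef struct {
    bool alive;    /* false once the process has been reported dead */
    bool arrived;  /* true once the process has entered the fence */
} proc_state_t;

/* Naive rule: every process in the job is a participant, dead or alive.
 * A dead process never arrives, so this never returns true after a failure. */
bool fence_complete_naive(const proc_state_t *procs, size_t n)
{
    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!procs[i].arrived) {
            return false;
        }
    }
    return true;
}

/* Failure-aware rule: only live processes count as participants. */
bool fence_complete_ft(const proc_state_t *procs, size_t n)
{
    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i].alive && !procs[i].arrived) {
            return false;
        }
    }
    return true;
}
```

With three processes where the middle one died before reaching the fence, the naive rule stays blocked while the failure-aware rule completes. The same reasoning applies at both the MPI level and the daemon level, which is why fixing only one of them is not enough.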

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Solved

Still a problem

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Fixed issues with turning off the failure detector.

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


The issue is demoted to minor, as it now only impacts networks that are not flushed correctly upon reaching finalize.

abouteiller commented 6 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


The issue is now resolved by ESS modifications that consider all processes from the same node faulty.