ulfm-devel / ompi

Open MPI main development repository
https://www.open-mpi.org

MPI_Abort kills only MPI processes after a fault #47

Open abouteiller opened 5 years ago

abouteiller commented 5 years ago

Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


After a fault, MPI_Abort does not kill the 'orted' daemons, and mpirun remains stuck.

abouteiller commented 3 years ago

Original comment by Bitbucket user (Bitbucket: wangh0a, GitHub: wangh0a).


Hi Aurelien,

Do you have a solution for it?

Right now, I’m replacing MPI_Abort(…) with kill(getppid(), SIGTERM) to kill the 'orted' process. However, this might not clean up all the MPI processes, right?
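A minimal sketch of the kill(getppid(), SIGTERM) workaround, assuming a POSIX system and leaving MPI out entirely so it can run anywhere: a forked process stands in for the 'orted' daemon, and its child plays the MPI rank that signals its parent instead of calling MPI_Abort. The function names `terminate_parent_daemon` and `simulate_abort_via_parent` are hypothetical, and the approach only reaches the daemon if the rank was launched directly by it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical MPI_Abort replacement: signal the parent process, which is
 * assumed to be the 'orted' daemon that launched this rank. */
int terminate_parent_daemon(void)
{
    return kill(getppid(), SIGTERM);
}

/* Forks a stand-in "daemon" whose child "rank" calls
 * terminate_parent_daemon(). Returns the signal number that terminated
 * the stand-in daemon, or -1 if it ended some other way. */
int simulate_abort_via_parent(void)
{
    pid_t daemon_pid = fork();
    if (daemon_pid < 0)
        return -1;

    if (daemon_pid == 0) {
        /* Stand-in daemon: make sure SIGTERM keeps its default action
         * (terminate) even if the caller ignored or blocked it. */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGTERM);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        signal(SIGTERM, SIG_DFL);

        pid_t rank_pid = fork();
        if (rank_pid < 0)
            _exit(1);
        if (rank_pid == 0) {
            /* Stand-in rank: hit a fatal error and kill the parent daemon. */
            terminate_parent_daemon();
            _exit(0);
        }
        for (;;)
            pause();  /* wait here until SIGTERM terminates this process */
    }

    int status = 0;
    if (waitpid(daemon_pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling `simulate_abort_via_parent()` from a small driver should return SIGTERM. Note that this only shows the daemon being terminated; it says nothing about what happens to mpirun or to ranks on other nodes.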

abouteiller commented 3 years ago

Original comment by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).


Killing the parent daemon will cause all dependent processes to abort themselves (achieving cleanup, in an oblique way).
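One generic way a child process can notice that its parent has died and then abort itself is a "lifeline" pipe: the parent keeps the only write end, and the child blocks reading the other end until it sees EOF. This is a sketch of that general technique, not necessarily how Open MPI's processes detect the loss of their daemon. The names `wait_for_lifeline_loss`, `simulate_rank_self_abort`, and the exit code `LIFELINE_LOST` are hypothetical. To keep it testable, the parent simulates its death by closing its write end, which is what the kernel does to every descriptor of a process that exits or is killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define LIFELINE_LOST 3  /* hypothetical exit code meaning "daemon is gone" */

/* Child side: block until the lifeline pipe reaches EOF. A read() that
 * returns 0 means every write end has been closed. */
void wait_for_lifeline_loss(int read_fd)
{
    char byte;
    for (;;) {
        ssize_t n = read(read_fd, &byte, 1);
        if (n == 0)
            return;                 /* EOF: the parent is gone */
        if (n < 0 && errno != EINTR)
            return;                 /* treat other read errors as loss too */
    }
}

/* Forks a stand-in "rank" that exits on its own once its lifeline closes.
 * Returns the rank's exit code, or -1 on error. */
int simulate_rank_self_abort(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t rank = fork();
    if (rank < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (rank == 0) {
        close(fds[1]);              /* the rank must not hold a write end */
        wait_for_lifeline_loss(fds[0]);
        _exit(LIFELINE_LOST);       /* "abort itself" */
    }

    close(fds[0]);
    close(fds[1]);                  /* same effect on the pipe as dying */

    int status = 0;
    if (waitpid(rank, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The design relies on the rank closing its own copy of the write end; if it kept one, EOF would never arrive and the rank would outlive its parent.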

abouteiller commented 3 years ago

Original comment by Bitbucket user (Bitbucket: wangh0a, GitHub: wangh0a).


Hi Aurelien,

Thank you for the prompt reply. Do you think there could be cases where dependent processes are not killed (resulting in defunct processes) even if the parent daemon is killed?