pmodels / mpich

Official MPICH Repository
http://www.mpich.org

Taking down two complete nodes causes all of MPICH to crash #2187

Closed by mpichbot 5 years ago

mpichbot commented 8 years ago

Originally by wbland on 2014-10-14 14:54:47 -0500


Brad Griglione reports that, with the new ULFM features, he can kill individual processes indefinitely without a problem. However, when he reboots machines to simulate total node failures, the entire job crashes after the second machine is rebooted.

pavanbalaji commented 5 years ago

Fault tolerance (FT) is no longer supported on ch3.