Developers of Open MPI components (BTL, PML, MTL, etc.) need guidance on the requirements their component must meet to support fault-tolerant operation. We need to produce a document that describes a clear way forward and the methodologies for verifying compliance.
Original report by Aurelien Bouteiller (Bitbucket: abouteiller, GitHub: abouteiller).