abouteiller opened this issue 2 years ago.
I believe this is the set of changes for the no-no vote on 2022-09-30:
This issue had a "no-no" vote on 2022-09-30, which passed:
| Yes | No | Abstain |
|-----|----|---------|
| 28  | 0  | 1       |
This passed a first vote on 2022-12-07.
| Yes | No | Abstain |
|-----|----|---------|
| 28  | 0  | 5       |
This passed a second vote on 2023-02-08.
| Yes | No | Abstain |
|-----|----|---------|
| 25  | 0  | 6       |
Problem
The monolithic ULFM (User-Level Failure Mitigation) proposal has been split into smaller slices so that the MPI Forum can focus on individual topics.
Main topic issue: https://github.com/mpi-forum/mpi-issues/issues/20
Proposal
The first topic slice contains the following concepts for communicators:
Changes to the Text
Addition of a fault-tolerance (FT) chapter containing the proposed constructs.
Impact on Implementations
Implementations may optionally implement fault tolerance. All implementations need to add the procedures MPI_COMM_REVOKE, MPI_COMM_GET_FAILED, and MPI_COMM_ACK_FAILED; implementations that do not support FT can provide stubs that are not fault tolerant.
Impact on Users
Users can receive fault events, write manager-worker applications, and run run-through workloads (e.g., stencil, ABFT) that use only point-to-point (P2P) operations. Slices 2 and 3 will add features for repairing communicators, as needed to use collective operations, and for respawning processes after a fault.
References and Pull Requests
https://github.com/mpi-forum/mpi-standard/pull/665