mpiwg-ft / ft-issues

Repository to discuss issues and host the FTWG wiki

Simplification of FAILURE_ACK/GET_ACKED #15

Closed: abouteiller closed this issue 2 years ago

abouteiller commented 4 years ago

Users and the MPI Forum have complained that FAILURE_ACK/GET_ACKED are difficult to understand. In addition, the MPI_COMM_FAILURE_ACK / MPI_COMM_FAILURE_GET_ACKED pair is not symmetrical with MPI_WIN_GET_FAILED, which is much simpler to use.

This RFC proposes a ULFM change that replaces the ACK/GET_ACKED pair with a variation based on MPI_COMM_GET_FAILED and MPI_COMM_ACK_FAILED. This permits the same level of control over the acknowledgement of failures and the recovery of ANY_SOURCE receives, without the odd post-facto behavior of GET_ACKED. Now you look at the group and then ack what you saw, not the opposite.

See the following diff for textual details: https://github.com/mpiwg-ft/mpi-standard/compare/ulfm/master...mpiwg-ft:ulfm/ackfailed
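To make the "look, then ack what you saw" ordering concrete, here is a minimal toy model in plain C. It does not call MPI: every `toy_*` name is hypothetical, and the prefix-count form of the acknowledgement (ack the first `num_to_ack` entries of the failed list) and the rule that wildcard receives are blocked while an unacknowledged failure exists are assumptions about the proposal, not text quoted from it.

```c
#include <assert.h>

#define TOY_MAX_FAILED 16

/* Hypothetical stand-in for the local failure state of one communicator.
 * This is NOT an MPI type; it only models the ordering discussed above. */
typedef struct {
    int failed[TOY_MAX_FAILED]; /* ranks known to have failed, in detection order */
    int num_failed;             /* number of valid entries in failed[] */
    int num_acked;              /* length of the acknowledged prefix of failed[] */
} toy_comm;

void toy_init(toy_comm *c) {
    c->num_failed = 0;
    c->num_acked = 0;
}

/* Models the runtime noticing a new failure (the application does not call
 * anything like this in real code). */
void toy_detect_failure(toy_comm *c, int rank) {
    assert(c->num_failed < TOY_MAX_FAILED);
    c->failed[c->num_failed++] = rank;
}

/* Step 1 (analogue of GET_FAILED): look at the currently known failures.
 * Copies at most cap ranks into out and returns how many were copied. */
int toy_get_failed(const toy_comm *c, int *out, int cap) {
    int n = c->num_failed < cap ? c->num_failed : cap;
    for (int i = 0; i < n; i++) {
        out[i] = c->failed[i];
    }
    return n;
}

/* Step 2 (analogue of ACK_FAILED): acknowledge the first num_to_ack failures,
 * i.e. exactly those the caller has already looked at. The acknowledged
 * prefix never shrinks. Returns the total number of acknowledged failures. */
int toy_ack_failed(toy_comm *c, int num_to_ack) {
    if (num_to_ack < 0) {
        num_to_ack = 0;
    }
    if (num_to_ack > c->num_failed) {
        num_to_ack = c->num_failed;
    }
    if (num_to_ack > c->num_acked) {
        c->num_acked = num_to_ack;
    }
    return c->num_acked;
}

/* Assumed rule: a wildcard (ANY_SOURCE) receive may proceed only when every
 * known failure has been acknowledged. Returns 1 if allowed, 0 otherwise. */
int toy_any_source_allowed(const toy_comm *c) {
    return c->num_acked == c->num_failed;
}
```

In this model, a failure detected between the look and the ack stays unacknowledged, because the caller only acknowledges as many entries as it actually saw. That is the control the old ACK-then-GET_ACKED ordering made awkward to express.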

hzhou commented 2 years ago

The diff link isn't accessible to me. Is there an accessible link or an update?

bosilca commented 2 years ago

There is nothing to compare anymore; the proposed text is now incorporated into the ulfm/main branch.

hzhou commented 2 years ago

Can someone give me access to the mpiwg-ft repo?

hzhou commented 2 years ago

@abouteiller Is MPIX_Comm_get_failed(comm, &fgrp) in the latest working proposal?

hzhou commented 2 years ago

Is there a publicly accessible copy of the latest ULFM proposal?

abouteiller commented 2 years ago

The current version is updated periodically here: https://github.com/mpi-forum/mpi-issues/issues/20#issuecomment-1021458776

abouteiller commented 2 years ago

This is now part of the main proposal text.