Open mpiforumbot opened 8 years ago
Originally by jhammond on 2014-03-14 09:30:35 -0500
Relaxing the same-operation constraint by default doesn't preclude a hardware implementation since one can emulate any atomic with compare-and-swap, albeit very inefficiently in the contended case. If one does not have compare-and-swap in hardware, it seems reasonable to assume that the other operations won't be available in hardware either and thus the hardware-only scenario is not relevant.
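To make the compare-and-swap point concrete, here is a minimal sketch in C11 of a fetch-and-op built from nothing but CAS. It is only an illustration, not how any MPI library does it: `binary_op`, `cas_fetch_and_op`, and the `op_*` helpers are hypothetical names, and a 64-bit integer stands in for a predefined datatype.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation type: any pure binary function on 64-bit integers. */
typedef int64_t (*binary_op)(int64_t current, int64_t operand);

int64_t op_sum(int64_t a, int64_t b)  { return a + b; }
int64_t op_max(int64_t a, int64_t b)  { return a > b ? a : b; }
int64_t op_bxor(int64_t a, int64_t b) { return a ^ b; }

/* Atomically apply `op` to *target using only compare-and-swap.
 * Returns the value that was stored before the update (fetch-and-op). */
int64_t cas_fetch_and_op(_Atomic int64_t *target, binary_op op, int64_t operand)
{
    assert(target != NULL && op != NULL);

    int64_t expected = atomic_load(target);
    int64_t desired;
    do {
        /* Compute the new value from our current snapshot. */
        desired = op(expected, operand);
        /* If another thread changed *target in the meantime, the CAS fails,
         * `expected` is refreshed with the current value, and we retry. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));

    return expected;
}
```

Because every operation goes through the same CAS loop, different operations can safely target the same location. The retry loop is also where the inefficiency under contention comes from: each failed CAS repeats the whole computation.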
Originally by rsthakur on 2014-03-14 11:03:41 -0500
As far as I know, MPI 2.2 allowed only same op. MPI 3.0 changed this by allowing same_op_no_op, which is the default unless the info key is changed to same_op (which gives MPI 2.2 behavior).
MPI 2.2, pg 365, ln 29-33, says
<A location in a window must not be accessed as a target of an RMA operation once an update to that location has started, until the update becomes visible in the public window copy. There is one exception to this rule, in the case where the same variable is updated by two concurrent accumulates that use the same operation, with the same predefined datatype, on the same window.>
Originally by jhammond on 2014-06-25 11:00:21 -0500
FYI: This is related to #399
Originally by rsthakur on 2014-06-25 11:28:07 -0500
My comment above says there is no breakage from MPI-2 semantics, so the first paragraph of the ticket needs to be changed. The ticket is about relaxing further what was slightly relaxed in MPI-3.
Originally by gropp on 2014-12-10 13:00:07 -0600
The working group requests a compelling use case and a clear response to Rajeev's comment. Specifics are needed for both (1) the options provided and (2) the defaults.
Originally by jhammond on 2014-12-10 13:19:20 -0600
Replying to rsthakur:
<My comment above says there is no breakage from MPI-2 semantics, so the first paragraph of the ticket needs to be changed. The ticket is about relaxing further what was slightly relaxed in MPI-3.>
I removed this comment. This feature can still be justified by usage needs.
Originally by gropp on 2015-04-04 08:35:16 -0500
This ticket is consistent with the Forum's current approach of favoring generality over performance, particularly in the defaults. I would add advice to users and implementers to that effect: users should specify accumulate_ops as tightly as possible to remain backward compatible in terms of performance, and implementers should pay attention to this if their hardware allows them to optimize for the special case of certain operations.
Originally by jhammond on 2015-04-04 11:34:30 -0500
There was a ticket to allow the user to specify very explicitly what ops and types were to be used (#399), but I withdrew it in favor of this one.
Do you think it is worth revisiting the more explicit ticket or is the input too cumbersome?
Originally by gropp on 2015-04-04 12:25:32 -0500
I don't think there is a need for that one yet. The issue, as I understood it from the original discussions, is that in some cases, if some of the operations have to be done in software, then they may all need to be done in software in order to ensure the semantics. Having fine-grained control might allow the implementation to decide whether the hardware could handle them all, but without a clear example, I don't think it is worth adding at this time, especially since it could be added later.
Originally by rsthakur on 2015-06-03 13:48:15 -0500
From the June 2015 Forum meeting: Would like to see specific proposal text and discussion of potential performance issues.
Originally by balaji on 2014-03-14 00:20:15 -0500
In MPI-3, the default value for the `accumulate_ops` info key is `same_op_no_op`. This means that two concurrent accumulate operations to the same target location using different operations are erroneous.
Proposal:
- Change the default value of `accumulate_ops` to be empty or `none`, which would mean that concurrent accumulate operations to the same target with different ops are allowed. In this case, the MPI implementation might need to use mutexes to provide atomicity across all operations.
- Provide the values `same_op`, `same_op_no_op`, and `same_op_no_op_replace`, which allow the user to restrict the kinds of concurrent accumulate operations that can happen at the target. The MPI implementation can utilize some of these hints to use hardware atomics rather than mutexes to optimize accumulate operations.
Backward Compatibility:
This proposal is backward compatible with both MPI-2 and MPI-3. It provides more relaxed semantics than either (from the perspective of the user), but allows us to get the same efficiency given appropriate info key values.
Impact on Implementations:
Implementations will need to support the default case of allowing multiple concurrent accumulates with different ops at the same target, while maintaining atomicity.
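As a rough illustration of how the hint values in the proposal could steer an implementation, the sketch below maps an `accumulate_ops` string to a choice between a mutex path and a hardware-atomic path. It is one conservative policy under stated assumptions, not the behavior of any real MPI library: the enum names, `parse_accumulate_ops`, `choose_accumulate_path`, and the `hw_has_native_op` flag are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal representation of the accumulate_ops info value. */
typedef enum {
    ACC_OPS_NONE,                  /* proposed default: any mix of ops allowed */
    ACC_OPS_SAME_OP,
    ACC_OPS_SAME_OP_NO_OP,
    ACC_OPS_SAME_OP_NO_OP_REPLACE, /* value named in this proposal */
    ACC_OPS_INVALID
} acc_ops_hint;

typedef enum { PATH_MUTEX, PATH_HW_ATOMIC } acc_path;

/* An unset (NULL) or empty value is treated like "none", per the proposal. */
acc_ops_hint parse_accumulate_ops(const char *value)
{
    if (value == NULL || value[0] == '\0' || strcmp(value, "none") == 0)
        return ACC_OPS_NONE;
    if (strcmp(value, "same_op") == 0)
        return ACC_OPS_SAME_OP;
    if (strcmp(value, "same_op_no_op") == 0)
        return ACC_OPS_SAME_OP_NO_OP;
    if (strcmp(value, "same_op_no_op_replace") == 0)
        return ACC_OPS_SAME_OP_NO_OP_REPLACE;
    return ACC_OPS_INVALID;
}

/* Assumed policy: with no restriction, different ops may collide, so
 * serialize everything with a mutex. With a restricting hint, use the
 * hardware atomic only if the hardware natively supports what is needed
 * (summarized here by the assumed flag hw_has_native_op). */
acc_path choose_accumulate_path(acc_ops_hint hint, bool hw_has_native_op)
{
    assert(hint != ACC_OPS_INVALID); /* callers must reject bad values first */

    switch (hint) {
    case ACC_OPS_SAME_OP:
    case ACC_OPS_SAME_OP_NO_OP:
    case ACC_OPS_SAME_OP_NO_OP_REPLACE:
        return hw_has_native_op ? PATH_HW_ATOMIC : PATH_MUTEX;
    default:
        return PATH_MUTEX;
    }
}
```

Always falling back to the mutex for `none` is deliberately pessimistic. An implementation whose hardware covers every operation and datatype could still take the atomic path, which is the fine-grained case discussed earlier in the thread.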