This was a long-standing omission in the implementation. ARMCI nonblocking handles are similar to MPI RMA requests, but the mapping is not 1:1, because an aggregate request handle is 1:N.
This change implements request handles using RMA requests. It replaces the prior implementation, which just did flush(_all) instead of completing individual handles. The old implementation is preserved via the preprocessor.
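To illustrate the 1:N relationship described above, here is a minimal sketch in plain C. It is not the code from this change: `mock_request_t` is a hypothetical stand-in for `MPI_Request` so that the example compiles without MPI, and `agg_handle_t`, `agg_init`, `agg_add`, and `agg_wait` are illustrative names only. The point it shows is that one handle may own several outstanding requests and must complete all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Request (assumption: lets the sketch build without MPI). */
typedef struct { int done; } mock_request_t;

/* Hypothetical aggregate handle: one handle owns N outstanding requests. */
typedef struct {
    mock_request_t *reqs;   /* growable array of outstanding requests */
    size_t count;           /* number of requests currently tracked   */
    size_t capacity;        /* allocated slots in reqs                */
} agg_handle_t;

/* Put the handle into the empty state. */
static void agg_init(agg_handle_t *h)
{
    h->reqs = NULL;
    h->count = 0;
    h->capacity = 0;
}

/* Track one more outstanding operation on this handle.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int agg_add(agg_handle_t *h)
{
    if (h->count == h->capacity) {
        size_t new_cap = h->capacity ? 2 * h->capacity : 4;
        mock_request_t *p = realloc(h->reqs, new_cap * sizeof *p);
        if (p == NULL) return -1;
        h->reqs = p;
        h->capacity = new_cap;
    }
    h->reqs[h->count].done = 0;
    h->count++;
    return 0;
}

/* Complete every request owned by the handle and reset it.
 * An MPI-based version might wait on the whole array at this point
 * (for example with MPI_Waitall); that is an assumption of this sketch.
 * Returns the number of requests that were completed. */
static size_t agg_wait(agg_handle_t *h)
{
    size_t n = h->count;
    for (size_t i = 0; i < n; i++) {
        h->reqs[i].done = 1;   /* mock completion */
    }
    free(h->reqs);
    agg_init(h);
    return n;
}
```

A caller would invoke `agg_add` once per issued operation and `agg_wait` once per handle. Completing only the requests a handle owns is what distinguishes this approach from a window-wide flush.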
This also adds an option to switch to Rget_accumulate for atomics, all of which are blocking. Doing so avoids a flush in this code path, which might be slowed down by the need to complete more expensive, potentially non-hardware, operations.
This has not been tested thoroughly. It will be merged after sufficient testing.
Tested with: