openshmem-org / specification

OpenSHMEM Application Programming Interface
http://www.openshmem.org
51 stars 41 forks source link

How to read shmem_atomic updates? #395

Closed naveen-rn closed 4 years ago

naveen-rn commented 4 years ago

With the new OpenSHMEM-1.5 changes, did we clarify or modify the semantics for reading the OpenSHMEM atomic updates at the target side? Did we introduce any semantics requiring the use of point-to-point sync operations for reading the updates.

// source
shmem_atomic_fetch_inc(dst, target)

// target
shmem_wait_until(..., LESS/GREATER/, value)
x = load(dst)

I vaguely remember some change happening in the relation between atomic updates and the p2p-sync routines - not sure what it is. Specifically, the need for moving the notes to implementers section to the main description in the p2p-sync routines.

naveen-rn commented 4 years ago

Specifically, I'm trying to understand whether the following usage is valid?

// source
shmem_atomic_fetch_inc(dst, target)

// target
while (dst != something) {
   pause()
}
x = load(dst)
// source
shmem_atomic_fetch_inc(dst, target)

// target
while (dst != something) {
   pause()
}
x = shmem_atomic_fetch(dst, me)
nspark commented 4 years ago

@naveen-rn After today's WG call, do you want to keep this issue open—if so, should we add something in the spec?—or close it?

jdinan commented 4 years ago

The usage above has undefined behavior, as discussed in the WG meeting. @naveen-rn Could you take a look at Memory Model -> Atomicity Guarantees and make sure this comes across clearly?

naveen-rn commented 4 years ago

Thanks @nspark, @jdinan and @manjugv.

There were two questions in this issue:

  1. How/Whether to use p2p-sync routines in target to verify the completion of the atomic updates?
  2. How/Whether to use atomic operations to fetch the atomic updates on the target?

Based on the WG meeting and the text updates. For (1), while from the user perspective, I agree that p2p-sync routines are expected to be used to obtain portability. From implementation perspective, they can decide on the type of p2p-sync implementation based on the supported platform. That said, I have a slight concern in the "Notes to Implementer" section in shmem_wait_until from #394. We can decide the change there.

But, the text change for clarifying (2) seems incomplete. It is not clear whether we need another atomic update to itself on the target-side to fetch the results after returning from p2p-sync operations or can we use platform specific load operations? Or are we specifically leaving it undefined for fixing it in future?

Is the following a valid usage model:

// source
shmem_atomic_fetch_inc(dst, target)

// target
shmem_wait_until(..., LESS/GREATER/, value)
x = load(dst)

Or only the following is supported?

// source
shmem_atomic_fetch_inc(dst, target)

// target
shmem_wait_until(..., LESS/GREATER/, value)
x = shmem_atomic_fetch(dst)
nspark commented 4 years ago

Yes, I think both examples are well-defined in how they read dst—as long as the program is done updating dst after the shmem_wait_until condition is satisfied.

naveen-rn commented 4 years ago

@nspark I'm sorry, I'm missing the text change. I'm reading the Memory Models -> Atomicty Guarantees section. Where are you reading this usage? I don't see a clear definition on how to retrieve the updates on the atomic operations on a local symmetric heap. Even all the examples used in the section, shows the issue with the concurrent usage.

  1. I'm ok with leaving this semantics - ambiguous.
  2. If this is not the intension - then there are few questions:
    • what are we trying to achieve by mandating the use of another atomics to retrieve the atomic update. Are we worried about memory corruption or data race? If atomic updates are RMW - shouldn't the load be valid? Data race is a defn issue (resulted from other concurrent shmem_atomics or RMAs) - but using a shmem_atomics will not resolve it either.
nspark commented 4 years ago

I don't necessarily think there's anything that says "you can do this", but it is not prohibited by any of the four scenarios under "Atomicity Guarantees."

Specifically, I think your concern is over:

OpenSHMEM atomic operations do not guarantee exclusivity [... when] OpenSHMEM atomic operations and non-OpenSHMEM operations (e.g. load and store operations) are used to access the same location concurrently.

But, a program like the following obeys that restriction:

static int counter = 0;

// ...all PEs do work...

// ...each PE signals when done...
shmem_inc(&counter, 0);

// PE 0 waits for signals from all of the PEs,
// then aggregates and reports results...
if (shmem_my_pe() == 0) {
  shmem_wait_until(&counter, SHMEM_CMP_EQ, shmem_n_pes());
  // PE 0 can safely read `counter`, as no other PEs are updating it
  printf("Completed work from %d PEs\n", counter); // valid
}

In other words, the atomicity requirements do not say you may never mix those operations on the same objects in memory; it only says you cannot do so concurrently.

One could modify the safe example and make it unsafe:

static int counter = 0;

// ...all PEs do work...

// ...each PE signals when done...
shmem_inc(&counter, 0);

// PE 0 waits for signals from at least half of the PEs,
// then aggregates and reports results...
if (shmem_my_pe() == 0) {
  shmem_wait_until(&counter, SHMEM_CMP_GE, shmem_n_pes() / 2);
  // PE 0 *cannot* safely read `counter`, as other PEs may be updating it
  printf("Completed work from %d PEs\n", counter); // UB
}
naveen-rn commented 4 years ago

Thanks @nspark. This clarification looks fine. As I understand, it means, the spec doesn't restrict the use of any specific atomic operation to read the atomic update on a local variable. So, it is not as strict as put-with-signal, where we need to use shmem_signal_wait_until or shmem_signal_fetch to read the signal update.

In fact, the spec doesn't even specify on how to consume the atomic update. It is upto the application to handle it based on its usage model.

However, the following statement in the atomicity guarantees is miss-leading.

For example, during the execution of an atomic remote integer increment, i.e. shmem_atomic_inc, operation on a symmetric variable X, no other OpenSHMEM atomic operation may access X. After the increment, X will have increased its value by 1 on the destination PE, at which point other atomic operations may then modify that X. However, access to the symmetric object X with non-atomic operations, such as one-sided put or get operations, will invalidate the atomicity guarantees.

Instead if possible, we need to change this as:

For example, during the execution of an atomic remote integer increment, i.e. shmem_atomic_inc, operation on a symmetric variable X, no other OpenSHMEM atomic operation or a non-atomic operation, such as one-sided put or get may access the symmetric object X. After the increment, X will have increased its value by 1 on the destination PE, at which point other atomic operations may then modify that X.

jdinan commented 4 years ago

@naveen-rn This text is really trying to give an example of how atomic operations are performed by the OpenSHMEM implementation. It's not meant to specify how applications should use the API.

jdinan commented 4 years ago

@naveen-rn Anything we need to fix here in OpenSHMEM 1.5?

jdinan commented 4 years ago

Paging @naveen-rn @nspark could you check up on this issue to see if it needs attention in OpenSHMEM 1.5?

naveen-rn commented 4 years ago

Sorry for not replying sooner. I was thinking about the change requested in the comment. My only gripe is that the second statement in the example seems ambiguous and it kind of looks incorrect.

When I read it, it seems that we provide an example that prohibits the usage of non-atomic operation on a variable which is used for atomic operation. It is not clear, whether we speak about this semantics during the atomic update, before, or after the update. The suggested change kind of clarifies it.

But, if you feel that it is just my reading. Please go ahead and close the issue.

jdinan commented 4 years ago

I think I'd prefer to delete this paragraph rather than try to fix it. It's a "for example" that contains an imprecise description of atomicity that is also confusing for the reasons you stated. I think the paragraph may have either been written by or supported by @manjugv. Wondering what he thinks?

nspark commented 4 years ago

I think we should remove this paragraph. The following examples more directly and clearly address the specific requirements for atomicity.