Closed naveen-rn closed 4 years ago
Specifically, I'm trying to understand whether the following usage is valid?
// source
shmem_atomic_fetch_inc(dst, target)
// target
while (dst != something) {
pause()
}
x = load(dst)
// source
shmem_atomic_fetch_inc(dst, target)
// target
while (dst != something) {
pause()
}
x = shmem_atomic_fetch(dst, me)
@naveen-rn After today's WG call, do you want to keep this issue open—if so, should we add something in the spec?—or close it?
The usage above has undefined behavior, as discussed in the WG meeting. @naveen-rn Could you take a look at Memory Model -> Atomicity Guarantees and make sure this comes across clearly?
Thanks @nspark, @jdinan and @manjugv.
There were two questions in this issue:
Based on the WG meeting and the text updates. For (1), while from the user perspective, I agree that p2p-sync routines are expected to be used to obtain portability. From implementation perspective, they can decide on the type of p2p-sync implementation based on the supported platform. That said, I have a slight concern in the "Notes to Implementer" section in shmem_wait_until from #394. We can decide the change there.
But, the text change for clarifying (2) seems incomplete. It is not clear whether we need another atomic update to itself on the target-side to fetch the results after returning from p2p-sync operations or can we use platform specific load operations? Or are we specifically leaving it undefined for fixing it in future?
Is the following a valid usage model:
// source
shmem_atomic_fetch_inc(dst, target)
// target
shmem_wait_until(..., LESS/GREATER/, value)
x = load(dst)
Or only the following is supported?
// source
shmem_atomic_fetch_inc(dst, target)
// target
shmem_wait_until(..., LESS/GREATER/, value)
x = shmem_atomic_fetch(dst)
Yes, I think both examples are well-defined in how they read dst
—as long as the program is done updating dst
after the shmem_wait_until
condition is satisfied.
@nspark I'm sorry, I'm missing the text change. I'm reading the Memory Models -> Atomicty Guarantees section. Where are you reading this usage? I don't see a clear definition on how to retrieve the updates on the atomic operations on a local symmetric heap. Even all the examples used in the section, shows the issue with the concurrent usage.
I don't necessarily think there's anything that says "you can do this", but it is not prohibited by any of the four scenarios under "Atomicity Guarantees."
Specifically, I think your concern is over:
OpenSHMEM atomic operations do not guarantee exclusivity [... when] OpenSHMEM atomic operations and non-OpenSHMEM operations (e.g. load and store operations) are used to access the same location concurrently.
But, a program like the following obeys that restriction:
static int counter = 0;
// ...all PEs do work...
// ...each PE signals when done...
shmem_inc(&counter, 0);
// PE 0 waits for signals from all of the PEs,
// then aggregates and reports results...
if (shmem_my_pe() == 0) {
shmem_wait_until(&counter, SHMEM_CMP_EQ, shmem_n_pes());
// PE 0 can safely read `counter`, as no other PEs are updating it
printf("Completed work from %d PEs\n", counter); // valid
}
In other words, the atomicity requirements do not say you may never mix those operations on the same objects in memory; it only says you cannot do so concurrently.
One could modify the safe example and make it unsafe:
static int counter = 0;
// ...all PEs do work...
// ...each PE signals when done...
shmem_inc(&counter, 0);
// PE 0 waits for signals from at least half of the PEs,
// then aggregates and reports results...
if (shmem_my_pe() == 0) {
shmem_wait_until(&counter, SHMEM_CMP_GE, shmem_n_pes() / 2);
// PE 0 *cannot* safely read `counter`, as other PEs may be updating it
printf("Completed work from %d PEs\n", counter); // UB
}
Thanks @nspark. This clarification looks fine. As I understand, it means, the spec doesn't restrict the use of any specific atomic operation to read the atomic update on a local variable. So, it is not as strict as put-with-signal, where we need to use shmem_signal_wait_until
or shmem_signal_fetch
to read the signal update.
In fact, the spec doesn't even specify on how to consume the atomic update. It is upto the application to handle it based on its usage model.
However, the following statement in the atomicity guarantees is miss-leading.
For example, during the execution of an atomic remote integer increment, i.e. shmem_atomic_inc, operation on a symmetric variable X, no other OpenSHMEM atomic operation may access X. After the increment, X will have increased its value by 1 on the destination PE, at which point other atomic operations may then modify that X. However, access to the symmetric object X with non-atomic operations, such as one-sided put or get operations, will invalidate the atomicity guarantees.
Instead if possible, we need to change this as:
For example, during the execution of an atomic remote integer increment, i.e. shmem_atomic_inc, operation on a symmetric variable X, no other OpenSHMEM atomic operation or a non-atomic operation, such as one-sided put or get may access the symmetric object X. After the increment, X will have increased its value by 1 on the destination PE, at which point other atomic operations may then modify that X.
@naveen-rn This text is really trying to give an example of how atomic operations are performed by the OpenSHMEM implementation. It's not meant to specify how applications should use the API.
@naveen-rn Anything we need to fix here in OpenSHMEM 1.5?
Paging @naveen-rn @nspark could you check up on this issue to see if it needs attention in OpenSHMEM 1.5?
Sorry for not replying sooner. I was thinking about the change requested in the comment. My only gripe is that the second statement in the example seems ambiguous and it kind of looks incorrect.
When I read it, it seems that we provide an example that prohibits the usage of non-atomic operation on a variable which is used for atomic operation. It is not clear, whether we speak about this semantics during the atomic update, before, or after the update. The suggested change kind of clarifies it.
But, if you feel that it is just my reading. Please go ahead and close the issue.
I think I'd prefer to delete this paragraph rather than try to fix it. It's a "for example" that contains an imprecise description of atomicity that is also confusing for the reasons you stated. I think the paragraph may have either been written by or supported by @manjugv. Wondering what he thinks?
I think we should remove this paragraph. The following examples more directly and clearly address the specific requirements for atomicity.
With the new OpenSHMEM-1.5 changes, did we clarify or modify the semantics for reading the OpenSHMEM atomic updates at the target side? Did we introduce any semantics requiring the use of point-to-point sync operations for reading the updates.
I vaguely remember some change happening in the relation between atomic updates and the p2p-sync routines - not sure what it is. Specifically, the need for moving the notes to implementers section to the main description in the p2p-sync routines.