jorendorff / atomicbox

Safe atomic Box types for Rust
Other
22 stars 11 forks source link

AtomicOptionBox::take should allow Acquire ordering? #9

Open elidupree opened 2 years ago

elidupree commented 2 years ago

Currently, AtomicOptionBox::take only allows AcqRel and SeqCst, just like AtomicOptionBox::swap. But AtomicOptionBox::take is always putting a None into the box, which doesn't reference any other data. It still needs to be Acquire (so that it can safely drop the possibly-not-None value that it takes out), but I don't think it needs to be Release as well.

tay64 commented 2 years ago

Without Release, wouldn't another thread executing take concurrently get a copy of the previously-stored pointer before it has a chance to see the change made by the first thread?

elidupree commented 2 years ago

No: That is always guaranteed by the fact that the operation is atomic at all, not by the ordering.

tay64 commented 2 years ago

Are you sure about that? (better yet, can you point to any sources confirming that?)

I do not claim to be an expert, but my understanding is that atomicity only guarantees that other threads won't see a partially-changed value (i.e. they either see the value before your store or a value after your store, and not some intermediate state).

If you do a Relaxed store, there are no restrictions on how it will be ordered relative to another thread's load, regardless of the load's ordering.

Here is an illustartion.

Initial value: &42

Thread 1                             Thread 2
(A)  ptr1 = load (Acquire)
(B)  store(None, Relaxed)           (C)  ptr2 = load(Acquire)
                                    (D)  store(None, Relaxed)

Since store is Relaxed, there is nothing requiring (B) to happen before (C) for Thread 2, so I claim that ptr2 will contain &42. Am I wrong?

elidupree commented 2 years ago

Your claim fits the code you provided, but that isn't the code that's used by take. take performs a single atomic swap, not an atomic load followed by a separate atomic store. Load-store would be buggy in the same way even with the strictest ordering (the loads may both get the same pointer).

tay64 commented 2 years ago

Atomic swap changes nothing for this question, because from the hardware point of view it consists of a load and a store. Viewing load and store as a single operation only makes difference for compare-and-swap. In this case the question is about ordering, not about atomicity.

tay64 commented 2 years ago

Here, I drawn a better illustration.

Thread 1                                  Thread 2
(A)  store(&42, Release)
(B)  ptr1 = load_store(None, Acquire)     (C)  ptr2 = load_store(None, Acquire)

Because (A) is Release and both (B) and (C) are Acquire, both are guaranteed to see the effect of (A) (well, (B) also happens in the same thread as (A) so of course it sees the effect of (A), but it's not the point).

Because the stores are Relaxed in (B) and (C), their relative ordering is undefined, so (C) is not guaranteed to see the effects of (B), and vice versa.

My claim, again, is that it is possible to get ptr1 == ptr2 == &42.

Why stores are Relaxed:

Note that using Acquire makes the store part of this operation Relaxed

(from AtomicPtr::swap documentation)

elidupree commented 2 years ago

Atomic swap changes nothing for this question, because from the hardware point of view it consists of a load and a store. Viewing load and store as a single operation only makes difference for compare-and-swap. In this case the question is about ordering, not about atomicity.

I didn't think I would need to cite a source for "atomic swap is an atomic operation", but I guess here we are. Unfortunately I wasn't able to find the exact location where the core::intrinsics::atomic_xchg_* intrinsics (which are used by AtomicPtr::swap) are defined, but I would assume they lower to instructions like x86's XCHG, which is documented to be atomic across the whole operation:

If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation

tay64 commented 2 years ago

I didn't think I would need to cite a source for "atomic swap is an atomic operation", but I guess here we are.

Atomicity and memory ordering are orthogonal; please do not confuse the two.

Atomicity means that the operation cannot be disrupted by any actions of other threads.

Memory ordering governs when other threads are required to see the effect of your operation, and which effects of other threads' operations you are required to see.

The x86's XCHG is irrelevant here, because Rust is not x86-specific. x86 is a strange bird among the architectures, as it is willing to enforce much more in terms of cache coherency than any other multi-core architecture (and XCHG issuing a cache sync across all physical CPUs is insane, and insanely expensive by the token of other architectures).

The guarantees Rust developers have chosen are, fortunately, documented in Rust's load-and-store functions such as AtomicPtr::swap. They do not seem to support your view.

elidupree commented 2 years ago

The guarantees Rust developers have chosen are, fortunately, documented

They are indeed documented - and there is more documentation than just that on the individual functions. From the top-level std::sync::atomic docs, we have:

Rust atomics currently follow the same rules as C++20 atomics, specifically atomic_ref

We must look to C++ for definitions, then. From a draft of the C++ standard (n.b. viewing the true standard requires you to pay loads of money, but I've been assured that these drafts are virtually identical in content):

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. [Note: There is a separate order for each atomic object. There is no requirement that these can be combined into a single total order for all objects. In general this will be impossible since different threads may observe modifications to different objects in inconsistent orders. — end note

and:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

Just to verify that AtomicPtr::swap indeed counts as a read-modify-write operation, we can also find the documentation of atomic_ref::exchange, which I'll assume is the equivalent:

Atomically replaces the value of the referenced object with desired. The operation is a read-modify-write operation.

Hence, if two atomic swaps with None are performed on the same variable, they cannot both read Some: one of the writes must come before the other in the modification order of the atomic variable, and the second read must read the value written by the first write (which is None).

ghost commented 1 year ago

@elidupree is correct. Atomicity guarantees that all threads observe operations on a single atomic object as happening in the same order. That is true even if all of those operations have relaxed memory ordering. Memory ordering determines what set of interleavings of operations on distinct objects are observable from different threads (which may or may not be consistent with each other).

Acquire-release ordering guarantees that, if a thread does an acquire-ordered load or RMW, and it loads the value stored by a release-ordered load or RMW, everything that happened-before the release-ordered operation also happens-before an acquire-ordered operation that loads that value.

Importantly for this crate, that means when release-storing/swapping in a pointer and then acquire-loading/swapping out that same pointer ensures that operations on the pointee that happened-before the release-store/swap also happen-before (and don't cause a data race with) operations on the pointee after the acquire-load/swap.

If the storing thread swaps/stores a none/null, there is no pointee, so there are no operations on the pointee to ensure ordering on. I can imagine some use cases might use take to ensure ordering of operations on other memory or IO, so release ordering when swapping none/null is likely still useful in some cases. But for the fully-safe use case of just ensuring ordering on the pointee, relaxed stores of none/null should be just fine.