Packing pointers into double-word width atomics

eggyal commented 1 month ago

Under Using Strict Provenance, the standard library documents:

(Yes, if you’ve been using AtomicUsize for pointers in concurrent datastructures, you should be using AtomicPtr instead. If that messes up the way you atomically manipulate pointers, we would like to know why, and what needs to be done to fix it.)

(I've not been able to locate a discussion similar to the following, and assume that this is the best place to reply to that invitation—but please do point me elsewhere if not!).

Some concurrent algorithms (I am particularly thinking of some epoch-based memory reclamation schemes) solve the ABA problem by packing both a pointer and an integer (e.g. an epoch) into a single (double-word width) atomic variable (e.g. a 128-bit atomic, manipulated using cmpxchg16b, on x86_64 architectures).

To do this with the existing atomic types requires using the appropriate double-word width atomic integer (e.g. AtomicU128 on 64-bit architectures), and converting the relevant single-word width subpart to/from a pointer.

If such pointer may be to different allocated objects over the course of the algorithm, it clearly is not possible to derive provenance in those conversions via strict provenance's .with_addr() method (because any "base" pointer from which provenance is derived would also have to be updated atomically with any update to this atomic).

I am not sure what the best solution to this might be. Perhaps something like an AtomicDoublePtr<T, U> type or similar?

Somewhat related to #286 and #480.

RalfJung commented 1 month ago

Yeah this is unfortunately not supported currently. A similar situation arises with types like (u64, u32) that would fit into 16 bytes but are not safe to transmute to u128 due to padding.

IMO the best solution is to have some way to do atomic operations on MaybeUninit<u128> -- or else on any T that is 16 bytes large (or 8 bytes, or some other size supported for atomics). I am not sure what the best way is to expose that from the standard library, though. Maybe a generic Atomic<T> with something like const { assert!(valid_atomic_size(size_of::<T>())) }? Or maybe we can have a magic trait that is implemented only for types that have the right size? @lcnr @boxyuwu how terrible of an idea would that be?

lcnr commented 1 month ago

Or maybe we can have a magic trait that is implemented only for types that have the right size?

That's possible. It's also a stability hazard for user-defined types so idk if that is desirable :thinking: you could have an explicit AtomicSized derive which checks that the type has a valid size but is opt-in?

RalfJung commented 1 month ago

And then a (T, U) tuple would only be AtomicSized if it has the right size and both T and U are AtomicSized? That could work. It would also be a limitation since one can form a tuple of the right size from types that don't have the right size, like ([u8; 3], u8).

OTOH it's not a new stability hazard, it is exactly the same hazard as transmute::<LibraryType, u128>.

eggyal commented 1 month ago

Whilst I agree that there is a broader problem of atomically manipulating types that can't transmute to the containing atomic due to padding etc, my point was more about maintaining pointer provenance: something that AtomicPtr does but AtomicUsize does not.

When packing a pointer into a larger atomic, there is no equivalent to AtomicPtr, and I don't think this would be solved by having some more general means of packing smaller types into larger atomics? Do we not need a specific type for pointers here?

Diggsey commented 1 month ago

@eggyal There is a hierarchy:

usize -> *mut -> MaybeUninit [nothing] -> [provenance] -> [uninitialized bytes]

Where stuff to the right can store everything that stuff to the left can. @RalfJung is proposing tackling the atomic manipulation of generic types, which would include MaybeUninit, and so would automatically cover all the other use-cases, whereas an AtomicDoublePtr type would solve the provenance issue, but wouldn't allow you to atomically manipulate eg. types with padding, since padding may be uninitialized.

eggyal commented 1 month ago

Ah, my misunderstanding. I hadn't appreciated that provenance is propagated through uninitialized bytes!

lcnr commented 1 month ago

And then a (T, U) tuple would only be AtomicSized if it has the right size and both T and U are AtomicSized? That could work. It would also be a limitation since one can form a tuple of the right size from types that don't have the right size, like ([u8; 3], u8).

hmm, would that work? Given that the tuple is larger than T and U I feel like it's not obvious that copying (T, U) works if T and U can be individually handled by atomic operations.

RalfJung commented 1 month ago

Ah, my misunderstanding. I hadn't appreciated that provenance is propagated through uninitialized bytes!

Uninitialized bytes do not have provenance. However, MaybeUninit can hold any kind of byte -- uninit, init, and init with provenance. That's why it works as a universal container that can store any data without UB and without losing anything in the roundtrip.

hmm, would that work? Given that the tuple is larger than T and U I feel like it's not obvious that copying (T, U) works if T and U can be individually handled by atomic operations.

That's why I said "if it has the right size and both fields are AtomicSized".

I don't know which rules you had in mind, since this is not a structural property. But clearly if the concern is future compatibility, then to use (T, U) in an atomic type, the authors of T and U must opt-in somehow.

RalfJung commented 1 month ago

There is however a classic issue with atomic operations on types with padding -- how should compare_exchange be specified? Comparing uninitialized bytes is UB. So we'd have to some very specific stunts I think... like, only allow types where "which bytes are padding" is statically known (no unions, no enums), and then specify compare_exchange to only compare the non-padding bytes but also say that it can always spuriously fail if there are any padding bytes.

I am sure we already discussed this at some point, but I am not sure if we have an issue for that... @chorman0773 do you remember?

So in that sense the situation of having two pointers is simpler, since there are no padding bytes. OTOH, by virtue of this being pointers, the issue in https://github.com/rust-lang/unsafe-code-guidelines/issues/480 would come up but now there'd be two provenances to worry about...

Diggsey commented 1 month ago

how should compare_exchange be specified? Comparing uninitialized bytes is UB.

IMO there's only two sane options, either it should be UB to do the comparison, or it should be specified that input values are frozen before being compared, with spurious failures being the logical consequence of that.

bjorn3 commented 1 month ago

or it should be specified that input values are frozen before being compared, with spurious failures being the logical consequence of that.

That wouldn't have a forward progress guarantee as the compiler can always freeze it to the wrong value if the freeze happens as part of the compare_exchange. Freeze only requires the output of the freeze to be consistent between uses of the same freeze, not for it to be consistent between freezes.

https://llvm.org/docs/LangRef.html#freeze-instruction

All uses of a value returned by the same ‘freeze’ instruction are guaranteed to always observe the same value, while different ‘freeze’ instructions may yield different values.

Diggsey commented 1 month ago

That wouldn't have a forward progress guarantee as the compiler can always freeze it to the wrong value if the freeze happens as part of the compare_exchange

I think that's unavoidable if the compare/exchange loop happens in user code, since I don't think there's any guarantee that padding bytes are preserved?

Plausibly fetch_update could be made to work, since the implementation could make sure to use MaybeUninit to store the old value in that case.

bjorn3 commented 1 month ago

I think that's unavoidable if the compare/exchange loop happens in user code, since I don't think there's any guarantee that padding bytes are preserved?

If the user freezes themself once before the compare exchange loop rather than delegate it to compare_exchange there is no forward progress problem, but if compare_exchange does it, the compiler is allowed to produce an infinite loop.

chorman0773 commented 1 month ago

Depends on how its done. If the value is frozen into an aligned [u8;N] before being stored/compare_exchanged, then it will be frozen in memory.

On Thu, Jul 18, 2024 at 10:18 bjorn3 @.***> wrote:

I think that's unavoidable if the compare/exchange loop happens in user code, since I don't think there's any guarantee that padding bytes are preserved?

If the user freezes themself once before the compare exchange loop rather than delegate it to compare_exchange there is no forward progress problem, but if compare_exchange does it, the compiler is allowed to produce an infinite loop.

— Reply to this email directly, view it on GitHub https://github.com/rust-lang/unsafe-code-guidelines/issues/517#issuecomment-2236656763, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD243JNHTNGFBCKWX5STZM7FD5AVCNFSM6AAAAABLBGWXCSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWGY2TMNZWGM . You are receiving this because you were mentioned.Message ID: @.***>

bjorn3 commented 1 month ago

If the current argument has padding, the compiler can always freeze the padding such that it doesn't match the actual value in the atomic, and thus force the compare_exchange to fail.

chorman0773 commented 1 month ago

You'd get a new current value though, so you could repeat.

On Thu, 18 Jul 2024 at 10:56, bjorn3 @.***> wrote:

If the current argument has padding, the compiler can always freeze the padding such that it doesn't match the actual value in the atomic, and thus force the compare_exchange to fail.

— Reply to this email directly, view it on GitHub https://github.com/rust-lang/unsafe-code-guidelines/issues/517#issuecomment-2236789387, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGLD2ZSFELSXPH744M7BM3ZM7JRLAVCNFSM6AAAAABLBGWXCSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWG44DSMZYG4 . You are receiving this because you were mentioned.Message ID: @.***>

bjorn3 commented 1 month ago

Right, if a MaybeUninit is returned, that would indeed work.

chorman0773 commented 1 month ago

Or [u8;N]. I use this in my libatomic impl.

chorman0773 commented 1 month ago

FTR, the way xlang defines compare_exchange is that any atomic operation will freeze the padding bytes (and only the padding bytes) in any value it stores, then compare_exchange and any compound assignment is UB when the backing memory contains uninit bytes.

VorpalBlade commented 1 month ago

Could this operation instead be implemented only for types that lack padding? The user could always add an explicit "padding field" to fill out any padding (and for example initialise it to zero). Of course in that case you would need something like a compiler implemented marker trait for "contains no padding" so perhaps that is more work.

RalfJung commented 1 month ago

FTR, the way xlang defines compare_exchange is that any atomic operation will freeze the padding bytes (and only the padding bytes) in any value it stores, then compare_exchange and any compound assignment is UB when the backing memory contains uninit bytes.

That makes it unsound to turn an &mut TypeWithPadding into &Atomic<TypeWithPadding>... or at least, that operation can't be a NOP, it has to freeze.

But indeed, then we can have the invariant that an &Atomic<T> always points to fully initialized data, which alleviates the issue with liveness/progress around a freezing version of compare_exchange. (It does need a somewhat non-trivial compare_exchange though as was noted above, where if we get a "not equal" result, we compare what we actually saw with the intended current state via PartialEq::eq, and if those are equal we know that the "not equal" was caused by padding differences and we replace our intended result with the thing we just read. IIRC that's what corssbeam's AtomicCell does.)

chorman0773 commented 1 month ago

That makes it unsound to turn an &mut TypeWithPadding into &Atomic... or at least, that operation can't be a NOP, it has to freeze.

It would, yes. That's the downside here.

pitaj commented 1 month ago

What about something like this?

static FOO: AtomicSet<(AtomicUsize, AtomicPtr)> = AtomicSet::new((0, null()));

FOO.compare_exchange(...)

AtomicSet would only be implemented for permutations of std atomics that are guaranteed to fill a supported atomic (no padding).

VorpalBlade commented 1 month ago

What about something like this?
static FOO: AtomicSet<(AtomicUsize, AtomicPtr)> = AtomicSet::new((0, null()));

FOO.compare_exchange(...)
AtomicSet would only be implemented for permutations of std atomics that are guaranteed to fill a supported atomic (no padding).

This could in the future be extended neatly to support accessing the inner atomics as well, depending on how mixed size atomics in #345 gets decided (but either case is obviously not blocked on the other, just a nice potential future synergy).

Ddystopia commented 1 month ago

Hello, what's the reason compiler cannot just mask out (on assembly level) padding bytes/bits before atomic store? Applying mask on a word or two introduces a performance penalty? Or has problems with orderings/llvm?

RalfJung commented 1 month ago

For union/enum types, the padding bytes are not statically known.

comex commented 1 month ago

Plausibly fetch_update could be made to work, since the implementation could make sure to use MaybeUninit to store the old value in that case.

In my experience, when working with atomic structs, fetch_update is almost always what you want anyway. It would be nice to have it available without the additional padding restrictions.

eggyal commented 1 month ago

(It does need a somewhat non-trivial compare_exchange though as was noted above, where if we get a "not equal" result, we compare what we actually saw with the intended current state via PartialEq::eq, and if those are equal we know that the "not equal" was caused by padding differences and we replace our intended result with the thing we just read. IIRC that's what corssbeam's AtomicCell does.)

But how can we "replace our intended result with the thing we just read"? crossbeam's AtomicCell uses simple assignment:

https://github.com/crossbeam-rs/crossbeam/blob/2a82b619bef638f328776714ec7ccf022859dda2/crossbeam-utils/src/atomic/atomic_cell.rs#L1126-L1134

However, can we be assured that current = previous; won't be optimised away when it can be deduced that current == previous from the immediately preceding test/branch? Can we ever guarantee that we overwrite current with an equal value but different padding?

RalfJung commented 1 month ago

current and previous should have type MaybeUninit<T> to make sure all padding etc is preserved.

chorman0773 commented 1 month ago

Respectfully, MaybeUninit does not preserve padding any better than T does'

RalfJung commented 1 month ago

Ah of course... :facepalm: Yeah the type for that would be MaybeUninit<[u8; size_of::<T>()]>.

Crossbeam anyway does something else -- the relevant part is current_raw = previous_raw;, and that is also what the compare-exchange works on.

For a primitive operation supporting compare-exchange on any T however, what this means is that returning the old value as a T is not good enough since padding would be lost. It'd be better to make the operation take expected_val: &mut T; then the value at *expected_val can be updated in-place using bytewise operations, without losing padding.

Diggsey commented 1 month ago

Yeah the type for that would be MaybeUninit<[u8; size_of::()]>.

If the values have to be frozen for compare/exchange to work, then shouldn't the type be just [u8; size_of::<T>()]?

edit: I guess MaybeUninit does have a second purpose if "freeze" is a thing - to prevent safe code from accessing frozen bytes.

RalfJung commented 1 month ago

If the bytes can carry provenance, an array of u8 won't work.

Diggsey commented 1 month ago

If the bytes can carry provenance, an array of u8 won't work.

Of course 🤦

I think it's very confusing and inconsistent that MaybeUninit<StructWithPadding> does not preserve padding, but MaybeUninit<u8> does preserve provenance, despite u8 not having provenance.

Why is padding treated differently than provenance? If MaybeUninit is supposed to be exactly like T except that it allows uninitialized bytes, then it should not preserve provenance. If on the other hand MaybeUninit is supposed to act like a "bag of bytes" then it should also preserve padding.

chorman0773 commented 1 month ago

Padding is a property of the type - provenance is a property of a byte. MaybeUninit<T> probably shouldn't have padding bytes, but we promised that it has the same ABI as T, which only is possible if it has the same padding bytes as T (due to certain ABI requirements).

RalfJung commented 1 month ago

It's definitely confusing, and the underlying reason is that computers are terrible. :(

When we promised that MaybeUninit<T> is ABI-compatible with T, the people involved didn't know all the weird ABIs out there, so this it was only found out quite a bit later that this implies that not all bytes in the MaybeUninit can always be preserved. Maybe we should try to fix this by going back on this ABI promise, restricting it to types that don't have padding...

That's a separate discussion, though: https://github.com/rust-lang/unsafe-code-guidelines/issues/518

eggyal commented 1 month ago

I think I'm still not grasping something here.

If, for compare-exchange of an Atomic<T>, it is required that T: Eq (which seems to me to be the only sane interpretation of the operation), then is it not the case that:

since structs have static padding, successful byte-wise comparison can only occur when the non-padding bytes are equal;
since an enum's discriminant is never undef, successful byte-wise comparison can only occur if the discriminant byte(s) are equal—and then field bytes follow as for structs above: ie successful byte-wise comparison can only occur when the non-padding bytes are equal; and
for a union to implement Eq, PartialEq::cmp must be capable of determining the variant—and therefore any bytes required for that determination cannot be undef (at least when the compare-exchange is invoked, although because PartialEq::cmp is safe to call I should think this is always required?); consequently, successful byte-wise comparison can only occur if the variants are equal, and then non-padding bytes again follow as above.

It therefore seems to be the case that T: Eq implies that successful byte-wise comparison can only occur if non-padding bytes are equal, irrespective of any padding bytes?

Obviously byte-wise comparison may fail due to undef/padding bytes, but that is why one must then (as crossbeam does) call PartialEq::cmp to determine whether the failure is genuine (return to user) or spurious (reattempt with newly discovered current value, type-equal but byte-unequal to the original).

I suppose I'm struggling to understand why it would ever be necessary to freeze, or why this couldn't be implemented generically by the compiler (either with a compile-time size check, or requiring T: AtomicSized as discussed early in this issue).

Ddystopia commented 1 month ago

If, for compare-exchange of an Atomic, it is required that T: Eq (which seems to me to be the only sane interpretation of the operation)

for a union to implement Eq, PartialEq::cmp must be capable of determining the variant

I'm not sure how can we require T: Eq, because compare-exchange is a hardware-defined operation that requires a bitwise comparation and you can't run arbitrary PartialEq::cmp code to perform a comparation. For example, this obviously cannot work:

struct Foo(u8);
impl Eq for Foo;
impl PartialEq for Foo {
    fn eq(&self, other: &Foo) -> bool {
        std::thread::sleep_ms(1000);
        self.0.eq(&other.0)
    }
}

fn foo(foo: &Atomic<Foo>) {
    foo.compare_exchange(Foo(3), Foo(4), Relaxed, Relaxed);
}

eggyal commented 1 month ago

The library API Atomic::<T>::compare_exchange would not itself be an atomic operation, but would be guaranteed to use atomic operations on the underlying memory. PartialEq::cmp is called on (a copy of) the value loaded from that memory, not the memory itself.

comex commented 1 month ago

Obviously byte-wise comparison may fail due to undef/padding bytes, but that is why one must then (as crossbeam does) call PartialEq::cmp to determine whether the failure is genuine (return to user) or spurious (reattempt with newly discovered current value, type-equal but byte-unequal to the original).

The problem is that "reattempt with newly discovered current value" may never succeed if the current value in memory contains uninit bytes. Or in @chorman0773's formulation, it would be undefined behavior.

However, after re-reading the thread, I personally find the proposed solutions unsatisfying. Atomic*::from_mut is useful, so adding extra validity requirements on Atomic<T> compared to T would be a unfortunate loss of functionality.

And in this case, the problem being solved is purely an artifact of compiler optimizations. In general, uninit bytes are not just an artifact of optimizations because of MADV_DONTNEED/MADV_FREE, the special case where memory truly can 'change on its own' in the assembly-level abstract machine. But when it comes to compare-exchange loops specifically, it's okay if the memory changes on its own sometimes, as long as it doesn't do so every single time you try to read from it. You will still make forward progress eventually. And MADV_DONTNEED/MADV_FREE more than satisfies that criterion, since AFAIK it can cause any given byte to change at most once.

But I don't see any straightforward way to 'fix' this either, not without adding an ugly special case that LLVM and GCC may or may not end up respecting – that is, if they ever get around to adding optimizations for atomics. Right now they basically don't optimize them at all.

VorpalBlade commented 1 month ago

Wouldn't the "min" variation of this be to only support the no-padding case, and leave the door open to relax that in the future?

Would it be useful to move forward with such a proposal, or do the motivating use cases need padding? If so, why can't they just add manual "padding fields"? This seems to me to be entirely about ergonomics: yes it is nicer to support padding, but there is no use case that can't be rewritten to avoid needing padding.

pitaj commented 1 month ago

It's difficult if not impossible to support explicit padding with generics.

RalfJung commented 1 month ago

@eggyal it's a bit hard to reply since you haven't given concrete code. Let me try:

Obviously byte-wise comparison may fail due to undef/padding bytes, but that is why one must then (as crossbeam does) call PartialEq::cmp to determine whether the failure is genuine (return to user) or spurious (reattempt with newly discovered current value, type-equal but byte-unequal to the original).

The issue is that unini bytes behave a bit like bytes that take different values each time you read them. If the memory backing a (u8, u16) is 0x00UU0000 (using U to indicate "uninit"), and you do a compare-exchange with 0x00000000, you would usually get UB (can't compare uninit bytes) but if we say that atomics freeze both operands before comparing, let's say you get false and it says the actual value stored there is 0x00FF0000 (a valid result of freezing the in-memory value). So now you try again with 0x00FF0000, comparing that to the (unchanged) in-memory value 0x00UU0000, and again you get false and it says the actual return value is now 0x00AA0000 (another valid result of freezing the in-memory value). This can go on forever.

It's a somewhat theoretical concern because of course real uninit bytes are not unstable in this way, but if programmers want to reason about things like "does this concurrent program ever make progress", then we have to design a spec that doesn't have such "implicit" infinite loops, not even theoretically.

This can either be fixed by somehow saying that freeze "stabilizes" when it is tried sufficiently often) (but I have no idea how to do that), or by picking a different scheme for specifying these atomic operations.

RalfJung commented 1 month ago

Another option would be to atomically write back the frozen value after a failed compare-exchange. But usually a failed compare-exchange is a read-only operation so making it a full RMW could cause problems (such as extra data races).

rust-lang / unsafe-code-guidelines

Packing pointers into double-word width atomics #517