Closed pitdicker closed 2 years ago
Added a commit that does the same thing for the parking_lot
implementation.
This definitely sounds really exciting! I must confess that the only thing I know about Consume
ordering is that it exists. So, I won't be able to review this for this reason (and there's a problem that I have a ton of PRs to review at rust-analyzer as well...).
I wonder what's the best way to proceed is here? Even if I were able to convince myself that consume is OK here, I wouldn't feel confident just changing the implementation. I want once_cell to be, first and foremost, conservative and dependable crate, and doing some weak memory order shenanigans seems risky.
At the same time, I would very much like to unlock and encourage experimentation here. Perhaps we can hide this behind the feature flag? That way, we can actually publish this for folks to easily try it out
I want once_cell to be, first and foremost, conservative and dependable crate, and doing some weak memory order shenanigans seems risky.
Big :+1:.
Shall I make a separate PR with the two extra tests/benchmarks first?
I think https://github.com/rust-lang/rust/pull/65719 should allow others to take care of some review work. If the standard library likes some of the changes, that would give a little more confidence for changes to the atomic orderings in once_cell
? I would like to make a PR at some point that at least removes the UB from aliasing a mutable pointer.
At the same time, I would very much like to unlock and encourage experimentation here. Perhaps we can hide this behind the feature flag? That way, we can actually publish this for folks to easily try it out
Thank you, good to hear!
Using consume ordering requires a different abstraction between lib.rs
and imp_std
/imp_pl
. Shall I try to figure something out that works for the current implementation and for something using consume, and make a PR for that first?
BTW, what do you think about a vendored MaybeUnit
module like parking_lot
now has? Otherwise I will probably have to duplicate the unchecked unwrapping of an option in a couple of places, or make it into some function. It seems nicer to instead move towards what the future is going to be.
and there's a problem that I have a ton of PRs to review at rust-analyzer as well...
Thank you for all the work. This is just a bit of a hobby for me.
Shall I make a separate PR with the two extra tests/benchmarks first?
Yep, tests and benches are always useful!
I would like to make a PR at some point that at least removes the UB from aliasing a mutable pointer.
:+1:
Using consume ordering requires a different abstraction between lib.rs and imp_std/imp_pl. Shall I try to figure something out that works for the current implementation and for something using consume, and make a PR for that first?
I think I would prefer for this to be a completely separate module under a cfg
flag, even it means some code duplication. So, cfg
-ing top-level sync
module seems like it would do the trick.
BTW, what do you think about a vendored MaybeUnit module like parking_lot now has?
I see that is has T: Copy
bound, so I feel like we won't be able to use this. But if it works, :+1: from me.
Thank you for all the work. This is just a bit of a hobby for me.
I believe you have commit access to the repo, so feel free about moving this project in directions which you find interesting, as long as it requires opt-in via a feature flag :) For things that change the "official" implementation I hope to provide reviews without too much lag. Also, how would you feel about implementing the once_cell RFC (sending PR to rust-lang rust and pestering bors and T-lib until it is merged) ^^?
I think I would prefer for this to be a completely separate module under a
cfg
flag, even it means some code duplication. So,cfg
-ing top-levelsync
module seems like it would do the trick.
I hope the necessary changes are minor, but that is a good option also. Would you be okay with splitting the unsync
and sync
modules out of lib.rs
?
I see that is has
T: Copy
bound, so I feel like we won't be able to use this. But if it works, :+1: from me.
I hit that problem right around the time of your comment. Now I took an approach where a vendored MaybeUninit
is just a wrapper around Option
. Terrible.
I believe you have commit access to the repo, so feel free about moving this project in directions which you find interesting, as long as it requires opt-in via a feature flag :) For things that change the "official" implementation I hope to provide reviews without too much lag.
Thank you! Lets start small...
Also, how would you feel about implementing the once_cell RFC (sending PR to rust-lang rust and pestering bors and T-lib until it is merged) ^^?
No promises yet, but that sounds like fun :smile:. But after some experimenting first.
I'd like to come back to this! #83 made me think that this is actually an API-design question!
Specifically, consider this program:
static FLAG: AtomicBool = AtomicBool::new(false);
static CELL: OnceCell<()> = OnceCell::new();
// thread1
CELL.get_or_init(|| FLAG.store(true, Relaxed));
// thread2
if CELL.get().is_some() {
assert!(FLAG.load(Relaxed))
}
With Acquire
, the assert will never fire, but with Consume
it could (at least, as far as my understanding of memory models goes)! So what synchronization we use is actually a part of the public API of the crate, and I think how "why not both" is useful.
Specifically, I think it's important that we do provide Acquire
guarantee by default, but maybe we should have a way to opt out of it. I think a good practical way do do this would be to add a separate top-level consume
module, which provides the same signature as sync
, but with weaker synchronization gurantees. We'd also use an optional dependecy on crossbeam util to get AtomicConsume
.
A nice property of this plan is that, as this becomes opt-in for the user, I won't be worrying about breaking clients by relying on implementation-defined semantics. @pitdicker wdyt?
That is a good issue. I wrote my reply on the RFC thread too soon, I'll think some more.
Another reason for consume semantics to be optional is that it is very shady territory as far as language specifications go. There's a reason why Rust does not expose Ordering::Consume
natively. Even in the C++ world, compilers currently implement memory_order_consume as an alias for memory_order_acquire, and cppreference currently features the following ominous statement on its memory model page:
The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged.
In particular, the implementation that is proposed in this PR is incorrect, because it fails to take compiler optimizations into account. Given this kind of pseudocode, which the PR's definition of consume morally reduces to...
let ptr = atomic_ptr.load(Ordering::Relaxed);
let dependent = unsafe { *ptr };
...the compiler is allowed to optimize out the load from "ptr", either partially or entirely, and use cached data from previous loads instead. This would violate the synchronization guarantees that consume ordering intends to provide.
Therefore, although this code currently works, it is not guaranteed to keep working in future compiler releases or in all use cases of OnceCell (e.g. for all type parameters T, for all surrounding codes around OnceCell calls...), and should therefore not be used.
Crossbeam's AtomicConsume tries to work around this problem by using an Acquire compiler_fence, which should resolve the problem in most cases, but...
array[idx-idx]
depends on idx
". What about situations where it's "obvious" to the compiler that a load from memory can be elided? Should you use volatile loads all the time just to be safe?TL;DR: Consume is currently a memory model specification farce. If you truly need it for performance, please at least make it optional and use (something like) crossbeam's implementation, which tries to tell the compiler that you're doing something shady.
The Atomic ordering consume is meant for data structures that are rarely written but often read. That sounds like a description of
once_cell
:smile:. On all modern processors it can be implemented without using any synchronization fences.This PR is an experiment, to see if it really gives a performance improvement. This is very much a proof of concept, not cleaned up or intended for merging.
To use consume ordering, there has to be a data dependency between the atomic and the data it synchronizes. In other words: without reading the atomic it must be impossible for this thread to know where the data lives that has to be synchronized.
The queue of
Waiters
in the std implementation is a perfect fit: the atomic contains a pointer to the first node, and every node contains a pointer to the next. This is like an example from the book of a data dependency chain that could be used with Consume. But as that part is not performance sensitive, we can just leave it as is.To create a data dependency of
value
onstate
, I madestate
encode an offset from the address of theOnceCell
struct to thevalue
field. Encoding a raw pointer is not possible, as theOnceCell
can be moved. This trick relies on the compiler not to change the layout of an initialized struct (which seems like something we can rely on).Rust doesn't expose an
Ordering::Consume
. But it doesn't need anything special: just usingOrdering::Relaxed
is enough, and is what Crossbeam uses. We just have to next use the atomic as a pointer, array index, pointer offset, etc.To test the ordering works, I added a
break_it
test inexamples
. It can reliably detect unsufficent synchronization when I run it in release mode on my phone, which is easy using dinghy. If I intentionally reduce the synchronization of the current implementation, it will detect a few races. But with consume it works. To make sure it doesn't work by accident, I tuned down all other orderings to the lowest possible for now.In the existing benchmark that does nothing but read from a
OnceCell
in a loop, there is not any real difference I can measure. In a changed benchmark where all threads also do some work, like writing some data, a difference becomes more clear. This is a result of 10 runs of the modifiedbench.rs
:To make
state
in the std implementation encode everything, I set up the following scheme:OnceCell
is initialized cheap: just a comparison against zero, with no bitmasks or even another integer to compare against.#[repr(align)]
is 2^29. And with large alignments, the compiler currently changes the layout of the struct to havestate
aftervalue
, which gives an offset of 0. Anyway, reserving this whole range seems like a safe enough choice.OnceCell
is not initialized. We can control the alignment of theWaiter
nodes, which I set to 8. This causes the pointer to aWaiter
node to be in the range 8..usize::max. after encoding it with-(waiterptr >> 1)
it will be in the range -(4..isize::max). That still leaves -1, -2 and -3 to encode the states EMPTY and RUNNING (without other threads waiting).I will start cleaning the code up, and add more comments. Don't put too much time in reviewing yet. But I am interested in your opinion! Is this worth pursuing?