Closed KodrAus closed 1 year ago
Somewhere I found mention of deprecation of Once if/when once_cell becomes stable. Once has an important guarantee that the new types don't make: "When this function returns [it] is also guaranteed that any memory writes performed by the executed closure can be reliably observed by other threads at this point (there is a happens-before relation between the closure and code executing after the return)." See https://github.com/matklad/once_cell/issues/83 for more discussion of this.
Basically, it isn't clear if these new types intend to guarantee anything stronger than "Consume" semantics, regardless of whether the present implementation may (or may not) implement stronger semantics.
[Edited to add] In the once_cell crate's issue, @matklad basically said once_cell has the stronger semantics, so if the standard library variant doesn't have the same guarantee, then that needs to be called out as a potential reason to NOT switch from the once_cell crate, as there are probably users depending on the stronger semantics.
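A minimal sketch of the happens-before guarantee quoted above, using std::sync::Once. The names SIDE_DATA, INIT, ensure_init and observe_from_threads are made up for illustration. The closure writes with Relaxed ordering, and every thread that returns from call_once still observes the write, because call_once itself provides the synchronization.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Once;
use std::thread;

// Hypothetical global written only inside the `Once` closure.
static SIDE_DATA: AtomicU32 = AtomicU32::new(0);
static INIT: Once = Once::new();

/// Runs the initializer at most once, then reads the data it wrote.
fn ensure_init() -> u32 {
    INIT.call_once(|| {
        // Relaxed is enough here: `call_once` establishes the
        // happens-before edge between this write and later readers.
        SIDE_DATA.store(42, Ordering::Relaxed);
    });
    SIDE_DATA.load(Ordering::Relaxed)
}

/// Calls `ensure_init` from several threads and collects what each one saw.
fn observe_from_threads() -> Vec<u32> {
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(ensure_init)).collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Whether OnceLock documents the same guarantee is exactly the question raised in this comment; the sketch only demonstrates what Once already promises.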
I have a codebase where Rust code ensures that a global static mut GLOBAL_STATE: [u32; 4] array is initialized before calling into C and assembly code that directly reads GLOBAL_STATE on the assumption that it has been initialized beforehand. I investigated adapting this codebase to use the new feature. I see this:
pub struct OnceLock<T> {
    once: Once,
    // Whether or not the value is initialized is tracked by `state_and_queue`.
    value: UnsafeCell<MaybeUninit<T>>,
    // ...
}
In order to support this pattern, it would be convenient to make it so a OnceLock<T> can be accessed from non-Rust code as though it were a T. That is, it would be great to guarantee that the address of the OnceLock is equal to the address of its value; i.e. make value the first field in the structure, make OnceLock be repr(C) or equivalent, and ensure value is #[repr(transparent)].
Basically, it isn't clear if these new types intend to guarantee anything stronger than "Consume" semantics, regardless of whether the present implementation may (or may not) implement stronger semantics.
I think anything not providing Acquire/Release semantics would be too much of a footgun.
If we ever get some way of emulating Consume, user code can implement its own types that provide that instead, similar to how user code that wants a racy or relaxed behavior currently would need to implement that itself.
That is, it would be great to guarantee that the address of the OnceLock is equal to the address of its value; i.e. make value the first field in the structure, make OnceLock be repr(C) or equivalent, and ensure value is #[repr(transparent)].
I'm a lot less sure about this though. We don't currently do this for any stdlib types (aside from trivial cases like NonZeroFoo and such), do we?
UnsafeCell itself is repr(transparent), so it has this property, IIUC.
Do we ever guarantee the representation of something that isn't purely a wrapper? This would need additional state.
No, repr(C) guarantees field order, but compatibility only with the most commonly used layout of C compilers for the target.
I meant if any existing types in the stdlib did this that aren't purely wrappers around other types.
Probably worth cross-linking between the sync and unsync versions in their docs.
Having some brief discussion with Thom on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/moving.20once.20cell.20forward), it seems like it makes sense to guarantee acquire/release at this point, and possibly add a way to specify consume if there ever is a use. Which there very well may never be.
The three open issues from the top post are:
[...] Consume at this point)
The RFC is still open, but it's fairly dependent on the discussion at this issue. It probably just needs some minor updates to the current decisions and then can be merged.
Is there anything blocking FCP aside from the RFC?
For lazy, there’s https://github.com/rust-lang/rust/pull/103718.
I think we should fcp something here, yeah.
My personal preference would be for FCPing a minimal subset first (only once cell, no lazy). The API surface is large, and we might easily overlook some annoying detail if we try to rubber-stamp whatever there is now.
Not sure if this is the right place to point this out: the documentation for LazyLock reads:
"This type is a thread-safe Lazy, and can be used in statics."
However, there's no type called Lazy. This should probably refer to LazyCell instead?
Could OnceCell<T> be #[repr(transparent)] with its layout documented to be identical to Option<T>? I would like to initialise a OnceCell from FFI without calling back into Rust, by writing directly to the memory location.
Option<T> doesn't have a defined layout either, except for Option<&T>, Option<NonZeroU*> and the like.
Indeed. But in those cases it's useful: Option<&T> in my case.
Actually, the same applies to UnsafeCell, SyncUnsafeCell and Cell: whilst they are already #[repr(transparent)], their internal layout is undocumented.
I think repr(transparent) would be a bit misleading since it imho kind of guides users to assume that OnceCell<T> has the same layout as T, not Option<T> (the three examples mentioned all have the layout of T).
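The size relationships behind this exchange can be checked directly. This is a rough sketch with a made-up function name (layout_facts_hold). Equal sizes are only a necessary condition for identical layout, not a proof of it, and nothing here says anything about OnceCell itself.

```rust
use std::cell::{Cell, UnsafeCell};
use std::mem::size_of;

/// Returns true when the size relationships discussed above hold on this target.
fn layout_facts_hold() -> bool {
    // `Option<&T>` uses the null niche, so it is pointer-sized.
    size_of::<Option<&u32>>() == size_of::<&u32>()
        // The transparent wrappers are the same size as the value they wrap.
        && size_of::<UnsafeCell<u32>>() == size_of::<u32>()
        && size_of::<Cell<u32>>() == size_of::<u32>()
        // `u32` has no niche, so `Option<u32>` needs extra space for the tag.
        && size_of::<Option<u32>>() > size_of::<u32>()
}
```

The last check illustrates why a cell that stores Option<T> cannot in general have the layout of T.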
The get and get_mut methods on OnceCell can be used to get pointers to the inner value, would that work in your case?
No, repr(C) guarantees field order, but compatibility only with the most commonly used layout of C compilers for the target.
The C standards require the address of the first field to be the address of the structure, which is why I suggested #[repr(C)], putting the value at the start of the structure, and avoiding using any non-transparent wrappers like Option around the field.
Anyway, I don't have a strong opinion about whether to do extra work to support the ability of non-Rust code to be able to access the value.
I don't think there's much benefit to providing any sort of guarantee on internal layout - any alternative to Option means mimicking its behavior in a separate place, and losing optimizations geared at Option (e.g. niches).
For any Rust + C project that already has a good reason to use a Rust OnceCell, I really think the correct solution is to use get(), get_mut(), get_or_init(), etc. and wrap them in something extern "C", or pass their result to C as applicable. Otherwise, you're just rewriting those exact functions in C.
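The wrapper approach suggested here might look roughly like the following. GLOBAL_STATE is borrowed from the earlier comment, while compute_state and global_state_ptr are hypothetical names, and the initial values are placeholders. The sketch leaves out the attribute needed to export the function under an unmangled symbol name, since its spelling depends on the edition in use.

```rust
use std::sync::OnceLock;

// The cell stays an ordinary Rust static; C never looks at its layout.
static GLOBAL_STATE: OnceLock<[u32; 4]> = OnceLock::new();

/// Hypothetical initializer standing in for the real setup code.
fn compute_state() -> [u32; 4] {
    [1, 2, 3, 4]
}

/// Returns a pointer to the four initialized `u32` values.
/// The pointer stays valid for the rest of the program because the
/// cell is a static and is never written again after initialization.
/// (Exporting this by symbol name would additionally need a
/// no_mangle attribute, omitted in this sketch.)
pub extern "C" fn global_state_ptr() -> *const u32 {
    GLOBAL_STATE.get_or_init(compute_state).as_ptr()
}
```

With this shape the C and assembly side only ever receives a plain pointer to initialized data, so no layout guarantee on OnceLock is required.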
I've opened partial stabilization PR #105587 for OnceCell and OnceLock; I believe an FCP for those would be next.
Can we add fn into_inner(self) -> Option<T> to LazyCell? That'd be helpful when doing things only if the LazyCell fired.
Wouldn't fn is_initialized(this: &Self) -> bool or perhaps fn get(this: &Self) -> Option<&T> be more useful, as they don't require taking ownership? (As with other smart pointers, these are associated functions to avoid conflicts with methods of the inner type T.)
Maybe for others, but for my use case I specifically need to take ownership.
Went ahead and opened a PR: https://github.com/rust-lang/rust/pull/106152
If you have ownership, is it useful to have a LazyCell at all instead of Option? Or is there some scenario where you first need to initialize through a shared reference, and later recover full ownership?
Well yeah but then I have to manage lazy initialization myself.
TIL! That's ever so slightly more annoying b/c you have to carry around the option and closure separately (or wrap them in your own type), but I'd be ok with having my PR closed if we think you should use this instead.
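The "wrap them in your own type" option mentioned here could be sketched as below. OwnedLazy, force and into_inner are hypothetical names, not std API. The type keeps the Option and the closure together, and needs exclusive access to initialize.

```rust
/// Hypothetical wrapper that bundles an `Option<T>` with its initializer.
struct OwnedLazy<T, F: FnOnce() -> T> {
    value: Option<T>,
    init: Option<F>,
}

impl<T, F: FnOnce() -> T> OwnedLazy<T, F> {
    fn new(init: F) -> Self {
        Self { value: None, init: Some(init) }
    }

    /// Runs the initializer on first use; requires `&mut self`.
    fn force(&mut self) -> &mut T {
        if self.value.is_none() {
            let init = self.init.take().expect("initializer already consumed");
            self.value = Some(init());
        }
        self.value.as_mut().expect("value was initialized above")
    }

    /// Returns the value only if the initializer actually ran.
    fn into_inner(self) -> Option<T> {
        self.value
    }
}
```

Unlike LazyCell, this cannot be initialized through a shared reference, which is the trade-off the preceding question is getting at.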
I've added non-blocking flavors of the primitives to the once_cell crate: https://docs.rs/once_cell/1.5.1/once_cell/race/index.html. They are restricted (can be provided only for atomic types), but are compatible with no_std.
FWIW, I use once_cell a lot for initializing a cryptographic context, and usually, the racy option is the one that I actually want (parking a thread can be more expensive than just initializing the context).
This kind of use case might disappear once const fn becomes powerful enough that I'll be able to initialize these at compile time.
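To illustrate what a non-blocking, racy cell means, here is a rough sketch built on a single atomic. RacyOnce is a made-up type and is not the once_cell::race implementation. Several threads may run the initializer at the same time, but only the first value stored is ever observed, and no thread is parked.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical racy cell: zero in the slot means "not initialized yet".
pub struct RacyOnce {
    slot: AtomicUsize,
}

impl RacyOnce {
    pub const fn new() -> Self {
        Self { slot: AtomicUsize::new(0) }
    }

    /// Returns the stored value, if any, without blocking.
    pub fn get(&self) -> Option<NonZeroUsize> {
        NonZeroUsize::new(self.slot.load(Ordering::Acquire))
    }

    /// Runs `init` if the cell looks empty. Concurrent callers may each
    /// run `init`, but all of them return the value that won the race.
    pub fn get_or_init(&self, init: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        if let Some(existing) = self.get() {
            return existing;
        }
        let candidate = init();
        match self
            .slot
            .compare_exchange(0, candidate.get(), Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => candidate,
            Err(existing) => {
                NonZeroUsize::new(existing).expect("slot only holds non-zero values once set")
            }
        }
    }
}
```

The cost is that the initializer may run more than once, which is acceptable when initialization is cheap relative to parking a thread.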
I have a question about using get_or_init in the OnceLock struct.
My use case involves two Tasks producing values (lhs and rhs), and I need to reduce the redex once both values are available. The order of these values becoming available is unknown.
Could OnceLock be used here? I was thinking of using one OnceLock shared between two Tasks, and once a Task produces its value, it calls get_or_init. If the OnceLock is empty, it will be set, otherwise, I would get the existing value. However, I am not sure how to determine which value was returned in order to process the redex.
As I understand it, after the get_or_init call, the (Boxed) value will be moved, and I can't compare pointers.
My question is: could get_or_init take an Option with the current value as a parameter, or is there a way to map over OnceLock to either use or set its value?
I just realized that in my specific case, one of the values has a positive polarity and the other one negative. So I can use OnceLock as is by checking the polarity of the cell in the OnceLock. In any case, I think the question above still holds.
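One way to tell which task arrived second, without changing get_or_init, is to use OnceLock::set, which hands the value back in Err when the cell is already full. The sketch below uses hypothetical names (offer, run_pair) and plain i64 values in place of the real lhs and rhs.

```rust
use std::sync::{Arc, OnceLock};
use std::thread;

/// Offers `mine` to the shared cell.
/// Returns `None` if this caller arrived first (its value is now stored),
/// or `Some((theirs, mine))` if the other value was already there.
fn offer(cell: &OnceLock<i64>, mine: i64) -> Option<(i64, i64)> {
    match cell.set(mine) {
        Ok(()) => None,
        Err(mine) => {
            // `set` failed, so the cell is guaranteed to hold the other value.
            let theirs = *cell.get().expect("cell is initialized when set fails");
            Some((theirs, mine))
        }
    }
}

/// Spawns two tasks that each offer one value and collects their results.
fn run_pair(lhs: i64, rhs: i64) -> Vec<Option<(i64, i64)>> {
    let cell: Arc<OnceLock<i64>> = Arc::new(OnceLock::new());
    let handles: Vec<_> = [lhs, rhs]
        .into_iter()
        .map(|value| {
            let cell = Arc::clone(&cell);
            thread::spawn(move || offer(&cell, value))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Exactly one of the two tasks ends up holding both values, and that task can perform the reduction.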
There appears to be an unexpected behavior when using const with Lazy. Issue: https://github.com/matklad/once_cell/issues/224
With a const variable, Lazy can get evaluated multiple times and return different results.
I believe the expected behavior should be one of the following.
[...] Lazy is evaluated once, and the result is shared.
That is simply how consts work. There is no reason it should not compile.
With a const variable, Lazy can get evaluated multiple times and return different results.
That's pretty much what I would expect per the documentation of const:
"const items looks remarkably similar to static items, which introduces some confusion as to which one should be used at which times. To put it simply, constants are inlined wherever they’re used, making using them identical to simply replacing the name of the const with its value. Static variables, on the other hand, point to a single location in memory, which all accesses share. This means that, unlike with constants, they can’t have destructors, and act as a single value across the entire codebase."
Perhaps you're after a static instead?
Note that there is also a clippy lint for consts with interior mutability, which should fire if you use Lazy in a const: https://rust-lang.github.io/rust-clippy/master/index.html#declare_interior_mutable_const
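The difference can be reproduced with std's LazyLock (assuming a toolchain where it is stable, Rust 1.80 or later) rather than once_cell's Lazy. The names INIT_RUNS, next_id, CONST_ID and STATIC_ID are made up for this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the initializer has run.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical initializer that returns a different value on every call.
fn next_id() -> usize {
    INIT_RUNS.fetch_add(1, Ordering::SeqCst) + 1
}

// Each use of a `const` creates a fresh `LazyLock`, so the
// initializer runs again at every use site.
const CONST_ID: LazyLock<usize> = LazyLock::new(|| next_id());

// A `static` is a single location in memory, initialized once.
static STATIC_ID: LazyLock<usize> = LazyLock::new(|| next_id());

fn read_const_twice() -> (usize, usize) {
    (*CONST_ID, *CONST_ID)
}

fn read_static_twice() -> (usize, usize) {
    (*STATIC_ID, *STATIC_ID)
}
```

Reading the const twice yields two different values, while the static yields the same value both times.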
Closing as once_cell was stabilized in https://github.com/rust-lang/rust/pull/105587. The two follow up issues are
[...] get_or_try_init which might or might not interact with the Try trait.
Is there a plan or tracking issue for OnceLock::wait?
No, but it seems to me that it should be added (under a separate feature gate and tracking issue).
Can we have a get_mut_or_init?
OnceCell::get_mut_or_init would only be safe if you have &mut self. But if you have exclusive access when initializing, do you even need a cell in the first place? Couldn’t you use a plain Option and initialize it with get_or_insert_with?
Can't you say the same about get_mut? But it may be a conditional situation, where sometimes you have the necessary information to initialize while you have exclusive access, and other times you need to do it later while shared.
OnceCell::get_mut_or_init would only be safe if you have &mut self. But if you have exclusive access when initializing, do you even need a cell in the first place? Couldn’t you use a plain Option and initialize it with get_or_insert_with?
Because not all accessors are get_mut_or_init. Basically, I need a get_or_init, but sometimes I want a mut reference and am unsure if it's initialized.
You can read this code snippet and see if there is a better solution than providing a get_mut_or_init:
// batches: OnceCell<Vec<RecordBatch>>,
pub fn mut_batches(&mut self) -> IterMut<'_, RecordBatch> {
    self.batches.get_or_init(|| load_batches(&self.buf));
    // SAFETY: `get_or_init` above guarantees the cell is initialized.
    unsafe { self.batches.get_mut().unwrap_unchecked() }.iter_mut()
}
pub fn batches(&self) -> Iter<'_, RecordBatch> {
    self.batches.get_or_init(|| load_batches(&self.buf)).iter()
}
But you're right that without concurrent calls an Option + get_or_insert_with may work.
No. I need to guard shared non-mut access to batches, so I still need a get_mut_or_init.
Otherwise,
pub fn batches(&self) -> Iter<'_, RecordBatch> {
    self.batches
        .get_or_insert_with(|| load_batches(&self.buf))
        .iter()
}
fails to compile: Cannot borrow immutable local variable `self.batches` as mutable.
OnceCell is !Sync and can support such interior mutability.
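For comparison, the pattern from the snippet above can be written without unsafe by pairing get_or_init with get_mut. This is only a sketch: Table, the u32 element type and this load_batches stand in for the real RecordBatch code, which is not shown in the thread.

```rust
use std::cell::OnceCell;
use std::slice::{Iter, IterMut};

/// Hypothetical stand-in for the struct in the snippet above.
struct Table {
    buf: Vec<u8>,
    batches: OnceCell<Vec<u32>>,
}

/// Placeholder loader; the real one would decode record batches.
fn load_batches(buf: &[u8]) -> Vec<u32> {
    buf.iter().map(|&byte| u32::from(byte)).collect()
}

impl Table {
    fn new(buf: Vec<u8>) -> Self {
        Self { buf, batches: OnceCell::new() }
    }

    /// Shared access: initializes lazily through `&self`.
    fn batches(&self) -> Iter<'_, u32> {
        self.batches.get_or_init(|| load_batches(&self.buf)).iter()
    }

    /// Exclusive access: initialize first, then borrow mutably.
    fn mut_batches(&mut self) -> IterMut<'_, u32> {
        self.batches.get_or_init(|| load_batches(&self.buf));
        self.batches
            .get_mut()
            .expect("initialized by get_or_init above")
            .iter_mut()
    }
}
```

This costs one extra initialized check compared with a dedicated get_mut_or_init, which is the convenience being requested.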
This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788). The feature gate for the issue is #![feature(once_cell)].
Unstable API
Steps
Unresolved Questions
Inlined from #72414:
[...] Sync prefix like SyncLazy for now, but have a personal preference for Atomic like AtomicLazy. Resolved in: https://github.com/rust-lang/rust/issues/74465#issuecomment-1098359963. Surprisingly, after more than a year of deliberation we actually found a better name.
[...] std::sync types that we might want to just avoid upfront for std::lazy, especially if that would align with a future std::mutex that doesn't poison. Personally, if we're adding these types to std::lazy instead of std::sync, I'd be on-board with not worrying about poisoning in std::lazy, and potentially deprecating std::sync::Once and lazy_static in favour of std::lazy down the track if it's possible, rather than attempting to replicate their behavior. cc @Amanieu @sfackler.
[...] SyncOnceCell::get blocking. There doesn't seem to be consensus in the linked PR on whether or not that's strictly better than the non-blocking variant. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-663414310).
[...] Release/Acquire, but it could also use the elusive Consume ordering. Should we spec that we guarantee Release/Acquire? (resolved as yes: consume ordering is not defined enough to merit inclusion into std)
[...] SyncOnceCell in no_std. I think there's consensus that we don't want to include "blocking" parts of API, but it's unclear if non-blocking subset (get+set) would be useful. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-725360596).
[...] get_or[_try]_init the best name? (resolved as yes in https://github.com/rust-lang/rust/pull/107184)
Implementation history
#68198 (closed in favor of #72414)
#72414 initial implementation
#74814 fixed UnwindSafe bounds