rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking Issue for `once_cell` #74465

Closed KodrAus closed 1 year ago

KodrAus commented 4 years ago

This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788). The feature gate for the issue is #![feature(once_cell)].

Unstable API

// core::lazy

pub struct OnceCell<T> { .. }

impl<T> OnceCell<T> {
    pub const fn new() -> OnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
}
impl<T> From<T> for OnceCell<T>;
impl<T> Default for OnceCell<T>;
impl<T: Clone> Clone for OnceCell<T>;
impl<T: PartialEq> PartialEq for OnceCell<T>;
impl<T: Eq> Eq for OnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for OnceCell<T>;

pub struct Lazy<T, F = fn() -> T> { .. }

impl<T, F> Lazy<T, F> {
    pub const fn new(init: F) -> Lazy<T, F>;
}
impl<T, F: FnOnce() -> T> Lazy<T, F> {
    pub fn force(this: &Lazy<T, F>) -> &T;
}
impl<T: Default> Default for Lazy<T>;
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>;
impl<T: fmt::Debug, F> fmt::Debug for Lazy<T, F>;

// std::lazy

pub struct SyncOnceCell<T> { .. }

impl<T> SyncOnceCell<T> {
    pub const fn new() -> SyncOnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(mut self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
    fn is_initialized(&self) -> bool;
    fn initialize<F, E>(&self, f: F) -> Result<(), E> where F: FnOnce() -> Result<T, E>;
    unsafe fn get_unchecked(&self) -> &T;
    unsafe fn get_unchecked_mut(&mut self) -> &mut T;
}
impl<T> From<T> for SyncOnceCell<T>;
impl<T> Default for SyncOnceCell<T>;
impl<T: RefUnwindSafe + UnwindSafe> RefUnwindSafe for SyncOnceCell<T>;
impl<T: UnwindSafe> UnwindSafe for SyncOnceCell<T>;
impl<T: Clone> Clone for SyncOnceCell<T>;
impl<T: PartialEq> PartialEq for SyncOnceCell<T>;
impl<T: Eq> Eq for SyncOnceCell<T>;
unsafe impl<T: Sync + Send> Sync for SyncOnceCell<T>;
unsafe impl<T: Send> Send for SyncOnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for SyncOnceCell<T>;

pub struct SyncLazy<T, F = fn() -> T> { .. }

impl<T, F> SyncLazy<T, F> {
    pub const fn new(f: F) -> SyncLazy<T, F>;
}
impl<T, F: FnOnce() -> T> SyncLazy<T, F> {
    pub fn force(this: &SyncLazy<T, F>) -> &T;
}
impl<T, F: FnOnce() -> T> Deref for SyncLazy<T, F>;
impl<T: Default> Default for SyncLazy<T>;
impl<T, F: UnwindSafe> RefUnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: RefUnwindSafe;
impl<T, F: UnwindSafe> UnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: UnwindSafe;
unsafe impl<T, F: Send> Sync for SyncLazy<T, F> where SyncOnceCell<T>: Sync;
impl<T: fmt::Debug, F> fmt::Debug for SyncLazy<T, F>;
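For orientation, here is a minimal usage sketch. It assumes the API as later stabilized, where some names differ from the listing above: `OnceCell` lives in `std::cell`, `SyncOnceCell` became `std::sync::OnceLock` (Rust 1.70), and `Lazy`/`SyncLazy` became `LazyCell`/`LazyLock` (Rust 1.80). The statics and the `once_cell_demo` helper are illustrative only.

```rust
use std::cell::OnceCell;
use std::sync::{LazyLock, OnceLock};

// `SyncLazy` in the listing above; stabilized as `LazyLock`.
// The initializer runs on first access, from whichever thread gets there first.
static SQUARES: LazyLock<Vec<u64>> = LazyLock::new(|| (0..10).map(|n| n * n).collect());

// `SyncOnceCell` in the listing above; stabilized as `OnceLock`.
static NAME: OnceLock<String> = OnceLock::new();

/// Single-threaded `OnceCell`: returns (value before init, value after init, result of a second `set`).
fn once_cell_demo() -> (Option<u32>, u32, Result<(), u32>) {
    let cell: OnceCell<u32> = OnceCell::new();
    let before = cell.get().copied(); // None: nothing stored yet
    let value = *cell.get_or_init(|| 42); // runs the closure and stores 42
    let again = cell.set(7); // Err(7): the cell is already initialized
    (before, value, again)
}
```

Note that `set` hands the rejected value back in `Err`, so nothing is silently dropped when the cell is already full.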

Steps

Unresolved Questions

Inlined from #72414:

Implementation history

briansmith commented 1 year ago

Somewhere I found mention of deprecation of Once if/when once_cell becomes stable. Once has an important guarantee that the new types don't make: "When this function returns [it] is also guaranteed that any memory writes performed by the executed closure can be reliably observed by other threads at this point (there is a happens-before relation between the closure and code executing after the return)." See https://github.com/matklad/once_cell/issues/83 for more discussion of this.

Basically, it isn't clear if these new types intend to guarantee anything stronger than "Consume" semantics, regardless of whether the present implementation may (or may not) implement stronger semantics.

[Edited to add] In the once_cell crate's issue, @matklad basically said once_cell has the stronger semantics, so if the standard library variant doesn't have the same guarantee, then that needs to be called out as a potential reason to NOT switch from the once_cell crate, as there are probably users depending on the stronger semantics.
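The guarantee under discussion can be illustrated with a small sketch. It assumes `std::sync::OnceLock` provides acquire/release semantics (the position thomcc takes later in this thread); the statics and helper functions are hypothetical. A `Relaxed` store made inside the initializer is only guaranteed to be visible to a thread that later observes the initialized cell if initialization happens-before that observation. A test like this cannot prove the guarantee; it only shows the kind of code that relies on it.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::OnceLock;

// Hypothetical side channel, written inside the initializer with Relaxed ordering.
static SIDE_EFFECT: AtomicU32 = AtomicU32::new(0);
static CELL: OnceLock<u32> = OnceLock::new();

fn initializer() -> u32 {
    // This store is only guaranteed visible to a reader that sees the initialized
    // cell if initialization happens-before that read (acquire/release).
    SIDE_EFFECT.store(7, Ordering::Relaxed);
    1
}

/// Spins until some other thread initializes CELL, then reads the side effect.
fn reader() -> u32 {
    while CELL.get().is_none() {
        std::hint::spin_loop();
    }
    SIDE_EFFECT.load(Ordering::Relaxed)
}
```

With only "Consume" semantics, the reader would be guaranteed to see the `u32` stored in the cell, but not necessarily the unrelated `SIDE_EFFECT` write.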

briansmith commented 1 year ago

I have a codebase in which Rust code ensures that a global static mut GLOBAL_STATE: [u32; 4] array is initialized before calling into C and assembly code that reads GLOBAL_STATE directly, on the assumption that it has already been initialized. I investigated adapting this codebase to use the new feature. I see this:

pub struct OnceLock<T> {
    once: Once,
    // Whether or not the value is initialized is tracked by `state_and_queue`.
    value: UnsafeCell<MaybeUninit<T>>,
    // (remaining fields elided)
}

In order to support this pattern, it would be convenient to make it so a OnceLock<T> can be accessed from non-Rust code as though it were a T. That is, it would be great to guarantee that the address of the OnceLock is equal to the address of its value; i.e. make value the first field in the structure, make OnceLock be repr "C" or equivalent, and ensure value is #[repr(transparent)].
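A sketch of what the requested layout might look like, written as a hypothetical `FfiOnceLock<T>` (this is not how std's `OnceLock` is defined). With `#[repr(C)]` the first field sits at offset 0, and `UnsafeCell<T>` and `MaybeUninit<T>` are both documented to have the same in-memory representation as `T`, so the address of the struct is the address of the value. C code reading through that address must still only do so after Rust has initialized it.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::Once;

/// Hypothetical layout-stable once-lock: `value` is the first field of a repr(C) struct.
#[repr(C)]
pub struct FfiOnceLock<T> {
    value: UnsafeCell<MaybeUninit<T>>, // offset 0
    once: Once,
}

// Safety: `value` is written exactly once inside `call_once`, and shared `&T`
// access is only handed out after that write has completed.
unsafe impl<T: Sync + Send> Sync for FfiOnceLock<T> {}

impl<T> FfiOnceLock<T> {
    pub const fn new() -> Self {
        Self { value: UnsafeCell::new(MaybeUninit::uninit()), once: Once::new() }
    }

    pub fn get_or_init(&self, f: impl FnOnce() -> T) -> &T {
        self.once.call_once(|| unsafe {
            (*self.value.get()).write(f());
        });
        // Safety: `call_once` has returned, so the value is initialized.
        unsafe { (*self.value.get()).assume_init_ref() }
    }

    /// Address that non-Rust code could read a `T` from (after initialization).
    pub fn as_ptr(&self) -> *const T {
        self.value.get().cast::<T>()
    }
}

impl<T> Drop for FfiOnceLock<T> {
    fn drop(&mut self) {
        if self.once.is_completed() {
            // Safety: `once` completed, so `value` holds an initialized T.
            unsafe { self.value.get_mut().assume_init_drop() }
        }
    }
}
```

The cost is that the "is it initialized?" state can no longer be folded into a niche of `T`, which is one reason std does not commit to a layout.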

thomcc commented 1 year ago

Basically, it isn't clear if these new types intend to guarantee anything stronger than "Consume" semantics, regardless of whether the present implementation may (or may not) implement stronger semantics.

I think anything not providing Acquire/Release semantics would be too much of a footgun.

If we ever get some way of emulating Consume, user code can implement its own types that provide that instead, similar to how user code that wants a racy or relaxed behavior currently would need to implement that itself.


That is, it would be great to guarantee that the address of the OnceLock is equal to the address of its value; i.e. make value the first field in the structure, make OnceLock be repr "C" or equivalent, and ensure value is #[repr(transparent)].

I'm a lot less sure about this though. We don't currently do this for any stdlib types (aside from trivial cases like NonZeroFoo and such), do we?

briansmith commented 1 year ago

I'm a lot less sure about this though. We don't currently do this for any stdlib types (aside from trivial cases like NonZeroFoo and such), do we?

UnsafeCell itself is repr(transparent) so it has this property, IIUC.

thomcc commented 1 year ago

Do we ever guarantee the representation of something that isn't purely a wrapper? This would need additional state.

Stargateur commented 1 year ago

No. repr(C) guarantees field order, but it only guarantees compatibility with the layout used by the most common C compilers for the target.

thomcc commented 1 year ago

I meant if any existing types in the stdlib did this that aren't purely wrappers around other types.

ibraheemdev commented 1 year ago

Probably worth cross-linking between the sync and unsync versions in their docs.

tgross35 commented 1 year ago

After some brief discussion with Thom on Zulip (https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/moving.20once.20cell.20forward), it seems to make sense to guarantee acquire/release semantics at this point, and possibly add a way to specify consume if there is ever a use for it. There very well may never be.

The three open issues from the top post are:

The RFC is still open, but it's fairly dependent on the discussion at this issue. It probably just needs some minor updates to the current decisions and then can be merged.

Is there anything blocking FCP aside from the RFC?

matklad commented 1 year ago

For lazy, there’s https://github.com/rust-lang/rust/pull/103718.

I think we should fcp something here, yeah.

My personal preference would be to FCP a minimal subset first (only once cell, no lazy). The API surface is large, and we might easily overlook some annoying detail if we try to rubber-stamp whatever there is now.

WhyNotHugo commented 1 year ago

Not sure if this is the right place to point this out: the documentation for LazyLock reads:

This type is a thread-safe Lazy, and can be used in statics.

However, there's no type called Lazy. This should probably refer to LazyCell instead?

eggyal commented 1 year ago

Could OnceCell<T> be #[repr(transparent)] with its layout documented to be identical to Option<T>? I would like to initialise a OnceCell from FFI without calling back into Rust, by writing directly to the memory location.

bjorn3 commented 1 year ago

Option<T> doesn't have a defined layout either, except for Option<&T>, Option<NonZeroU*> and the like.
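To make the distinction concrete: for types with a niche, such as references, `NonNull<T>`, and the `NonZero*` integers, the language guarantees that `Option<T>` has the same size and alignment as `T`; for other types, such as a plain `u32`, it does not. The helper below is hypothetical and only compares sizes and alignments; it says nothing about the layout of `OnceCell<T>`, which remains undocumented.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical check: does Option<T> take no more space than T itself?
/// For niche types (references, NonNull, NonZero*) this is a documented guarantee.
fn same_layout_as_option<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>() && align_of::<Option<T>>() == align_of::<T>()
}
```

Equal size is necessary but not sufficient for FFI use; only the documented niche cases also guarantee that `None` is the all-zero bit pattern.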

eggyal commented 1 year ago

Indeed. But in those cases it's useful: Option<&T> in my case.

eggyal commented 1 year ago

Actually, the same applies to UnsafeCell, SyncUnsafeCell and Cell: whilst they are already #[repr(transparent)], their internal layout is undocumented.

tgross35 commented 1 year ago

I think repr(transparent) would be a bit misleading, since IMHO it guides users to assume that OnceCell<T> has the same layout as T, not Option<T> (the three examples mentioned all have the layout of T).

The get and get_mut methods on OnceCell can be used to get pointers to the inner value; would that work in your case?

briansmith commented 1 year ago

No, repr(C) guarantee order, but compatibility only with most used layout of C compilers of the target.

The C standards require the address of the first member to be the address of the structure, which is why I suggested #[repr(C)], putting the value at the start of the structure, and avoiding non-transparent wrappers like Option around the field.

Anyway, I don't have a strong opinion about whether to do extra work to support the ability of non-Rust code to be able to access the value.

tgross35 commented 1 year ago

I don't think there's much benefit to providing any sort of guarantee on internal layout: any alternative to Option means mimicking its behavior in a separate place and losing optimizations geared at Option (e.g. niches).

For any Rust + C project that already has a good reason to use a Rust OnceCell, I really think the correct solution is to use get(), get_mut(), get_or_init(), etc. and wrap them in something extern "C", or pass their results to C as applicable. Otherwise, you're just rewriting those exact functions in C.
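A minimal sketch of the wrapping approach, assuming a `static OnceLock<[u32; 4]>` like the `GLOBAL_STATE` example earlier in the thread. `compute_state` and `global_state_ptr` are hypothetical names; C would call `global_state_ptr` and read four `u32`s through the returned pointer, which stays valid because it points into a `static`.

```rust
use std::sync::OnceLock;

// Hypothetical global state, mirroring the `[u32; 4]` example earlier in the thread.
static GLOBAL_STATE: OnceLock<[u32; 4]> = OnceLock::new();

// Hypothetical initializer; stands in for whatever computes the real state.
fn compute_state() -> [u32; 4] {
    [1, 2, 3, 4]
}

/// C-callable accessor: initializes on the first call, then returns a pointer to
/// the four u32s. A real FFI build would also export it with `#[no_mangle]`
/// (`#[unsafe(no_mangle)]` in edition 2024).
pub extern "C" fn global_state_ptr() -> *const u32 {
    GLOBAL_STATE.get_or_init(compute_state).as_ptr()
}
```

This keeps the synchronization logic in Rust; the C side only ever sees a plain pointer to initialized data.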

tgross35 commented 1 year ago

I've opened partial stabilization PR #105587 for OnceCell and OnceLock; I believe an FCP for those would be next.

SUPERCILEX commented 1 year ago

Can we add fn into_inner(self) -> Option<T> to LazyCell? That'd be helpful when doing things only if the LazyCell fired.

eggyal commented 1 year ago

Can we add fn into_inner(self) -> Option<T> to LazyCell? That'd be helpful when doing things only if the LazyCell fired.

Wouldn't fn is_initialized(this: &Self) -> bool or perhaps fn get(this: &Self) -> Option<&T> be more useful, as they don't require taking ownership? (As with other smart pointers, these are associated functions to avoid conflicts with methods of the inner type T).

SUPERCILEX commented 1 year ago

Maybe for others, but for my use case I specifically need to take ownership.

SUPERCILEX commented 1 year ago

Went ahead and opened a PR: https://github.com/rust-lang/rust/pull/106152

SimonSapin commented 1 year ago

If you have ownership is it useful to have a LazyCell at all instead of Option? Or is there some scenario where you first need to initialize through a shared reference, and later recover full ownership?

SUPERCILEX commented 1 year ago

Well yeah but then I have to manage lazy initialization myself.

SimonSapin commented 1 year ago

Like this? https://doc.rust-lang.org/std/option/enum.Option.html#method.get_or_insert_with

SUPERCILEX commented 1 year ago

TIL! That's ever so slightly more annoying b/c you have to carry around the option and closure separately (or wrap them in your own type), but I'd be ok with having my PR closed if we think you should use this instead.
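For comparison, a sketch of the "wrap them in your own type" option: a hypothetical `LazyOption<T, F>` that keeps the value and the initializer together, initializes through `Option::get_or_insert_with`, and can hand back the value by ownership. Unlike `LazyCell`, initialization needs `&mut self`.

```rust
/// Hypothetical owned lazy value: `value` is filled from `init` on first access.
struct LazyOption<T, F> {
    value: Option<T>,
    init: Option<F>,
}

impl<T, F: FnOnce() -> T> LazyOption<T, F> {
    fn new(init: F) -> Self {
        Self { value: None, init: Some(init) }
    }

    /// Runs the initializer on the first call only; needs `&mut self`, unlike `LazyCell`.
    fn get(&mut self) -> &T {
        let init = &mut self.init; // borrow this field separately from `self.value`
        self.value.get_or_insert_with(|| {
            let f = init.take().expect("initializer already consumed");
            f()
        })
    }

    /// Gives back the value by ownership, or `None` if it was never initialized.
    fn into_inner(self) -> Option<T> {
        self.value
    }
}
```

`into_inner` returning `None` is exactly the "only if it fired" signal asked for, at the price of requiring exclusive access to initialize.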

elichai commented 1 year ago

I've added non-blocking flavors of the primitives to the once_cell crate: https://docs.rs/once_cell/1.5.1/once_cell/race/index.html. They are restricted (can be provided only for atomic types), but are compatible with no_std.

FWIW, I use once_cell a lot for initializing a cryptographic context, and usually, the racy option is the one that I actually want (parking a thread can be more expensive than just initializing the context)
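The racy approach can be sketched with std atomics alone. The type below is hypothetical, loosely modeled on the idea behind `once_cell::race::OnceNonZeroUsize` rather than its actual code: `0` marks the uninitialized state, several threads may run the initializer concurrently, and the first successful `compare_exchange` wins while no thread ever blocks.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical racy once-cell for a NonZeroUsize; 0 means "uninitialized".
pub struct RacyOnceNonZeroUsize(AtomicUsize);

impl RacyOnceNonZeroUsize {
    pub const fn new() -> Self {
        Self(AtomicUsize::new(0))
    }

    pub fn get_or_init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        if let Some(v) = NonZeroUsize::new(self.0.load(Ordering::Acquire)) {
            return v; // fast path: already initialized
        }
        // Several threads may reach this point and each run `f`; none of them blocks.
        let candidate = f().get();
        match self.0.compare_exchange(0, candidate, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => NonZeroUsize::new(candidate).unwrap(), // we won the race
            Err(winner) => NonZeroUsize::new(winner).unwrap(), // someone else won; use theirs
        }
    }
}
```

The trade-off is that the initializer may run more than once, so it should be cheap and free of side effects that must not be duplicated.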

This kind of use case might disappear once const fn becomes powerful enough that I'll be able to initialize these at compile time

ydewit commented 1 year ago

I have a question about using get_or_init in the OnceLock struct.

My use case involves two Tasks producing values (lhs and rhs), and I need to reduce the redex once both values are available. The order of these values becoming available is unknown.

Could OnceLock be used here? I was thinking of using one OnceLock shared between two Tasks, and once a Task produces its value, it calls get_or_init. If the OnceLock is empty, it will be set, otherwise, I would get the existing value. However, I am not sure how to determine which value was returned in order to process the redex.

As I understand it, after the get_or_init call, the (Boxed) value will be moved, and I can't compare pointers.

My question is: could get_or_init take an Option with the current value as a parameter, or is there a way to map over OnceLock to either use or set its value?

ydewit commented 1 year ago

I just realized that in my specific case, one of the values has a positive polarity and the other a negative one. So I can use OnceLock as is by checking the polarity of the cell in the OnceLock. In any case, I think the question above still holds.
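For the general question, one option using only the existing API is `OnceLock::set`, which returns `Err(value)` when the cell is already initialized, so the caller gets its own value back and knows it arrived second. `set` is also documented to leave the cell holding a value when it returns, so `get` cannot fail afterwards. A sketch with a hypothetical `arrive` helper; in the redex case `T` would be the boxed term.

```rust
use std::sync::OnceLock;

/// Hypothetical rendezvous helper: park `mine` in `slot`, or, if the other task
/// got there first, return (the other task's value, our value).
fn arrive<T>(slot: &OnceLock<T>, mine: T) -> Option<(&T, T)> {
    match slot.set(mine) {
        // We were first: our value now lives in the cell; nothing to reduce yet.
        Ok(()) => None,
        // We were second: `set` handed our value back, and the cell holds the other one.
        Err(mine) => Some((slot.get().expect("cell is initialized after set"), mine)),
    }
}
```

Exactly one of the two tasks gets `Some`, and that task holds both operands, so it is the natural one to perform the reduction.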

kpark-hrp commented 1 year ago

There appears to be an unexpected behavior when using const with Lazy. Issue: https://github.com/matklad/once_cell/issues/224

With const variable, Lazy can get evaluated multiple times and return different results. I believe the expected behavior should be one of the following.

jhpratt commented 1 year ago

That is simply how consts work. There is no reason it should not compile.

eggyal commented 1 year ago

With const variable, Lazy can get evaluated multiple times and return different results.

That's pretty much what I would expect per the documentation of const:

const items looks remarkably similar to static items, which introduces some confusion as to which one should be used at which times. To put it simply, constants are inlined wherever they’re used, making using them identical to simply replacing the name of the const with its value. Static variables, on the other hand, point to a single location in memory, which all accesses share. This means that, unlike with constants, they can’t have destructors, and act as a single value across the entire codebase.

Perhaps you're after a static instead?
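The difference is easy to observe with the stabilized `std::sync::LazyLock` (Rust 1.80). In this sketch, which uses a hypothetical counter, every use of the `const` creates a fresh `LazyLock` and reruns the initializer, while the `static` runs it once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Hypothetical counter recording how many times an initializer has run.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

fn next_id() -> usize {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst)
}

// A `const` is inlined at each use site, so each use is a brand-new LazyLock
// and the initializer runs again (this is what clippy's lint warns about).
#[allow(clippy::declare_interior_mutable_const)]
const LAZY_CONST: LazyLock<usize> = LazyLock::new(next_id);

// A `static` is a single memory location shared by all accesses.
static LAZY_STATIC: LazyLock<usize> = LazyLock::new(next_id);
```

Each `*LAZY_CONST` therefore observes a different counter value, whereas repeated `*LAZY_STATIC` reads always agree.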

jplatte commented 1 year ago

Note that there is also a clippy lint for consts with interior mutability, which should fire if you use Lazy in a const: https://rust-lang.github.io/rust-clippy/master/index.html#declare_interior_mutable_const

matklad commented 1 year ago

Closing as once_cell was stabilized in https://github.com/rust-lang/rust/pull/105587. The two follow-up issues are

mina86 commented 1 year ago

Is there a plan or tracking issue for OnceLock::wait?

matklad commented 1 year ago

No, but it seems to me that it should be added (under a separate feature gate and tracking issue).

tisonkun commented 1 year ago

Can we have a get_mut_or_init?

SimonSapin commented 1 year ago

OnceCell::get_mut_or_init would only be safe if you have &mut self. But if you have exclusive access when initializing, do you even need a cell in the first place? Couldn’t you use a plain Option and initialize it with get_or_insert_with?

cuviper commented 1 year ago

Can't you say the same about get_mut? But it may be a conditional situation, where sometimes you have the necessary information to initialize while you have exclusive access, and other times you need to do it later while shared.

tisonkun commented 1 year ago

OnceCell::get_mut_or_init would only be safe if you have &mut self. But if you have exclusive access when initializing, do you even need a cell in the first place? Couldn’t you use a plain Option and initialize it with get_or_insert_with?

Because not all accessors are get_mut_or_init. Basically, I need a get_or_init, but sometimes I want a mut reference and am unsure if it's initialized.

You can read this code snippet and see if there is a better solution than providing a get_mut_or_init:

// batches: OnceCell<Vec<RecordBatch>>,

    pub fn mut_batches(&mut self) -> IterMut<'_, RecordBatch> {
        self.batches.get_or_init(|| load_batches(&self.buf));
        // SAFETY - init above
        unsafe { self.batches.get_mut().unwrap_unchecked() }.iter_mut()
    }

    pub fn batches(&self) -> Iter<'_, RecordBatch> {
        self.batches.get_or_init(|| load_batches(&self.buf)).iter()
    }

https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/kafka-api/src/record.rs#L108-L116
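One way to avoid the `unsafe` without a new std method is a small free function: with `&mut` access no other reference can observe the cell, so initializing and then calling `get_mut` is enough. `get_mut_or_init`, `Records`, and `load_batches` below are hypothetical stand-ins for the types in the snippet above; binding `&self.buf` to a local first keeps the field borrows visibly disjoint.

```rust
use std::cell::OnceCell;

/// Hypothetical helper: initialize the cell if needed, then borrow it mutably.
fn get_mut_or_init<T>(cell: &mut OnceCell<T>, f: impl FnOnce() -> T) -> &mut T {
    cell.get_or_init(f); // no-op if already initialized
    cell.get_mut().expect("cell was initialized on the line above")
}

// Hypothetical stand-ins for the record type and loader in the snippet above.
struct Records {
    buf: Vec<u8>,
    batches: OnceCell<Vec<u32>>,
}

fn load_batches(buf: &[u8]) -> Vec<u32> {
    buf.iter().map(|&b| u32::from(b)).collect()
}

impl Records {
    fn batches(&self) -> std::slice::Iter<'_, u32> {
        self.batches.get_or_init(|| load_batches(&self.buf)).iter()
    }

    fn batches_mut(&mut self) -> std::slice::IterMut<'_, u32> {
        let buf = &self.buf; // shared borrow of one field...
        get_mut_or_init(&mut self.batches, || load_batches(buf)).iter_mut() // ...mutable borrow of another
    }
}
```

The shared-access path (`batches`) keeps using `get_or_init` through `&self`, which is the part `Option::get_or_insert_with` cannot provide.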

tisonkun commented 1 year ago

But you're right that without concurrent calls an Option + get_or_insert_with may work.

tisonkun commented 1 year ago

No. I also need shared (non-mut) access to batches, so I still need a get_mut_or_init.

Otherwise,

    pub fn batches(&self) -> Iter<'_, RecordBatch> {
        self.batches
            .get_or_insert_with(|| load_batches(&self.buf))
            .iter()
    }

failed to compile: Cannot borrow immutable local variable `self.batches` as mutable.

OnceCell is !Sync and can support this kind of interior mutability.