Tracking Issue for `once_cell`

KodrAus commented 4 years ago

This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788). The feature gate for the issue is #![feature(once_cell)].

Unstable API

// core::lazy

pub struct OnceCell<T> { .. }

impl<T> OnceCell<T> {
    pub const fn new() -> OnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
}
impl<T> From<T> for OnceCell<T>;
impl<T> Default for OnceCell<T>;
impl<T: Clone> Clone for OnceCell<T>;
impl<T: PartialEq> PartialEq for OnceCell<T>;
impl<T: Eq> Eq for OnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for OnceCell<T>;

pub struct Lazy<T, F = fn() -> T> { .. }

impl<T, F> Lazy<T, F> {
    pub const fn new(init: F) -> Lazy<T, F>;
}
impl<T, F: FnOnce() -> T> Lazy<T, F> {
    pub fn force(this: &Lazy<T, F>) -> &T;
}
impl<T: Default> Default for Lazy<T>;
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>;
impl<T: fmt::Debug, F> fmt::Debug for Lazy<T, F>;

// std::lazy

pub struct SyncOnceCell<T> { .. }

impl<T> SyncOnceCell<T> {
    pub const fn new() -> SyncOnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(mut self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
    fn is_initialized(&self) -> bool;
    fn initialize<F, E>(&self, f: F) -> Result<(), E> where F: FnOnce() -> Result<T, E>;
    unsafe fn get_unchecked(&self) -> &T;
    unsafe fn get_unchecked_mut(&mut self) -> &mut T;
}
impl<T> From<T> for SyncOnceCell<T>;
impl<T> Default for SyncOnceCell<T>;
impl<T: RefUnwindSafe + UnwindSafe> RefUnwindSafe for SyncOnceCell<T>;
impl<T: UnwindSafe> UnwindSafe for SyncOnceCell<T>;
impl<T: Clone> Clone for SyncOnceCell<T>;
impl<T: PartialEq> PartialEq for SyncOnceCell<T>;
impl<T: Eq> Eq for SyncOnceCell<T>;
unsafe impl<T: Sync + Send> Sync for SyncOnceCell<T>;
unsafe impl<T: Send> Send for SyncOnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for SyncOnceCell<T>;

pub struct SyncLazy<T, F = fn() -> T>;

impl<T, F> SyncLazy<T, F> {
    pub const fn new(f: F) -> SyncLazy<T, F>;
}
impl<T, F: FnOnce() -> T> SyncLazy<T, F> {
    pub fn force(this: &SyncLazy<T, F>) -> &T;
}
impl<T, F: FnOnce() -> T> Deref for SyncLazy<T, F>;
impl<T: Default> Default for SyncLazy<T>;
impl<T, F: UnwindSafe> RefUnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: RefUnwindSafe;
impl<T, F: UnwindSafe> UnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: UnwindSafe;
unsafe impl<T, F: Send> Sync for SyncLazy<T, F> where SyncOnceCell<T>: Sync;
impl<T: fmt::Debug, F> fmt::Debug for SyncLazy<T, F>;

Steps

[X] Complete the RFC process over at https://github.com/rust-lang/rfcs/pull/2788
[X] FCP https://github.com/rust-lang/rust/pull/105587#issuecomment-1367890678
[X] Stabilization PR: https://github.com/rust-lang/rust/pull/105587

Unresolved Questions

Inlined from #72414:

[X] Naming. I'm ok to just roll with the Sync prefix like SyncLazy for now, but have a personal preference for Atomic like AtomicLazy. Resolved in: https://github.com/rust-lang/rust/issues/74465#issuecomment-1098359963. Surprisingly, after more than a year of deliberation we actually found a better name.
[x] Poisoning. It seems like there's some regret around poisoning in other std::sync types that we might want to just avoid upfront for std::lazy, especially if that would align with a future std::mutex that doesn't poison. Personally, if we're adding these types to std::lazy instead of std::sync, I'd be on-board with not worrying about poisoning in std::lazy, and potentially deprecating std::sync::Once and lazy_static in favour of std::lazy down the track if it's possible, rather than attempting to replicate their behavior. cc @Amanieu @sfackler.
[x] Consider making SyncOnceCell::get blocking. There doesn't seem to be consensus in the linked PR on whether or not that's strictly better than the non-blocking variant. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-663414310).
[X] Atomic Ordering. The implementation currently uses Release/Acquire, but it could also use the elusive Consume ordering. Should we spec that we guarantee Release/Acquire? (resolved as yes: consume ordering is not defined enough to merit inclusion into std)
[x] Sync no_std subset. It seems plausible that we might provide some subset of SyncOnceCell in no_std. I think there's consensus that we don't want to include "blocking" parts of API, but it's unclear if non-blocking subset (get+set) would be useful. (resolved in https://github.com/rust-lang/rust/issues/74465#issuecomment-725360596).
[x] Method naming. Is get_or[_try]_init the best name? (resolved as yes in https://github.com/rust-lang/rust/pull/107184)

Implementation history

68198 (closed in favor of #72414)
72414 initial implementation
74814 fixed UnwindSafe bounds

matklad commented 4 years ago

Let's cross-out the "should get be blocking?" concern. I decided against this for once_cell, for the following reasons:

it makes Clone, Eq, Debug blocking, which is surprising
the original issue that prompted this question used Lazy, and Lazy is immune from this issue, as it always uses blocking get_or_init.
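
A minimal sketch of the first point, written against the names this API was later stabilized under (std::sync::OnceLock in place of SyncOnceCell, which is an assumption relative to the listing above). With a non-blocking get, reading or cloning an empty cell returns immediately, and only get_or_init may wait on another thread's initializer. The function names are hypothetical.

```rust
use std::sync::OnceLock;

/// Reads an empty cell directly and through a clone.
/// Neither call waits for an initializer: both simply observe "empty".
pub fn peek_empty() -> (Option<u32>, Option<u32>) {
    let cell: OnceLock<u32> = OnceLock::new();
    let direct = cell.get().copied();
    let cloned = cell.clone();
    (direct, cloned.get().copied())
}

/// `get_or_init` is the call that may block while another thread is
/// running the initializer; afterwards `get` sees the stored value.
pub fn init_then_get() -> Option<u32> {
    let cell = OnceLock::new();
    cell.get_or_init(|| 92u32);
    cell.get().copied()
}
```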

matklad commented 4 years ago

Added two more open questions from the RFC.

matklad commented 4 years ago

I've added a summary of proposed API to the issue description.

I wonder if it makes sense for @rust-lang/libs to do a sort of "API review" here: this is a pretty big chunk of API, and we tried to avoid bike shedding on the RFC.

matklad commented 3 years ago

Here's an interesting use-case for non-blocking subset of OnceCell -- building cyclic data structures: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4eceeefc224cdcc719962a9a0e1f72fc

withoutboats commented 3 years ago

I strongly expect a method called get to be nonblocking. I am softly in favor of adding a wait API that blocks, but would prefer that it be added in a separate feature later based on demand.

matklad commented 3 years ago

Yeah, to be clear, there's a consensus that get should be non-blocking; that question is resolved. What is not completely settled in my mind is whether we should have core::lazy::SyncOnceCell. That's possible in theory (by only providing get and set methods), but would be hacky to implement, and of questionable usefulness. The above example is a new use-case for that thing.

m-ou-se commented 3 years ago

Naming. I'm ok to just roll with the Sync prefix like SyncLazy for now, but have a personal preference for Atomic like AtomicLazy.

I don't think Atomic would be the right word for these. Rust's Atomic types and operations (including Arc) never block and never involve the operating system's scheduler (they're all defined in core or alloc, not std). They're all directly based on the basic atomic operations supported by the processor architecture itself.

I'd expect something that's named AtomicLazy/AtomicOnceCell to do the same. And that's something that already exists as another valid strategy for certain Lazy/OnceCell-like types: Instead of blocking all but one thread when multiple threads encounter an 'empty' cell, it wouldn't block but run the initialization function on each of these threads. The first thread to finish atomically stores its initialized value in the cell, and the others simply drop() the value they created.

The std does something similar in a few places (although not wrapped in a type or publicly exposed). For example, here:

https://github.com/rust-lang/rust/blob/a835b483fe0418b48ca44afb65cd0dd6bad4eb9b/library/std/src/sys/windows/compat.rs#L65-L67

And here:

https://github.com/rust-lang/rust/blob/a835b483fe0418b48ca44afb65cd0dd6bad4eb9b/library/std/src/sys/windows/mutex.rs#L119-L133

And another example in parking_lot.

So, since this Lazy/OnceCell implementation does block (such that the initialization function can be FnOnce and the type doesn't have to fit in an atomic), and an alternative purely atomic strategy does exist, I'd really avoid using the word 'atomic' in the name here.
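
The purely atomic strategy described above can be sketched roughly as follows. This is an illustration only, not the code behind the links: the type name RaceOnceUsize is hypothetical, and 0 is assumed to be a reserved "empty" marker so the value fits in a single AtomicUsize.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};

/// A "first one wins" cell for a non-zero integer. It never blocks:
/// racing threads may each run their initializer, and the losers
/// discard their result.
pub struct RaceOnceUsize {
    value: AtomicUsize, // 0 means "not initialized yet"
}

impl RaceOnceUsize {
    pub const fn new() -> Self {
        RaceOnceUsize { value: AtomicUsize::new(0) }
    }

    pub fn get_or_init(&self, f: impl FnOnce() -> NonZeroUsize) -> NonZeroUsize {
        // Fast path: somebody already stored a value.
        if let Some(v) = NonZeroUsize::new(self.value.load(Ordering::Acquire)) {
            return v;
        }
        // Slow path: compute a candidate, then try to publish it atomically.
        let candidate = f();
        match self.value.compare_exchange(
            0,
            candidate.get(),
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => candidate,
            // Another thread won the race; use its value and drop ours.
            Err(existing) => NonZeroUsize::new(existing).expect("stored value is non-zero"),
        }
    }
}
```

The trade-off is visible in the signature: nothing here waits, but the initializer may run more than once across threads, and the value has to fit in an atomic.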

matklad commented 3 years ago

I've added non-blocking flavors of the primitives to the once_cell crate: https://docs.rs/once_cell/1.5.1/once_cell/race/index.html. They are restricted (can be provided only for atomic types), but are compatible with no_std.

It seems to me that "first one wins" is a better semantics if you can't block, so I am going to resolve Sync no_std subset like this:

only std supports sync module, as it requires synchronization (std::thread)
if you need something like OnceCell in no-std, your choices are
- use race module from once_cell with different API, which might or might not be uplifted to std some day
- use some version based on spin locks (this risks @matklad crashing into issue tracker of your project with explanation of how pure spin locks are almost always wrong).

I've ticked this question's box.

m-ou-se commented 3 years ago

It's a bit of a shame that Lazy uses a fn() -> T by default. With that type, it needlessly stores a function pointer even if it is constant. Would it require big language changes to make it work without storing a function pointer (so, a closure as ZST), while still being as easy to use? Maybe if captureless closures would implement some kind of const Default? And some way to not have to name the full type in statics. That's probably not going to happen very soon, but it'd be a shame if this becomes possible and we can't improve Lazy because the fn() -> T version was already stabilized. Is there another way to do this?
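
To make the size concern concrete, here is a small sketch (the function name is hypothetical) comparing a captureless closure with the same closure coerced to fn() -> T. It measures only the initializer values themselves, not the layout of any Lazy type, which is an implementation detail.

```rust
use std::mem::size_of_val;

/// Returns (size of a captureless closure, size of the same closure
/// coerced to a function pointer).
pub fn initializer_sizes() -> (usize, usize) {
    let closure = || 92u32;
    // A closure that captures nothing can be coerced to a function pointer.
    let pointer: fn() -> u32 = closure;
    (size_of_val(&closure), size_of_val(&pointer))
}
```

The closure is a zero-sized type, while the function pointer occupies a pointer-sized slot, which is the overhead the default F = fn() -> T bakes in.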

phil-opp commented 3 years ago

@matklad

They are restricted (can be provided only for atomic types), but are compatible with no_std.

This seems like a very major restriction, which rules out most use cases of SyncLazy/SyncOnceCell. So I don't think that this really resolves the sync no_std use case.

use some version based on spin locks (this risks @matklad crashing into issue tracker of your project with explanation of how pure spin locks are almost always wrong).

I agree that spinlocks have their problems, but they're still better than using static mut instead. I understand that we don't want to hardcode SyncLazy/SyncOnceCell to use a deadlock-prone spinlock on no_std, but maybe it's possible to let the user supply their own implementation of a Mutex/Once primitive?

This could be implemented using a second generic argument on the Sync* types (or maybe even on the Mutex/Once types). This way, users could specify how the synchronization should happen based on their application. A single-threaded embedded application could just disable interrupts for the critical section, a toy OS kernel could use a spinlock, and projects with their own threading system could supply a "proper" synchronization primitive. Maybe I'm missing something, but this seems like a good solution to me.

m-ou-se commented 3 years ago

Some thoughts about &mut self functions on (Sync)OnceCell:

These types have both &mut self and &self functions, but the &mut interface seems somewhat incomplete, and it's a bit tricky to pick names for overlapping functionality. For example, take can only be done with unique access, so fn take(&mut self) -> Option<T> makes sense. But set can be done on an empty cell through a shared reference, or on a cell in any state through a unique reference. So both fn set(&self, value: T) -> Result<&T, T>; (like Cell::set) and fn set(&mut self, value: T) -> &mut T; (like Option::insert) would make sense.

Maybe if the get_or_insert/get_or_insert_with pair already provides a 'one time set' functionality, set (or insert?) should be the &mut self version instead?

matklad commented 3 years ago

Unresolved question: method naming

Currently, we have get_or_init and get_or_try_init. Are those good names? Here are some alternatives (see also https://github.com/rust-lang/rust/pull/78943)

1. get_or_init, get_or_try_init
2. get_or_insert_with, try_get_or_insert_with
3. get_with, try_get_with

1. Pro: Status quo, name specific to OnceCell (you see x.get_or_init, you know x is a once cell). Con: doesn't feel like it perfectly fits with other std names.
2. Pro: matches Option::get_or_insert_with exactly. Con: for OnceCell, unlike Option, this is the core API. It's a shame that it's a mouthful.
3. Pro: short, matches std conventions. Con: _with without or suggests that the closure will always be called, but that's not really the case.

I've thought more about this, and I think I actually like 3 most. Its Con seems like a Pro to me. In the typical use-case, you only use _with methods:

impl Spam {
  fn get_eggs(&self) -> &Eggs {
    self.eggs.get_with(|| Eggs::cook())
  }
}

So, the closure is sort-of always called, it's just cached. Not sure if my explanation makes sense, but I do feel that this is different from, e.g., Entry::or_insert_with.

matklad commented 3 years ago

@phil-opp: I think it is rather certain that, even if std provides a subset of OnceCell for no_std, it will be non-blocking subset (set and get).

It certainly is possible to use spinlocks, or make sync::OnceCell parametric (compile-time or run-time) over blocking primitives. I am pretty sure that should be left for crates.io crate though.

I feel one important criterion for inclusion in std is "design space has a solution with a single canonical API". The OnceCell API seems canonical. If we add parameters, the design space inflates. Even if some solution would be better, it won't be obviously canonical, and would be better left to crates.io.

matklad commented 3 years ago

It's a bit of a shame that Lazy uses a fn() -> T by default.

@m-ou-se yeah, totally agree that this is a hack and feels like a hack. It works well enough in practice, but there's one gotcha: specifying type for a local lazy does not work:

let x = 92;
let works1 = Lazy::new(|| x.to_string());
let broken: Lazy<String> = Lazy::new(|| x.to_string());
let works2: Lazy<String, _> =  Lazy::new(|| x.to_string());

The broken variant is something that people occasionally write, and it fails with a somewhat confusing error. If we remove the default type, it will still be broken, but folks won't have intuition that "one parameter should be enough".

One easy way out here is to stabilize only OnceCell, and punt on Lazy for the time being. OnceCell contains all the tricky bits, and Lazy is just some syntactic sugar. For me (and probably for some, but not all, other folks) writing

fn global_state() -> &'static GlobalState {
  static INSTANCE: SyncOnceCell<GlobalState> = SyncOnceCell::new();
  INSTANCE.get_or_init(GlobalState::default)
}

doesn't feel like a deal breaker. I'd prefer that to pulling in a 3rd party dep (lazy_static or once_cell).

That said, I think Lazy's hack is worth stabilizing. Even if in the future we'll be able to write:

static GLOBAL_STATE: Lazy<GlobalState, _> = Lazy::new(GlobalState::default);

I don't see a lot of practical problems with

static GLOBAL_STATE: Lazy<GlobalState> = Lazy::new(GlobalState::default);

working as well.

nwn commented 3 years ago

Unresolved question: method naming

1. `get_or_init`, `get_or_try_init`

2. `get_or_insert_with`, `try_get_or_insert_with`

3. `get_with`, `try_get_with`

I think 1 is the most appropriate. The init terminology makes more sense than insert in the context of a once cell. Depending on whether we expose a direct value initializer, it may be more consistent to add _with to these methods, though.

I've thought more about this, and I think I actually like 3 most. Its Con seems like a Pro to me. In the typical use-case, you only use _with methods:

[...]

So, the closure is sort-of always called, it's just cached. Not sure if my explanation makes sense, but I do feel that this is different from, e.g., Entry::or_insert_with.

This doesn't seem very intuitive to me and isn't always true when there are multiple points of initialization. For example, consider:

impl Spam {
    fn get_eggs(&self, cooked: bool) -> &Eggs {
        if cooked {
            self.eggs.set(Eggs::cook());
        }
        self.eggs.get_with(|| Eggs::raw())
    }
}

In this case, the closure may not run and in fact a different value has been cached. I think get_or_init_with would make this case more clear.

raphaelcohn commented 3 years ago

Something I've recently got bitten by is the need to manage which memory allocator a piece of memory uses. I've been working with a design that has a different global memory allocator when running threads or coroutines (so restricting a coroutine to a maximum amount of memory). This could be thought of as a bit of a hack; one of the long-term design decisions of early Rust that still bites is not making the memory allocator type explicit in the standard collections.

With a lazy, the challenge becomes ensuring that they're all allocated using the same memory allocator.

KodrAus commented 3 years ago

On the naming and organization question, I've found myself coming back to:

mod core {
    pub mod cell {
        pub struct OnceCell<T> {}
        pub struct LazyCell<T, F = fn() -> T> {}
    }
}

mod std {
    pub mod lazy {
        pub struct Once<T> {}
        pub struct Lazy<T, F = fn() -> T> {}
    }
}

Based on a few observations:

In my own usage I haven't needed the "unsync" versions of OnceCell or Lazy yet. Just the sync ones, so giving them the canonical name seems preferable to me.
In wanting to split up the std::sync module I find myself wanting to deprecate sync::Once in favor of lazy::Once<()>, where the poisoning behavior of sync::Once could be layered in through something like lazy::Once<Poison<()>>.
cell is the module I already go to to find non-Sync shared mutable containers
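
As a rough illustration of the second observation, a unit-valued sync once cell can already play the role of sync::Once. The sketch below uses std::sync::OnceLock, the name SyncOnceCell was eventually stabilized under, rather than the lazy::Once proposed here; the statics and the function name are hypothetical, and poisoning is not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the one-time setup body actually ran.
static SETUP_RUNS: AtomicUsize = AtomicUsize::new(0);
// A cell holding `()`: only the "has this run?" state matters.
static SETUP: OnceLock<()> = OnceLock::new();

/// Runs the setup body on the first call only and returns the number
/// of times the body has run so far.
pub fn ensure_setup() -> usize {
    SETUP.get_or_init(|| {
        SETUP_RUNS.fetch_add(1, Ordering::SeqCst);
    });
    SETUP_RUNS.load(Ordering::SeqCst)
}
```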

We looked at this API briefly in #68198 but didn't want to hold up landing something, so didn't really spend much time on it.

matklad commented 3 years ago

I find "give sync versions the canonical name based on usage frequency" to be a strong and compelling argument. I like std::lazy::{Once, Lazy} and I think we should do that, provided that we do want to deprecate std::sync::Once eventually.

I find cell::{OnceCell, LazyCell} relatively less compelling, using Cell suffix to mean !Sync feels like a hack. But it's not like we have an obviously better alternative.

We can consider providing only std sync versions, but I'd be against that. I think it's important not to pessimize single-threaded use-cases. I find that single-threaded shared-nothing architectures might be the way to get the most out of many-core machines, and it's Rust's unique advantage to check shared-nothingness at compile time.

Does anybody want to send a PR with the reorg according to https://github.com/rust-lang/rust/issues/74465#issuecomment-763993225? It might make sense to leave deprecated aliases in place, to not make the life of nightly users more complicated.

Zenithsiz commented 3 years ago

Is implementing DerefMut or having some get_mut_or_init for non-sync Lazy a planned feature?

My use case is conditionally locking a mutex depending on a branch, making it so that it doesn't get locked if never dereferenced. Something along the lines of this:

// `value` is `Mutex<T>`.
let mut value = Lazy::new(|| value.lock().expect("Poisoned"));
match other_value {
  0 => *value += 1,
  2 => *value -= 1,
  _ => (),
}

so that I don't have to repeat value.lock().expect("Poisoned") in each branch, nor introduce a local within each match arm while not locking the mutex if we get to the _ branch.
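
For comparison, one way to get this behavior without DerefMut on Lazy is a small wrapper around Option<MutexGuard>, since unique access is available here. This is only a sketch under that assumption; LazyGuard, get, is_locked and adjust are hypothetical names, not part of the proposed API.

```rust
use std::sync::{Mutex, MutexGuard};

/// Locks the mutex on first mutable access and keeps the guard afterwards.
pub struct LazyGuard<'a, T> {
    mutex: &'a Mutex<T>,
    guard: Option<MutexGuard<'a, T>>,
}

impl<'a, T> LazyGuard<'a, T> {
    pub fn new(mutex: &'a Mutex<T>) -> Self {
        LazyGuard { mutex, guard: None }
    }

    /// Locks on the first call, reuses the held guard on later calls.
    pub fn get(&mut self) -> &mut T {
        let mutex = self.mutex;
        let guard = self
            .guard
            .get_or_insert_with(|| mutex.lock().expect("Poisoned"));
        &mut **guard
    }

    /// Reports whether the mutex was ever locked through this wrapper.
    pub fn is_locked(&self) -> bool {
        self.guard.is_some()
    }
}

/// Mirrors the match above; returns whether the lock was taken.
pub fn adjust(shared: &Mutex<i32>, other_value: u8) -> bool {
    let mut value = LazyGuard::new(shared);
    match other_value {
        0 => *value.get() += 1,
        2 => *value.get() -= 1,
        _ => (),
    }
    value.is_locked()
}
```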

rcls commented 3 years ago

What's the intended way to return a Lazy from a function (or store in a data-structure) without exposing the second type parameter? The best I could come up with is the fairly fugly:

#![feature(once_cell)]
use std::lazy::Lazy;
use std::ops::Deref;
pub fn strint(x: u64) -> impl Deref<Target = String> {
    Lazy::new(move || x.to_string())
}

Even just implementing Into appropriately would make this saner?

matklad commented 3 years ago

@rcls in those cases, it would probably be prudent to build off OnceCell directly, as that gives you more flexibility with respect to lifetimes. For your example, I'd do:

pub struct LazyString {
  x: u64,
  cell: OnceCell<String>,
}

impl LazyString {
  pub const fn new(x: u64) -> LazyString { LazyString { x, cell: OnceCell::new() } } 
}

impl Deref for LazyString {
  type Target = String;
  fn deref(&self) -> &String { self.cell.get_or_init(|| self.x.to_string()) }
}

SimonSapin commented 3 years ago

Filed a minor docs bug: https://github.com/rust-lang/rust/issues/85716

anka-213 commented 3 years ago

Is there any specific reason why something like into_inner doesn't/shouldn't exist for Lazy and SyncLazy, as it does for OnceCell and SyncOnceCell?

I believe something like this would work

    pub fn into_inner(self) -> T {
        match self.cell.into_inner() {
            Some(x) => x,
            None => match self.init.take() {
                Some(f) => f(),
                None => panic!("`Lazy` instance has previously been poisoned"),
            },
        }
    }

matklad commented 3 years ago

No specific reason. once_cell has into_value for that: https://docs.rs/once_cell/1.8.0/once_cell/sync/struct.Lazy.html#method.into_value, it could be added here as well.

yaahc commented 3 years ago

Unresolved question: method naming

Currently, we have get_or_init and get_or_try_init. Are those good names? Here are some alternatives (see also #78943)
1. `get_or_init`, `get_or_try_init`

2. `get_or_insert_with`, `try_get_or_insert_with`

3. `get_with`, `try_get_with`
1. Pro: Status quo, name specific to OnceCell (you see x.get_or_init, you know x is a once cell). Con: doesn't feel like it perfectly fits with other std names.
2. Pro: matches Option::get_or_insert_with exactly. Con: for OnceCell, unlike Option, this is the core API. It's a shame that it's a mouthful.
3. Pro: short, matches std conventions. Con: _with without or suggests that the closure will always be called, but that's not really the case.

I've thought more about this, and I think I actually like 3 most. Its Con seems like a Pro to me. In the typical use-case, you only use _with methods:
impl Spam {
  fn get_eggs(&self) -> &Eggs {
    self.eggs.get_with(|| Eggs::cook())
  }
}
So, the closure is sort-of always called, it's just cached. Not sure if my explanation makes sense, but I do feel that this is different from, e.g., Entry::or_insert_with.

I definitely favor 3 as well. IMO the OnceCell itself implies the fact that it's only called once, so OnceCell::get_with still communicates the same semantics that *_or_* methods do on other types.

yaahc commented 3 years ago

Summarizing some of the backlog related to open issues:

Mara: Atomic doesn't feel appropriate. (source)
Mara: Can we change Lazy so it doesn't unconditionally store a function pointer? (source)
phil-opp: Could the no-std usecase for Lazy be improved by adding a generic parameter for the mutex type? (source)
Mara: is OnceCell::set equivalent to Cell::set or to Option::insert and based on the analogous interface, should it take &mut self or &self? (source)

Didn't get as far through the backlog as I wanted so some of these may already be resolved by later comments.

programmerjake commented 3 years ago

So both fn set(&self, value: T) -> Result<&T, T>; (like Cell::set) and fn set(&mut self, value: T) -> &mut T; (like Option::insert) would make sense.

imho we should have set take &self, the &mut self variant can instead be spelled *self = OnceCell::from(value).
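
A short sketch of that spelling, using std::cell::OnceCell (the name the non-sync cell kept on stabilization); the helper name overwrite is hypothetical.

```rust
use std::cell::OnceCell;

/// With unique access, replacing the whole cell acts as an unconditional
/// "set", regardless of whether the cell was already initialized.
pub fn overwrite<T>(cell: &mut OnceCell<T>, value: T) -> &mut T {
    *cell = OnceCell::from(value);
    cell.get_mut().expect("cell was just initialized")
}
```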

WaffleLapkin commented 3 years ago

@matklad Is the rename/move proposed in https://github.com/rust-lang/rust/issues/74465#issuecomment-763993225 still considered? https://github.com/rust-lang/rust/issues/74465#issuecomment-783294558 asked if someone could make a PR and I don't see anyone responding to this. I'd like to make a PR, if it's still an option

Currently, the set methods look like this

pub fn set(&self, value: T) -> Result<(), T> {}

I'm a bit surprised that it doesn't return a reference to the value (either old or just set) like Option::insert, {HashMap,BTreeMap}::try_insert.

Maybe we could change set to one of the following signatures?

1)

   pub fn set(&self, value: T) -> Result<&T, (&T, T)> {}

2)

   // maybe poorly named, just an example
   pub struct AlreadySet<'a, T> { 
       pub already_set_to: &'a T,
       pub value: T,
   }
   pub fn set(&self, value: T) -> Result<&T, AlreadySet<'_, T>> {}

3)

   pub fn set(&self, value: T) -> (&T, Result<(), T>) {}

4)

   // maybe poorly named, just an example
   pub enum SetResult<'a, T> { Set(&'a T), WasSet(&'a T, T) }
   pub fn set(&self, value: T) -> SetResult<'_, T> {}

That would allow using the value immediately after setting it without a need to unwrap:

// Currently unwrap is needed
cell.set(value);
let r = cell.get().unwrap();
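
For illustration, signature 1 can be approximated today as a free function on top of get_or_init. This is a sketch, not a proposed implementation; the name set_and_get is hypothetical, and it uses std::cell::OnceCell.

```rust
use std::cell::OnceCell;

/// Tries to store `value`. Returns a reference to the stored value on
/// success, or the existing value together with the rejected one.
pub fn set_and_get<T>(cell: &OnceCell<T>, value: T) -> Result<&T, (&T, T)> {
    let mut pending = Some(value);
    // The closure only runs (and consumes `pending`) if the cell is empty.
    let stored = cell.get_or_init(|| pending.take().expect("initializer runs at most once"));
    match pending {
        None => Ok(stored),
        Some(rejected) => Err((stored, rejected)),
    }
}
```

If the closure ran, pending is empty and the value was stored; if it is still there, the cell was already initialized and the caller gets the value back.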

KodrAus commented 2 years ago

Things get buried in discussion threads so I think it’s worth restating that I think any stabilization plan for what’s currently called SyncOnceCell that doesn’t consider deprecation of the existing sync::Once as an end state is going to leave us in a confusing place where we have multiple APIs for doing the same thing with their differences buried mostly in trivia. I think that should factor into naming and organization.

SimonSapin commented 2 years ago

Can we do library API deprecation such that usage only emits warnings in a new edition?

sync::Once has been the only way to do some things for a long time. Directing new code to use SyncOnceCell instead is good, but pushing existing code to migrate can feel like churn for not much benefit.

KodrAus commented 2 years ago

pushing existing code to migrate can feel like churn for not much benefit.

That’s a fair point. Maybe for a start we could just do a “soft deprecation” where the docs for sync::Once suggest you use SyncOnceCell instead.

vultix commented 2 years ago

Tokio provides a Future based OnceCell type. This might be entirely out of scope, but is there any way the standard OnceCell and Lazy types might support both synchronous and asynchronous operations?

I imagine in the future we could add a signature similar to this:

pub async fn get_or_init_async<F, Fut>(&self, f: F) -> &T
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = T>,
{}

Zenithsiz commented 2 years ago

@vultix Wouldn't async { value.get_or_init(f().await) } be equivalent? Both would have type impl Future<Output = &T>.

Edit: Nevermind, I see now that the tokio impl only calls f when not initialized and locked, so it wouldn't be equivalent.

vultix commented 2 years ago

@Zenithsiz Your edit is exactly correct. It's important the initializer is only called once so you can use get_or_init_async to cache slow operations you only want to happen once, such as initializing a DB connection

ghost commented 2 years ago

Is there any update on when this feature will be pushed into stable?

cdecompilador commented 2 years ago

It would be great to have the option to reset (deinitialize) the Lazy to its initial state, so that the initializer is evaluated again on the next access. For example, here is a use case:

const FILE: SyncLazy<&'static str> = SyncLazy::new(|| {
        if cfg!(debug_assertions) {
            let buf = std::fs::read_to_string("test.txt").unwrap();
            Box::leak(buf.into_boxed_str())
        } else {
            include_str!("test.txt")
        }
});

Later in the code

// Some lets say http server loop
loop {
    // ...
    if cfg!(debug_assertions) {
        FILE.reset();
    }
    // ...
}

This would allow in debug build to have live file reloading for test.txt while in release build the file is completely static.

EFanZh commented 2 years ago

@cdecompilador The reference a global SyncLazy gives has 'static lifetime, which means it should always be valid, so you should never destroy a global SyncLazy value after creation. For example:

#![feature(once_cell)]

use std::lazy::SyncLazy;

static X: SyncLazy<i32> = SyncLazy::new(|| 7);
static Y: SyncLazy<&'static i32> = SyncLazy::new(|| &X);

fn main() {
    let x_ref: &'static i32 = *Y;

    // <-- If you reset `X` here, `x_ref` will be invalidated.

    dbg!(x_ref);
}

You can’t reset X because Y should always be valid.

GutsTang commented 2 years ago

Is there any opportunity to implement the Copy trait for Lazy? When I tried to use Lazy in a const variable, the closure inside was called multiple times. This behavior is extremely counter-intuitive.

#![feature(once_cell)]
use rand::Rng; // 0.8.4
use std::lazy::Lazy;

const CONST_LAZY: Lazy<i32> = Lazy::new(|| rand::thread_rng().gen::<i32>());

fn main() {
    let local_lazy: Lazy<i32> = Lazy::new(|| rand::thread_rng().gen::<i32>());
    println!("{}", *local_lazy); // -1475423855
    println!("{}", *local_lazy); // -1475423855

    println!("{}", *CONST_LAZY); // 1975106939
    println!("{}", *CONST_LAZY); // -1848043613
}

jRimbault commented 2 years ago

@GutsTang this is expected behavior, what you want is static. const means each instance is the same compile time value, think of it almost like #define N 1.

What you wrote would be equivalent to :

println!("{}", *Lazy::new(|| rand::thread_rng().gen::<i32>()));
println!("{}", *Lazy::new(|| rand::thread_rng().gen::<i32>()));

It doesn't have to do with Copy.
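
The difference can be made observable without the rand crate. The sketch below assumes std::sync::LazyLock (the name SyncLazy was later stabilized under) and counts initializer runs; all item names are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

static CONST_RUNS: AtomicUsize = AtomicUsize::new(0);
static STATIC_RUNS: AtomicUsize = AtomicUsize::new(0);

// Each initializer returns how many times it has been run so far.
fn init_for_const() -> usize {
    CONST_RUNS.fetch_add(1, Ordering::SeqCst) + 1
}

fn init_for_static() -> usize {
    STATIC_RUNS.fetch_add(1, Ordering::SeqCst) + 1
}

// Every mention of a `const` pastes in a fresh value, so each use
// builds a brand-new LazyLock and runs the initializer again.
const CONST_LAZY: LazyLock<usize> = LazyLock::new(init_for_const);
// A `static` is a single location, so the initializer runs once.
static STATIC_LAZY: LazyLock<usize> = LazyLock::new(init_for_static);

pub fn read_const_twice() -> (usize, usize) {
    (*CONST_LAZY, *CONST_LAZY)
}

pub fn read_static_twice() -> (usize, usize) {
    (*STATIC_LAZY, *STATIC_LAZY)
}
```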

GutsTang commented 2 years ago

@JRimbault Thanks. I'm always confused about the const variable in Rust.

ayosec commented 2 years ago

this is expected behavior, what you want is static. const means each instance is the same compile time value, think of it almost like #define N 1.

Maybe it would be useful to have a Clippy check for this. I guess that something like std::lazy::Lazy should never be stored in a const value.

Nemo157 commented 2 years ago

There is an existing default-warn clippy lint for this:

warning: a `const` item should never be interior mutable
 --> src/lib.rs:2:1
  |
2 | pub const LAZY: std::lazy::Lazy<i32> = std::lazy::Lazy::new(|| 5);
  | -----^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  | |
  | make this a static item (maybe with lazy_static)
  |
  = note: `#[warn(clippy::declare_interior_mutable_const)]` on by default
  = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#declare_interior_mutable_const

TopologicallySpeaking commented 2 years ago

I'm a bit surprised that it doesn't return a reference to the value (either old or just set) like Option::insert, {HashMap,BTreeMap}::try_insert.

Maybe we could change set to one of the following signatures?
1. ```rust
   pub fn set(&self, value: T) -> Result<&T, (&T, T)> {}
   ```

2. ```rust
   // maybe poorly named, just an example
   pub struct AlreadySet<'a, T> { 
       pub already_set_to: &'a T,
       pub value: T,
   }
   pub fn set(&self, value: T) -> Result<&T, AlreadySet<'_, T>> {}
   ```

3. ```rust
   pub fn set(&self, value: T) -> (&T, Result<(), T>) {}
   ```

4. ```rust
   // maybe poorly named, just an example
   pub enum SetResult<'a, T> { Set(&'a T), WasSet(&'a T, T) }
   pub fn set(&self, value: T) -> SetResult<'_, T> {}
   ```
That would allow using the value immediately after setting it without a need to unwrap:
// Currently unwrap is needed
cell.set(value);
let r = cell.get().unwrap();

I strongly agree. I'm currently writing a builder pattern, the builder has a number of items which need to be set one after another, each one depends on the last, and there are multiple references to the builder so I can't mutably borrow it. That's exactly what OnceCell is designed for. I should be able to do something roughly like this:

fn build(&self) -> BuiltType {
    let item1 = self
        .item1
        .set(Item1::new(self.info))
        .unwrap();

    self.item2
        .set(Item2::new(item1))
        .unwrap();

    self.finish()
}

Option 1 and 2 are the only ones you listed which would work for that purpose. I'd prefer the second one, as it's more explicit about the semantics.

fogti commented 2 years ago

@TopologicallySpeaking huh, but I think you could also work with the third option, e.g.

fn build(&self) -> BuiltType {
    let (item1, y) = self
        .item1
        .set(Item1::new(self.info));
    y.unwrap();

    self.item2
        .set(Item2::new(item1))
        .unwrap();

    self.finish()
}

ghost commented 2 years ago

Not sure if this has been discussed, but I think OnceCell should potentially belong in std::cell rather than std::lazy, since its use-case can be more general than for laziness. Similarly, perhaps SyncOnceCell should belong in either std::cell or std::sync.

fogti commented 2 years ago

@DefinitelyNotRobot If OnceCell belongs in std::cell, then SyncOnceCell belongs in std::sync, like Rc/Arc.

andylokandy commented 2 years ago

Is OnceCell safe in static considering multiple threads will try to initialize it? I've seen some usage like that, for example: https://github.com/metrics-rs/quanta/blob/fbf383a33d7836d7303dd3aa8d9627e17cd613da/src/lib.rs#L170

SimonSapin commented 2 years ago

@andylokandy There are two variants of OnceCell. One of them implements the Sync trait, which means it is safe to access from multiple threads. (The language won’t let you make a static item of a !Sync type.) Both variants exist because this extra synchronization has some cost. In your example, note the import use once_cell::sync::OnceCell; instead of once_cell::unsync::OnceCell.
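
A minimal sketch of the Sync variant in a static, using std::sync::OnceLock (the stabilized name of SyncOnceCell) instead of the once_cell crate from the linked example; CONFIG, INIT_CALLS and the function names are hypothetical. Several threads race to initialize the cell, and the initializer runs exactly once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the initializer actually ran.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static CONFIG: OnceLock<String> = OnceLock::new();

/// Any thread may call this; the first caller initializes the cell,
/// concurrent callers wait for that initialization to finish.
pub fn config() -> &'static str {
    CONFIG.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        String::from("ready")
    })
}

/// Spawns `n` threads that all read the config, then reports how many
/// times the initializer ran in total.
pub fn init_from_threads(n: usize) -> usize {
    let handles: Vec<_> = (0..n)
        .map(|_| thread::spawn(|| config().len()))
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    INIT_CALLS.load(Ordering::SeqCst)
}
```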

Person-93 commented 2 years ago

Proposed addition to the API for OnceCell: initialization that returns a mutable reference.

// regular initialization
let mut cell = OnceCell::new();
let n: &mut i32 = cell.get_mut_with(|| 42);

// fallible initialization
let mut cell = OnceCell::new();
let n: &mut i32 = cell.try_get_mut_with(|| {
 if all_is_well {
    Ok(42)
  } else {
    Err("oh no!")
  }
})?;

NOTE: I'm not sure about the method names.

SimonSapin commented 2 years ago

If you have a &mut OnceCell<T> during initialization why would you need OnceCell at all? If you only have &OnceCell<T>, any method that returns &mut T from that would be unsound. Only &T ever being accessible through &OnceCell<T> is a basic principle of OnceCell.
