rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.97k stars 12.53k forks source link

Tracking Issue for `once_cell` #74465

Closed KodrAus closed 1 year ago

KodrAus commented 4 years ago

This is a tracking issue for the RFC "standard lazy types" (rust-lang/rfcs#2788). The feature gate for the issue is #![feature(once_cell)].

Unstable API

// core::lazy

pub struct OnceCell<T> { .. }

impl<T> OnceCell<T> {
    pub const fn new() -> OnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
}
impl<T> From<T> for OnceCell<T>;
impl<T> Default for OnceCell<T>;
impl<T: Clone> Clone for OnceCell<T>;
impl<T: PartialEq> PartialEq for OnceCell<T>;
impl<T: Eq> Eq for OnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for OnceCell<T>;

pub struct Lazy<T, F = fn() -> T> { .. }

impl<T, F> Lazy<T, F> {
    pub const fn new(init: F) -> Lazy<T, F>;
}
impl<T, F: FnOnce() -> T> Lazy<T, F> {
    pub fn force(this: &Lazy<T, F>) -> &T;
}
impl<T: Default> Default for Lazy<T>;
impl<T, F: FnOnce() -> T> Deref for Lazy<T, F>;
impl<T: fmt::Debug, F> fmt::Debug for Lazy<T, F>;

// std::lazy

pub struct SyncOnceCell<T> { .. }

impl<T> SyncOnceCell<T> {
    pub const fn new() -> SyncOnceCell<T>;
    pub fn get(&self) -> Option<&T>;
    pub fn get_mut(&mut self) -> Option<&mut T>;
    pub fn set(&self, value: T) -> Result<(), T>;
    pub fn get_or_init<F>(&self, f: F) -> &T where F: FnOnce() -> T;
    pub fn get_or_try_init<F, E>(&self, f: F) -> Result<&T, E> where F: FnOnce() -> Result<T, E>;
    pub fn into_inner(mut self) -> Option<T>;
    pub fn take(&mut self) -> Option<T>;
    fn is_initialized(&self) -> bool;
    fn initialize<F, E>(&self, f: F) -> Result<(), E> where F: FnOnce() -> Result<T, E>;
    unsafe fn get_unchecked(&self) -> &T;
    unsafe fn get_unchecked_mut(&mut self) -> &mut T;
}
impl<T> From<T> for SyncOnceCell<T>;
impl<T> Default for SyncOnceCell<T>;
impl<T: RefUnwindSafe + UnwindSafe> RefUnwindSafe for SyncOnceCell<T>;
impl<T: UnwindSafe> UnwindSafe for SyncOnceCell<T>;
impl<T: Clone> Clone for SyncOnceCell<T>;
impl<T: PartialEq> PartialEq for SyncOnceCell<T>;
impl<T: Eq> Eq for SyncOnceCell<T>;
unsafe impl<T: Sync + Send> Sync for SyncOnceCell<T>;
unsafe impl<T: Send> Send for SyncOnceCell<T>;
impl<T: fmt::Debug> fmt::Debug for SyncOnceCell<T>;

pub struct SyncLazy<T, F = fn() -> T>;

impl<T, F> SyncLazy<T, F> {
    pub const fn new(f: F) -> SyncLazy<T, F>;
}
impl<T, F: FnOnce() -> T> SyncLazy<T, F> {
    pub fn force(this: &SyncLazy<T, F>) -> &T;
}
impl<T, F: FnOnce() -> T> Deref for SyncLazy<T, F>;
impl<T: Default> Default for SyncLazy<T>;
impl<T, F: UnwindSafe> RefUnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: RefUnwindSafe;
impl<T, F: UnwindSafe> UnwindSafe for SyncLazy<T, F> where SyncOnceCell<T>: UnwindSafe;
unsafe impl<T, F: Send> Sync for SyncLazy<T, F> where SyncOnceCell<T>: Sync;
impl<T: fmt::Debug, F> fmt::Debug for SyncLazy<T, F>;

Steps

Unresolved Questions

Inlined from #72414:

Implementation history

Person-93 commented 2 years ago

If you have a &mut OnceCell<T> during initialization why would you need OnceCell at all? If you only have &OnceCell<T>, any method that returns &mut T from that would be unsound. Only &T ever being accessible through &OnceCell<T> is a basic principle of OnceCell.

My use case is where initialization can be expensive so it should only happen once but updates can happen often and are usually pretty cheap.

SimonSapin commented 2 years ago

If there’s shared access and there can be any update after initialization, OnceCell does not work. That’s why it’s called "once". You need a RefCell or a Mutex or a RwLock instead.

Allowing mutation after initialization in OnceCell would make it unsound. For example:


let mut cell = OnceCell::new();
let a: &mut Option<String> = cell.get_mut_with(|| Some(String::from("foo")));
let b: &Option<String> = cell.get(); // `&T` aliases `&mut T`
let c: &str = b.as_deref().unwrap()
*a = None; // String is deallocated
println!("{c}") // Use-after-free
Shadlock0133 commented 2 years ago

It is possible to implement it completely outside of std:

fn get_mut_with<T, F: FnOnce() -> T>(cell: &mut OnceCell<T>, f: F) -> &mut T {
    cell.get_or_init(f);
    cell.get_mut().unwrap()
}

It's safe, because we're taking exclusive reference here.

But I still don't know if it's that useful.

Person-93 commented 2 years ago

If there’s shared access and there can be any update after initialization, OnceCell does not work. That’s why it’s called "once". You need a RefCell or a Mutex or a RwLock instead.

Allowing mutation after initialization in OnceCell would make it unsound. For example:

let mut cell = OnceCell::new();
let a: &mut Option<String> = cell.get_mut_with(|| Some(String::from("foo")));
let b: &Option<String> = cell.get(); // `&T` aliases `&mut T`
let c: &str = b.as_deref().unwrap()
*a = None; // String is deallocated
println!("{c}") // Use-after-free

The call to cell.get() to assign to b wouldn't work because cell is mutably borrowed.

SimonSapin commented 2 years ago

So get_mut_with would take &mut self? But if you can afford exclusive access for initialization, why use a cell at all?


// regular initialization
let mut value;
value = 42;
let n: &mut i32 = &mut value;

// fallible initialization
let mut value;
if all_is_well {
    value = 42;
} else {
    return Err("oh no!")
}
let n: &mut i32 = &mut value;
Person-93 commented 2 years ago

So get_mut_with would take &mut self? But if you can afford exclusive access for initialization, why use a cell at all?

If initialization is expensive or not idempotent, I want to ensure that it only happens once.

matklad commented 2 years ago

@Person-93 perhaps you are looking for https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.get_or_insert_with? Otherwise, I'd say https://github.com/rust-lang/rust/issues/74465#issuecomment-1060033519 is the way to go.

As a policy question, I would probably suggest to not discuss additional convenience methods in this issue, but rather direct issue requests at once_cell crate.

Here, I feel we can benefit most from focusing on the minimal useful subset we can stabilize first. After we have a stable base, it would be much more efficient to open follow-up issues/pr expanding the API surface.

matklad commented 2 years ago

I ... have a new naming proposal:

I think I really like OnceLock:

m-ou-se commented 2 years ago

That sounds great. I'm wondering if this means they shouldn't live in the {core, std}::lazy module, but instead in core::cell and std::sync.

bstrie commented 2 years ago

Agreed, I like it. (And I am getting to this, just slowly! Someone with more free time is welcome to beat me to it, of course. :) )

I will say that std::sync feels like sort of a miscellaneous grab bag, but std::sync::OnceLock doesn't feel terrible.

cuviper commented 2 years ago

It might be confusing to have std::sync::{Once, OnceLock} next to each other, because the difference is not really obvious from their names alone. At the very least, docs should anticipate this confusion and try to clarify their roles -- OnceLock is roughly just (Once, T).

cuviper commented 2 years ago

A small point against the OnceLock name is that you don't have guards like Mutex or RwLock. In that sense, you never actually "lock" the data inside.

cuviper commented 2 years ago

Alternate name idea: InitLock -- as in, only initialization is actually locked, and then it has unguarded access.

(I kind of like InitCell for the !Sync version too... These are data wrappers, but the only thing that's "once" about that data is its initialization, so it makes some sense to make that more direct in the name.)

KodrAus commented 2 years ago

@cuviper Another way to look at it is that Once is OnceLock<()>, which is my personal preference because it means we can deprecate Once altogether.

SimonSapin commented 2 years ago

If you mean soft-deprecation in docs, why not. But making Once emit a deprecation warning to push its existing users to migrate to OnceLock would create churn. For what benefit? Since Once is #[stable] so we can’t actually remove it.

yaahc commented 2 years ago

If you mean soft-deprecation in docs, why not. But making Once emit a deprecation warning to push its existing users to migrate to OnceLock would create churn. For what benefit? Since Once is #[stable] so we can’t actually remove it.

I know we can't really remove it from the compiled library, but I still do like the idea of eventually removing the ability to call deprecated APIs in newer editions. The benefit in my mind being keeping the language surface area as small as possible. I think there is a significant cost in having multiple ways to do the same thing, if people can use Once or OnceLock<()> to do the exact same thing, it benefits them for us to push everyone into using the newer more powerful API so you don't end up with people learning old dialects and having to eventually unlearn the old API in favor of the new one once they eventually discover it and become aware of the similarities.

I wouldn't want to introduce this churn carelessly, we'd need to have mechanical ways to update code when they move to the new edition so the only churn that happens is in the diff rather than requiring repetitive manual updates across the entire ecosystem. But between a small amount of diff churn and having to endlessly maintain two ways to do the same thing, I'd choose the diff churn and the smaller library API every time.

nbdd0121 commented 2 years ago

How about making std::sync::Once Once<T = ()>?

cuviper commented 2 years ago

But between a small amount of diff churn and having to endlessly maintain two ways to do the same thing, I'd choose the diff churn and the smaller library API every time.

We still do have to maintain the old stuff, even if it's inaccessible from newer editions.

bjorn3 commented 2 years ago

How about making std::sync::Once Once<T = ()>?

That is unfortunately a breaking change. Generic param defaults only work in some cases but not others. Don't remember exactly when they do and when they don't work though.

SimonSapin commented 2 years ago

If deprecation is tied to an edition and there’s a rustfix migration, that sounds fine. It’s new deprecation warnings in existing code just by upgrading the toolchain that I feel are probably not worth the unexpected churn if it’s only to simplify std learnability.

nbdd0121 commented 2 years ago

That is unfortunately a breaking change. Generic param defaults only work in some cases but not others. Don't remember exactly when they do and when they don't work though.

It wouldn't break anything if we just change the struct, but we can't change new from impl Once to impl<T> Once<T> because default type parameters won't affect type param fallback.

John-Nagle commented 2 years ago

What's the status on this? I wanted to use it for something that generates strings for internationalization, but it's still unstable. Is this going to become stable soon?

matklad commented 2 years ago

This is blocked on implementation work. I feel this can become stable fairly soon, if this issue gets someone who'd champion remaining implementation&stabilization work. So far this didn't happen.

djc commented 2 years ago

(But you can use the once_cell crate in the mean time?)

John-Nagle commented 2 years ago

(But you can use the once_cell crate in the mean time?)

True. The use case involves Egui, the menu system often used for games. On every frame, it goes through the items it needs to display. Those are &str parameters. If I add internationalization, there's a lookup for every menu item on every frame. Hence the need for some kind of memoization.

WaffleLapkin commented 2 years ago

This is blocked on implementation work.

Can someone outline what implementation work is needed? This issue has quite a few of comments and it's somewhat hard to get what is needed to be done :')

matklad commented 2 years ago

I'd start with the following plan, adjusting according to reality:

  1. Do the rename proposed in https://github.com/rust-lang/rust/issues/74465#issuecomment-1095665053, that is
mod cell {
    OnceCell,
    CellLazy,
}
mod sync {
    OnceLock,
    LockLazy,
}
  1. Re-read the disucssion here and on the RFC to make sure nothing is missed.

  2. Take a critical look at the naming, signatures and semantics of OnceX methods, update the summary comment in the tracking issue.

  3. Take the minimal orthogonal set of methods of OnceX, start FCP to stabilize thouse under base_once_cell feature flag

  4. After OnceX types are stabilized, move forward with stabilizing Lazy types (I'd expect that to need a bit more bikeshed due to less clear naming and objectonable default parameter trick).

Throughout: keep an eye on what's the current blocked, poke relevant people and otherwise ensure that the work doesn't get stuck.

MatrixDev commented 2 years ago

Why CellLazy instead of LazyCell (same for lock)? It is harder to pronounce (triple consecutive L) and also all other cells are suffixes instead of prefixes.

metasim commented 2 years ago

Why CellLazy instead of LazyCell (same for lock)?

LazyCell is also more coherent with RefCell....

mqudsi commented 2 years ago

From a usability/ergonomics perspective, is there any chance of adding a set(value: T) method to Lazy<T> that would be used if-and-only-if the lazy value hasn't already been initialized with the constructor-provided lazy init method (panicking/failing otherwise)?

The idea is that you could specify a default value and override it "in-place" rather than needing to use a OnceCell<T> and a separate mut T instance (or a third, default const T instance) to hold the default value with the option for a user/environment/runtime-provided alternative taking its place, while still only initializing once.

e.g. in lieu of the following:

static DEST_PORT OnceCell<u16> = OnceCell::new();

pub fn main() {
    // This is the default value, used if no alternative is provided
    let mut dest_port: u16 = 1024;
    ...
    if let Some(arg) = std::env::args().skip(1).next() {
        dest_port = arg.parse().expect("Expected a valid u16 port number as the argument");
    }

    DEST_PORT.set(dest_port).unwrap();
}

you would be able to do something like this:

// This is the default value, used if no alternative is provided
static DEST_PORT Lazy<u16> = Lazy::new(|| 1024u16);

pub fn main() {
    if let Some(arg) = std::env::args().skip(1).next() {
        let dest_port = arg.parse().expect("Expected a valid u16 port number as the argument");
        DEST_PORT.set(dest_port).unwrap();
    }

    // At some later point:
    let foo: u16 = *DEST_PORT;
}
mbartlett21 commented 2 years ago

@KodrAus

Can you update the initial post with the updated API please?

ranile commented 2 years ago

Naming. I'm ok to just roll with the Sync prefix like SyncLazy for now, but have a personal preference for Atomic like AtomicLazy.

I would vote in favor of sync module. core::lazy has the unsync implementation and core::lazy::sync has the sync ones. This is also consistent with the naming used for Arc, Mutex, etc -- those are also in a module named sync

Iron-E commented 2 years ago

@mqudsi you could do something like this:

static DEST_PORT Lazy<u16> = Lazy::new(||
  match std::env::args().skip(1).next() {
    Some(arg) => arg.parse().expect("Expected a valid u16 port number as the argument"),
    _ => 1024,
  }
);

pub fn main() {
  let port = *DEST_PORT;
}

Edit: had a better idea

mqudsi commented 2 years ago

@Iron-E yes, it's possible directly with OnceCell but you're back to using a helper function to wrap the get_or_init() logic, which is what Lazy<T> is supposed to help avoid.

Iron-E commented 2 years ago

See my edit, I had a better idea

mqudsi commented 2 years ago
static DEST_PORT Lazy<u16> = Lazy::new(||
    match std::env::args().skip(1).next() {
        Some(arg) => arg.parse().expect("Expected a valid u16 port number as the argument"),
        _ => 1024,
    }
);

Thanks, but that's frankly not a better idea at all. The example was contrived, in practice you'll have a full command line arg handler processing perhaps dozens of switches, with proper error handling and early abort w/ usage information. The solution isn't to move everything into the Lazy closure; you normally want as little in there as you can get away with.

Iron-E commented 2 years ago

in practice you'll have a full command line arg handler processing perhaps dozens of switches, with proper error handling and early abort w/ usage information.

True… but if you have so many arguments it's probably better to use something like clap::Parser anyway. Lazy::new(|| Args::parse()) could probably do whatever is needed and more.

I don't see any issue with Lazy::set fwiw (not that my opinion is that important!), I just wanted to show how one could already accomplish similar effects by moving some code around.

The solution isn't to move everything into the Lazy closure; you normally want as little in there as you can get away with.

Agree to disagree I suppose. The two main reasons I can see that one would want something to be Lazy is that you either:

  1. Might eventually need a value that is somewhat expensive to initialize.
  2. Need a static that depends on runtime computation (e.g. a String).

Putting as much work in the Lazy as possible is better for the first case, but is not strictly better for the latter.

rtrevinnoc commented 2 years ago

Hello :) I am sorry if this is not the right place to post this, but could you help me figure out what is the current API? Importing Lazy as such seems to not work anymore. For reference, I am using rustc 1.64.0-nightly. Thank you very much beforehand

#![feature(once_cell)]
use std::lazy::Lazy;
rusty-snake commented 2 years ago

Read the comments above or look at the documentation (it has a search feature).

std::lazy::Lazy got renamed and moved to std::cell::LazyCell.

Iron-E commented 2 years ago

When you search the docs it brings up the former :)

rusty-snake commented 2 years ago

The docs you linked say

Version 1.62.0 (a8314ef7d 2022-06-27)

If you use a nightly compiler (for nightly features) you need nightly docs.

rtrevinnoc commented 2 years ago

Thank you very much, I couldn't find it. I'm so sorry for the disturbance, I should've looked at the nightly docs

kennystrawnmusic commented 2 years ago

Attempting to use core::lazy::OnceCell in a kernel project is throwing an “unresolved import” error despite the documentation stating that it’s in the core library, forcing me to fall back on the conquer_once implementation at the moment — any idea when this will get fixed?

mbartlett21 commented 2 years ago

@kennystrawnmusic If you look a few comments up, you will see that it was renamed to core::cell::LazyCell. Since you are running nightly, you can either look at your local docs or the nightly docs on doc.rust-lang.org to find that it is there.

kennystrawnmusic commented 2 years ago

@kennystrawnmusic

If you look a few comments up, you will see that it was renamed to core::cell::LazyCell.

Since you are running nightly, you can either look at your local docs or the nightly docs on doc.rust-lang.org to find that it is there.

That's what core::lazy::Lazy got renamed to but I didn't know that OnceCell also got moved to core::cell. For some weird reason though the OnceCell implementation in the core library doesn't like being used in constants since it still uses UnsafeCell behind the scenes, thus still forcing me to use conquer_once::spin::OnceCell for my use case (namely, initializing my very own printk crate) for the time being. If there's a way to make core::cell::OnceCell as thread-safe as conquer_once::spin::OnceCell I'd like to know how.

not-wlan commented 2 years ago

FYI: SyncLazy can be found in std::sync::LazyLock now

m-ou-se commented 2 years ago

@kennystrawnmusic The problem is that in no_std there's no way to block (other than spinlooping), so anything that blocks, like OnceLock::get_or_init, can't exist in core.

A non-blocking but Sync version of a oncecell/oncelock is one that doesn't block, but lets multiple threads race for initialization, like this: https://docs.rs/once_cell/latest/once_cell/race/index.html

nrc commented 2 years ago

ISTM that since OnceCell has a take method which resets it back to the initialised state, it can be set multiple times and therefore 'once' in the name is misleading. Am I missing something about the once-ness?

matklad commented 2 years ago

I think this is just a special case of the more general phenomenon of "nothing is immutable, b/c you can always overwrite the variable itself". This is true of the Once we already have:

fn main() {
    let mut x = std::sync::Once::new();
    x.call_once(|| println!("hello"));
    x = std::sync::Once::new();
    x.call_once(|| println!("hello"));
}

This is also similar to the docs for Mutex, which currently say

The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked.

This is the right intuition, but also a lie, because you can do whatever if you have &mut Mutex.

Admittedly, this is a common question which comes up from time to time https://github.com/matklad/once_cell/issues/153.

My suggestion would be to stick with Once terminology, as it builds the right intuition, and put the details to the &mut taking methods.

mbartlett21 commented 1 year ago

Should the set method return Result<&T, T> rather than Result<(), T>?