matklad / once_cell

Rust library for single assignment cells and lazy statics without macros
Apache License 2.0
1.84k stars 110 forks source link

💡 discussion: `lazy!`/`once!` macros #189

Closed CAD97 closed 2 years ago

CAD97 commented 2 years ago

The main disadvantage of the Lazy type over lazy_static! is that Lazy<T> necessarily stores and calls fn() -> T rather than impl FnOnce() -> T. This means that any inlining of the initialization must happen via devirtualization, rather than coming for free with monomorphization.

On the other hand, perhaps this is an advantage; initialization is #[cold] and only called once, so inlining the initialization codepath may serve more to make inlining callers more costly than to make initialization less expensive.

What doesn't suffer from the (perceived?) virtualization cost is the fn get_global() -> &'static Global pattern using OnceCell directly rather than through Lazy. Using a macro interface, we can make this pattern more accessible.

Presented for consideration, a potential implementation:

```rust #![feature(type_alias_impl_trait)] pub use once_cell::sync::*; #[doc(hidden)] pub use std::{ops::Deref, marker::Sized}; pub trait LazyStatic: Deref { /// Force initialization of this lazy static. /// /// Calling this function is a hint that this is called once /// during startup, and could benefit from inlining. /// /// For normal usage, use [`Deref`][::deref] instead; /// `deref` hints that the initialization path is `#[cold]`. #[inline] #[allow(unused_attributes)] // documentation fn force(this: &Self) -> &'static Self::Target; } #[macro_export] macro_rules! __once_cell__sync__lazy { // item position lazy! { vis static NAME: Ty = expr; } // expands to: // vis static NAME: impl LazyStatic; { $(#[$meta:meta])* $vis:vis static $STATIC:ident: $Type:ty = $build:expr $(; $($continue:tt)*)? } => { // To avoid TAIT, expose the Impl type instead. type $STATIC = impl $crate::LazyStatic; $(#[$meta])* $vis static $STATIC: $STATIC = { use $crate::LazyStatic as _; static ONCE: $crate::OnceCell<$Type> = $crate::OnceCell::new(); struct Impl; impl $crate::LazyStatic for Impl { #[inline] fn force(_this: &Self) -> &'static $Type { ONCE.get_or_init(|| $build) } } impl $crate::Deref for Impl { type Target = $Type; #[inline] fn deref(&self) -> &'static $Type { #[cold] fn init(this: &Impl) -> &'static $Type { Impl::force(this) } ONCE.get().unwrap_or_else(|| init(self)) } } Impl }; // continue processing multiple items $($crate::lazy! { $($continue)* })? }; // item position lazy! { vis fn name() -> Ty { block_content } // expands to: // vis fn name() -> Ty { once! { block_content } } { $(#[$meta:meta])* $vis:vis fn $name:ident() -> &'static $Ret:ty { $($content:tt)* } $($($continue:tt)+)? } => { $(#[$meta])* $vis fn $name() -> &'static $Ret { // inlining once! can avoid TAIT since we know $Ret $crate::once! { $($content)* } } // continue processing multiple items $($crate::lazy! { $($continue)+ })? }; // empty {} => {}; } pub use __once_cell__sync__lazy as lazy; #[macro_export] macro_rules! __once_cell__sync__once { // expression position once!( block_content ) // expands to: // static ONCE: OnceCell<_>; ONCE.get_or_init(|| block_content) ( $($content:tt)* ) => {{ // TAIT is transparent to the calling scope, but // only if the it ascribes the type at the callsite. #[allow(nonstandard_style)] type NEEDS_TYPE_ANNOTATION = impl $crate::Sized; static ONCE: $crate::OnceCell = $crate::OnceCell::new(); ONCE.get_or_init(|| { $($content)* }) }}; } pub use __once_cell__sync__once as once; ``` Actually... perhaps `lazy! { fn }` should actually be `memoize!` or similar, since that matches the semantics (take the output and cache it) moreso than `lazy!`. But this is how the proof of concept went. Bonus far-future example: if we get function assignment, it would be cool to write ```rust fn make() -> T; fn get_global() -> &'static T = once!(make()); ``` instead of `lazy! { fn }`.

I understand that part of the draw of once_cell is the macroless API, but providing optional macros with straightforward documented translation[^1] can help as a documentation point for common-to-the-point-of-generalization patterns for using the API.

[^1]: Straightforward in the expository documentation, even if the actual expansion is more complicated. macro_rules! always have to do interesting things to achieve proper hygiene when defining items.

It probably makes more sense to keep the simple API for the time being (e.g. because the nice impls I provide rely on TAIT in order to be properly nice), but if/when std's version of OnceCell is available, it might make sense to wrap some common patterns into simple macros in this (or another) crate.

matklad commented 2 years ago

On the other hand, perhaps this is an advantage; initialization is #[cold] and only called once, so inlining the initialization codepath may serve more to make inlining callers more costly than to make initialization less expensive.

Initialization is also dyn, so some devirtualization has to happen:

https://github.com/matklad/once_cell/blob/eda22cec55e9d37b16d408b7d0a5b396c7feb44c/src/imp_std.rs#L194

Another problem with the macro is that introduces a fresh type per lazy instance, (which also leaks into the API).

So, I don't think this shold be included in the library, or in std.

But it might be a good fit for the recipes section of the docs!

https://github.com/matklad/once_cell/blob/master/src/lib.rs#L27

notgull commented 2 years ago

I agree with matklad here. A Lazy storing anything aside from a fn() -> T is a code smell that indicates it should be replaced with a OnceCell.

CAD97 commented 2 years ago

I think I'm in agreement: the patterns implemented by these macros can be documented in the docs and that's sufficient to close this "issue." Providing them as macros is not really necessary (especially given the reliance on TAIT for niceness; having static GLOBAL: impl Lazy<Resource>; I think is fine, whereas a custom type is significantly heavier and less nice).


A Lazy storing anything aside from a fn() -> T is a code smell that indicates it should be replaced

Is this justification for removing the second generic on Lazy? Doing so for once_cell is probably just breaking for minimal gain, but doing so for std's unstable version might be beneficial.

Note that a simple let it = Lazy::new(make); will use the fn()->T {make} ZST rather than the erased fn() -> T function pointer.

Rustc is smart enough to say expected `fn() -> T` found `T` instead of expected `Lazy<T, fn() -> T>` found `Lazy<T, T>` when calling Lazy::new(T), so perhaps there's no reason to prevent the fully inferred case from using dataful closures and monomorphized ZST function types.

with a OnceCell.

This isn't always possible (without just rewriting Lazy); consider:

let big_data = BigData::acquire();
let lazy = LazyCell::new(|| process(big_data));

if maybe() {
    observe(&*lazy);
}

if maybe() {
    observe(&*lazy);
}

The only way to rewrite that with OnceCell or Lazy<_, fn() -> _> is to change semantics (only acquire BigData inside the lazy callback) or reinvent Lazy by mem::takeing big_data into the closure, since it can't be FnOnce in multiple arms.

matklad commented 2 years ago

A Lazy storing anything aside from a fn() -> T is a code smell

Yeah, I think I wouldn't agree with that -- it is sometimes (but rarely) useful to use a Lazy local variable (which is exactly CAD97's example)

matklad commented 2 years ago

Closing! This is a useful discussion, but I think conclusion is that we don't actually want to change anything!