Feature request: Make `Lazy<T, F>` covariant in `F`

jorendorff commented 2 years ago

I have some code where we'd like to use Lazy values as fields of structs, which means boxing closures:

type AnyLazy<'a, T> = Lazy<T, Box<dyn FnOnce() -> T + 'a>>;

The problem is that this type alias is invariant in 'a, because Lazy<T, F> is invariant in F. So actually using this type causes insurmountable lifetime issues in our code.

I hesitate to ask for this, because the only way to get it is to lie a lot to Rust about what's inside the Cell. The API would be safe, but I don't see a way to do it without a fairly gross amount of unsafe code.

jorendorff commented 2 years ago

If this is something you'd consider, let me know. I can give it a shot.

matklad commented 2 years ago

I think we should have it. I don't particularly mind unsafe, as long as it is sound. One problem here is that it'll be hard to check soundness for me. I guess, let's do the following:

you write PR with comments and tests
I get up-to-speed with respect to variance and double check that the logic is sound
at the end, we rope in RalfJung / r/rust to poke holes in our reasoning?

matklad commented 2 years ago

We also should fix https://doc.rust-lang.org/std/lazy/struct.Lazy.html in the same way -- for std types, it's very important that they are pedantically correct in terms of variance and such.

danielhenrymantilla commented 1 year ago

For what is worth, since the api w.r.t. F is ownership-based exclusively (no & or &mut on the (value of type) F), it means covariance is necessarily fine 🙂

douglas-raillard-arm commented 1 year ago

Drive by comment: I've hit a similar issue, for which I could not find a fully satisfactory answer because it seems impossible to get interior mutability and covariance at the same time for owned data.

We also should fix https://doc.rust-lang.org/std/lazy/struct.Lazy.html in the same way -- for std types, it's very important that they are pedantically correct in terms of variance and such.

I fully agree. Even UnsafeCell<T> is invariant in T, which makes it basically impossible to use as the basis of a memoization system even if it's perfectly fine (i.e. replacing the original value by either a value of the exact same lifetime or by 'static ). This is a bummer especially since I'd expect UnsafeCell to give me maximum (unsafe) control, given it's the only primitive that can be used.

endorpersand commented 1 year ago

I think it makes sense that UnsafeCell<T> is invariant over T since that's usually the correct and least error-prone option (for example, both Cell<T> and OnceCell<T> are invariant over T). However, yeah, I don't think it's possible to create a covariant LazyCell without a covariant version of UnsafeCell.

I did come up with a hack that sort of allows for covariant UnsafeCells, but it's clunky, and I couldn't figure out a way to make it general purpose.

My idea was to create a union over T and an UnsafeCell over an array:

#[repr(C)]
union CovUnsafeCell<T, const T_SIZE: usize> {
    value: ManuallyDrop<T>,
    cell: ManuallyDrop<UnsafeCell<[u8; T_SIZE]>>
}

impl<T, const T_SIZE: usize> CovUnsafeCell<T, T_SIZE> {
    pub fn new(t: T) -> Self {
        // T_SIZE is the number of bytes within the UnsafeCell. 
        // It has to be >= size_of::<T>() so that a raw mut pointer from the UnsafeCell
        // has permission to modify all of the value field.
        assert!(T_SIZE >= size_of::<T>());
        CovUnsafeCell { value: ManuallyDrop::new(t) }
    }

    // Safety: Using this pointer has all the requirements of UnsafeCell and
    // an extra requirement that mutations should not cause violations in covariance.
    pub fn get(&self) -> *mut T {
        unsafe { &self.cell }.get() as *mut T 
    }
}

This is covariant since the T parameter isn't in UnsafeCell<...>. However, this approach is incredibly clunky because the T_SIZE parameter has to be specified when the cell is initialized. And unfortunately, I can't think of a way for this parameter to disappear. T_SIZE cannot be defined generically to depend on T (not even with #[feature(const_generic_exprs)]), or else CovUnsafeCell becomes invariant over T.

danielhenrymantilla commented 1 year ago

@douglas-raillard-arm a potential trick to make it not need a SIZE, which I suspect to be sound based on SB[^1] (but I don't know about TB[^2]), would be:

#[repr(C)]
union CovariantCell<T> {
    value: ManuallyDrop<T>,
    _mutable: ManuallyDrop<Cell<u8>>,
}

Playground, which currently passes miri
The trick being, IIUC, that the way {Unsafe,}Cell<T> interacts with mutability is by preventing a &ThingContainingIt<T> from assuming an immutable pointee; and this property is tracked at the pointer/reference level, rather than w.r.t. each pointee byte?

If I were right, this, thus, would guarantee that &CovariantCell<T> not be an immutable reference.

[^1]: Stacked Borrows (paper, main blog post, repo) [^2]: Treed Borrows (blog post)

danielhenrymantilla commented 1 year ago

Another alternative, for the specific case of F within Lazy<T, F>, could be to hold it in a ManuallyDrop<F> instead of a Cell<Option<F>>. The sketch of this would be:

on init: use <*const _>::read(&*self.f) to get the owned f, and initialize cell accordingly.
on drop: check if the f has been read or not. Either through an extra dedicated bool flag, or through .cell. If not, ManuallyDrop::drop the f.

Worst case scenario we'll have to pay for a separate Cell<bool> flag, thereby hindering the discriminant elision that Option<T> takes advantage of. Best case scenario, .cell itself has enough state to implement this without it.

Proof of concept

douglas-raillard-arm commented 10 months ago

I ended up circling back to the same issue and I realized that a covariant LazyCell would indeed be nice, but we might want to have an UnsafeCovariantOnceCell as well. Embedding a LazyCell<T, F> in a struct with a cached "property" is a non-starter as F would either be:

a generic parameter on the struct. It would become impossible to instantiate it since F is an internal private function.
Box<dyn Fn() -> T> which is not great for my use case because of the box overhead. The only reason I'm messing with LazyCell<> is to optimize in the first place.

Maybe it would be possible to design a OnceCell<'a, T> that remembers the lifetime 'a that T had when creating the cell. Then OnceCell could be invariant in 'a and covariant in T, and force the Fn passed to get_or_try_init() to return a T + 'a. This way we can't accidentally store a value with a lifetime that is too short inside the cell, which is AFAIK the only reason UnsafeCell and &mut are invariant. I don't know if that's feasible or if that bit of invariance would still get in the way of consuming the cell though.

matklad / once_cell

Feature request: Make `Lazy<T, F>` covariant in `F` #167