rust-lang / libs-team

The home of the library team
Apache License 2.0
117 stars 18 forks source link

ACP: `ForwardInit<'a, T>` to complement `MaybeUninit<T>` #438

Open clarfonthey opened 2 weeks ago

clarfonthey commented 2 weeks ago

Proposal

Problem statement

Right now, there's no good way to generalise "write-only" memory with MaybeUninit due to one caveat: &mut MaybeUninit<T> gives you the ability to de-initialize memory, which makes it unsound to create a &mut MaybeUninit<T> from a &mut T.

Motivating examples or use cases

BorrowedBuf is a type in the standard library which would benefit from this pattern internally, although in general, the use case is wherever you wish to write to a value but not read from it.

For cases where you're only writing, it's useful to be able to allow both uninitialized memory and already-initialized memory, although right now, there's no way to reuse the same function to perform both these operations without using unsafe code.

You always run into one of the following issues:

(note: all of the above also apply to already-init slices (&mut [T]) and maybe-uninit slices (&mut [MaybeUninit<T>]) but for brevity, the above notation is used)

Solution sketch

A new type, ForwardInit<'a, T>, is added as a conceptual wrapper over &'a mut MaybeUninit<T>. It adds the safety invariant that even if the memory starts uninitialized, it cannot be de-initialized; the initialization is in the forward direction only. Similar to Pin, we ensure the lifetime is "trapped" inside the type to disallow directly mutating the memory inside.

I have some working code up on another repo but I'll provide a basic overview of what I think is most relevant for now before additional API nitpicking.

First, MaybeUninit<T> currently does not allow unsized types. However, since we're wrapping &mut MaybeUnint<T> anyway, we can get away with just storing a pointer inside the actual struct:

#[repr(transparent)]
pub struct ForwardInit<'a, T: ?Sized> {
    inner: NonNull<T>,
    marker: PhantomData<&'a mut T>,
}

This will allow ForwardInit<'a, [T]> and similar types, which are arguably the ones people are most commonly going to be using.

Its constructors will be from other mutable references:

impl<'a, T: ?Sized> ForwardInit<'a, T> {
    pub fn from_mut(value: &mut T) -> ForwardInit<'a, T>;
}
impl<'a, T> ForwardInit<'a, T> {
    pub fn from_maybe_uninit_mut(maybe: &'a mut MaybeUninit<T>) -> ForwardInit<'a, T>;
    pub fn from_maybe_uninit_slice_mut(slice: &'a mut [MaybeUninit<T>]) -> ForwardInit<'a, [T]>;
}

Since we have to ensure that any references are trapped inside the type, we can't offer an Index implementation for slices, but we can still allow indexing them with dedicated methods:

impl<'a, T> ForwardInit<'a, [T]> {
    pub fn index<I: SliceIndex<[T]>>(
        &mut self,
        index: I,
    ) -> ForwardInit<'_, <I as SliceIndex<[T]>>::Output>;
}

(Note: SliceIndex will have to be modified to accomodate this type, but that should be okay. In my sample code, I had to rely on Clone and SliceIndex<[MaybeUninit<T>]> to ensure soundness.)

Additionally, we can offer an iterator for slices as well:

// totally valid rust syntax
pub struct Iter<'a, T>;
impl<'a, T> Iterator<Item = ForwardInit<'a, T>> for Iter<'a, T>;

impl<'a, T> ForwardInit<'a, [T]> {
    pub fn iter(&mut self) -> Iter<'a, T>;
}

Initially, to allow reborrowing, I decided to make the write and assume_init methods take &mut self instead of self. This messed with APIs however, and I decided to add a by_ref method similar to iterators to reborrow:

impl<'a, T: ?Sized> ForwardInit<'a, T> {
    pub fn by_ref(&mut self) -> ForwardInit<'_, T>;
}
impl<'a, T> ForwardInit<'a, T> {
    pub fn write(self, value: T) -> &'a mut T;
}
impl<'a, T: ?Sized> ForwardInit<'a, T> {
    pub unsafe fn assume_init(self) -> &'a mut T;
}

And here are some sample slice methods:

impl<'a, T: Clone> ForwardInit<'a, [T]> {
    pub fn write_cloned_slice(self, src: &[T]) -> &'a mut [T];
}
impl<'a, T: Copy> ForwardInit<'a, [T]> {
    pub fn write_copied_slice(self, src: &[T]) -> &'a mut [T];
}

Like the MaybeUninit API, this would likely be rolled out in stages depending on what feels most useful to the community, and what APIs are desired. The biggest proposal here is the idea of a ForwardInit<'a, T> type itself, and we can change how we fill in the blanks later.

Alternatives

As of now, the BorrowedBuf API is probably the closest to something that might offer something similar to this. However, this does show that there is some kind of desire for this type of API, even in the standard library.

Additionally, one very reasonable alternative is… I implemented this as an external crate, so, keep it that way. This has some of the downsides of being unusable in the standard library and potentially fragmenting the ecosystem with different implementations, but the upsides of not requiring any stdlib maintenance.

Links and related work

Sample implementation: https://codeberg.org/clarfonthey/forward-init/src/branch/main/src/lib.rs

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

RustyYato commented 2 weeks ago

Before I criticize this, I want to say that I think this is a good step forward, and that I have thought about this api (under a different name) some time ago. So maybe I can lend some insight.

This isn't as useful as it seems in public apis. For example:

pub fn foo(init: ForwardInit<T>) { ... }

The caller has no guarantees that init is actually initialized, so for example, you can't use this in a safe trait and assume anything about the contents of init after foo is called.

Of course this doesn't affect private apis, but then it's only marginally better than just using &mut MabyUninit<T>, only one (easy to verify) additional invariant to take care of.

So I think for ForwardInit to be useful in a safe context a newtyped enum like:

// actual newtype wrapper omitted here
enum ForwardInit<'a, T> {
    Mu(&'a mut MaubeUninit<T>),
    Ref(&'a mut T)
}

Where it automatically switches from Mu to Ref on write, would be more useful. This would allow the caller to ensure that the callee did initialize the pointee, and would allow entirely safe and generic usages of MaybeUninit, which is a major step up from what we have today.

This could be combined with an optimization to take advantage of pointer alignment niches so it's still just one ptr large, but that's not necessary.

clarfonthey commented 1 week ago

So, maybe you were reading the version that just had versions of &mut self -> &mut T (which I changed to self -> &mut T) but least the intent here is that ForwardInit<'_, T> represents &mut MaybeUninit<T>, and when the API is done with it, you get &mut T, which verifies that the write is completed.

IMHO, while an enum is okay, it doesn't really offer the same ergonomics, and additionally, it won't work for slices either.

RustyYato commented 1 week ago

So, maybe you were reading the version that just had versions of &mut self -> &mut T (which I changed to self -> &mut T) but least the intent here is that ForwardInit<'_, T> represents &mut MaybeUninit, and when the API is done with it, you get &mut T, which verifies that the write is completed.

Yes, I think I did. However, a signature like fn(ForwardInit<'a, T>) -> &'a mut T doesn't guarantee that the callee wrote to init, since they could have done Box::leak(Box::new(make_value())), without touching init at all.

IMHO, while an enum is okay, [...], it won't work for slices either.

I agree that the ergonomics may be more difficult to achieve, but I think it should be possible to implement all the methods you did.

Can you elaborate on why an enum wouldn't work for slices?

clarfonthey commented 1 week ago

Can you elaborate on why an enum wouldn't work for slices?

MaybeUninit<T> requires T: Sized, which is why there's the asymmetry of using [MaybeUninit<T>] instead of MaybeUninit<[T]>.

Amanieu commented 1 week ago

The main use case for this API is for read-like APIs which return a variable length slice into a user-provided buffer. This would allow such methods to accept both &mut [T] and &mut [MaybeUninit<T>] for the buffer. I can't think of any other use cases for it.

Could you give more examples of how this API could be used in practice? How does it interact with BorrowedBuf (which is still unstable so we can still make changes to it)?

clarfonthey commented 6 days ago

To clarify, I don't think this would actually affect the API for BorrowedBuf as-is, since it actually provides more than you really need for this use case. The main distinction would be that BorrowedBuf could use it internally.

However, the expectation would be that simpler APIs similar to read could look like the following:

fn read_buf<'b>(&mut self, buf: ForwardInit<'b, [u8]>) -> &'b [u8];

(note that the choice to return &[u8] or &mut [u8] is mostly arbitrary; ForwardInit returns &mut for flexibility, but you can decide to return an immutable version if you prefer)

This API is a bit different from BorrowedBuf or BorrowedCursor since that API explicitly stores all the information about what is initialized in the struct itself. However, if you just want a simple API where you read some data to either copy somewhere else or discard after you've read it, you really only need to return the slice of initialized data, since that gives you all the information you need.

Maybe a similar analogy to an API that might look similar to this is char::encode_utf8, which currently takes in &mut [u8] and returns &mut str. This could instead take in ForwardInit<'_, [u8]> and the API would effectively work the same: you're not expecting to actually save the buffer you receive after the fact, and are just using it as a place to store temporary data to be copied elsewhere before being discarded.

And also to further clarify: the main benefit of an API like this is that you can write one function that works with ForwardInit and you can effectively pass both &mut [T] and &mut [MaybeUninit<T>] to the same function, assuming you first convert them to a ForwardInit. The main benefit is that you're only writing one function to handle cases instead of several that are parameterised depending on whether the buffer was initialized or not beforehand.