Tracking Issue for maybe_uninit_as_bytes

Feature gate: #![feature(maybe_uninit_as_bytes)

This is a tracking issue for APIs which allow access to individual bytes of a MaybeUninit<T>. This is always safe even for padding bytes since the bytes are themselves represented as MaybeUninit<u8>.

Public API

// core::mem

impl<T> MaybeUninit<T> {
    pub fn as_bytes(&self) -> &[MaybeUninit<u8>];
    pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>];
    pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>];
    pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>];
}

Steps / History

[x] Implementation: #89747
[ ] Final comment period (FCP)
[ ] Stabilization PR

Unresolved Questions

None yet.

Would it maybe make sense for these to be arrays of size_of::<T>() elements instead? (I don't recall if this is allowed in stable Rust yet)

The only thing I can think of that can possibly be an issue is making a type with padding into a &mut [MaybeUninit<u8>], and then calling .write() on a padding byte.

use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Default)]
struct Foo {
    a: u16,
    b: u8
}

fn main() {
    let mut f = Foo::default();
    let fmur = &mut f as *mut Foo as *mut MaybeUninit<u8>;
    unsafe {
        dbg!(*std::slice::from_raw_parts_mut(fmur, 4)[3].write(42));
    }
}

This is presumably sound because padding bytes are only made uninit when you write a field of type Foo, not just "it came from a Foo and is therefore always uninit".

But also I'd need to have a think about how that interacts with interior mutability. Presumably sound, in any case.

In principle we could have such methods for any type U, not just u8 (after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?

If you allow casting to a type U, then you need to solve the problem of

What if U is a ZST?
What if there's not a multiple of size_of(U) bytes in the MaybeUninit?
What if the align of U is more than T, and the slice is misaligned for a U?

These could be solved by panicking or returning an error, but none of these are problems when U is size 1 align 1.

Should as_bytes_mut be instead named as_mut_bytes as recommended by the API guidelines? Same with slice_as_bytes_mut/slice_as_mut_bytes.

If the mut qualifier in the name of a conversion method constitutes part of the return type, it should appear as it would appear in the type. For example Vec::as_mut_slice returns a mut slice; it does what it says. This name is preferred over as_slice_mut.

I recognize the stdlib is not consistent on this. See str::as_bytes_mut, which also outputs a &mut [u8] instead of a &mut [MaybeUninit<u8>] while being called bytes. If the _mut_X order really is preferred, we should stick with it for future APIs.

The current implementation of as_bytes (and slice_as_bytes) is unsound in the sense that it allows undefined behaviour in unsafe code that obeys all the stated safety guidelines:

#![feature(maybe_uninit_as_bytes)]
use std::mem::MaybeUninit;
use std::cell::Cell;

fn unsoundness_demo(a: &u8, b: &Cell<u8>) {
    println!("{}", a);
    b.set(2);
    println!("{}", a);
}

fn main() {
    unsafe {
        let mut u = MaybeUninit::uninit();
        u.write(Cell::new(1));
        unsoundness_demo(u.as_bytes()[0].assume_init_ref(), u.assume_init_ref());
    }
}

Playground

unsoundness_demo's first argument a is a shared reference without interior mutability, yet the thing it pointed to was modified, which is undefined behaviour. However, the only unsafe method called here is assume_init_ref, and the code upholds the stated safety guideline for that method (that the content is fully initialised – every byte of both the MaybeUninit<Cell<u8>> and the MaybeUninit<u8> is set to a known value that is legal for the data type).

This problem could be fixed either via adding a new safety requirement on assume_init_ref and friends that the MaybeUninit<T> has not been converted from a MaybeUninit<U> where types T and U differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes if either the source or destination type has interior mutability.

@RalfJung wrote:

In principle we could have such methods for any type U, not just u8 (after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?

There is actually an important use case for being able to do the equivalent of as_bytes_mut between arbitrary MaybeUninit types. A common pattern in C is to write a function in which the caller of a function allocates storage for the result of the function call, and the function writes its result to that storage (e.g. this is used by the C standard library function snprintf). In Rust, it is possible to do this in safe code by having the function return a reference back to the same storage (i.e. the function takes a &'a mut MaybeUninit<T> as its argument, initialises it using write and returns the &mut T that was returned by write as its own return value).

Although this works well if you know the return type already, there are cases where you'd want to do this sort of thing but don't know the type in advance. For example, I've been recently looking into "local allocators" in which allocations are made one at a time on-demand, but where deallocations are not allowed while the local allocator is alive (rather, dropping the local allocator drops all the allocations at once). This is a very efficient pattern which is commonly used in game development. In general the applications of this sort of allocator are fairly similar to the applications of Rc (you can even use it to create structures that reference cycles by using it together with Cell, just like you can with Rc), except that the deallocations happen all at once rather than when no references remain – but you get a number of advantages in return (no risk of accidentally leaking memory due to a reference cycle, substantially better performance due to not needing a separate allocation for every Rc and not needing to store or update reference counts, and the references are Copy whereas an Rc can't be embedded in a type that is Copy), and one disadvantage (you can't soundly run the destructors of the types you allocated) which doesn't matter in most of the cases where you'd want to use this. Via using MaybeUninit::write(), it is possible to implement this entirely in safe code.

If there were a safe transmute_mut for MaybeUninit that allowed transmutation between &mut MaybeUninit<T> and &mut MaybeUninit<U> whenever the source reference was sufficiently large, and sufficiently aligned, to be read as a reference to the destination type, then this sort of allocator would be much more powerful, as it could allocate data of more than one type. (Presently, it needs to initially allocate its data storage as a [MaybeUninit<T>], meaning that it can only allocate Ts. With a safe transmute_mut, it could alternatively allocate its data storage as a [MaybeUninit<u8>] and transmute slices of that into the appropriate type in order to do allocations.) Although this is currently possible to work around by using a different data storage for each possible T, this precludes the situation where data of a type with a lifetime is allocated in a loop (with a different lifetime each time round the loop, meaning that T is a slightly different type each time). As far as I can tell, transmuting an &'a mut MaybeUninit<T> into an &'a mut MaybeUninit<U> is safe whenever the reference has sufficient size and alignment, and would make this sort of allocator more powerful because it would no longer be resticted to allocating finitely many types. (Doing this soundly on shared references also requires the interior mutabilities to match, and I also can't think of a use case for it in safe Rust – it might have uses in unsafe Rust.) So a good reason to have an arbitrary transmute_mut for MaybeUninit would be that it allows an important memory allocation primitive to be written entirely in safe Rust.

This problem could be fixed either via adding a new safety requirement on assume_init_ref and friends that the MaybeUninit has not been converted from a MaybeUninit where types T and U differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes if either the source or destination type has interior mutability.

IMO the fix is to say that for the duration that the reference returned by as_bytes is life, no (interior) mutation of the underlying memory may happen.

What about pointer provenance? Converting &mut MaybeUninit<*const T> into &mut [MaybeUninit<u8>] looks a lot like a pointer-usize-pointer roundtrip, forbidden under strict provenance rules.

Although that conversion could be used to do a pointer-usize-pointer roundtrip, it doesn't actually violate any rules until you try to treat the resulting pointer as initialized.

A much simpler example of the same sort of thing is to convert an &mut MaybeUninit<bool> into &mut [MaybeUninit<u8>; 1], and write 2 to the memory through the new reference. You end up with an &mut MaybeUninit<bool> pointing to memory holding the value 2, which is not valid for a bool. But that doesn't matter, because there's no requirement that uninitialized memory has a valid value for the type.

In this case, the soundness requirement would be the usual "you musn't assume a MaybeUninit has been initialised unless it actually has a valid value for the data type". Under strict provenance, pointer values constructed without going through provenance APIs aren't valid – they can be held in a MaybeUnint<*const T> because you can hold any random bit pattern in those, but can't be converted into a *const T without writing a valid value to the memory first.

rust-lang / rust