Open Amanieu opened 2 years ago
Would it maybe make sense for these to be arrays of size_of::<T>()
elements instead? (I don't recall if this is allowed in stable Rust yet)
The only thing I can think of that can possibly be an issue is making a type with padding into a &mut [MaybeUninit<u8>]
, and then calling .write()
on a padding byte.
Like
use std::mem::MaybeUninit;
#[repr(C)]
#[derive(Default)]
struct Foo {
a: u16,
b: u8
}
fn main() {
let mut f = Foo::default();
let fmur = &mut f as *mut Foo as *mut MaybeUninit<u8>;
unsafe {
dbg!(*std::slice::from_raw_parts_mut(fmur, 4)[3].write(42));
}
}
This is presumably sound because padding bytes are only made uninit when you write a field of type Foo
, not just "it came from a Foo
and is therefore always uninit".
But also I'd need to have a think about how that interacts with interior mutability. Presumably sound, in any case.
In principle we could have such methods for any type U
, not just u8
(after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?
If you allow casting to a type U, then you need to solve the problem of
These could be solved by panicking or returning an error, but none of these are problems when U is size 1 align 1.
Should as_bytes_mut
be instead named as_mut_bytes
as recommended by the API guidelines? Same with slice_as_bytes_mut
/slice_as_mut_bytes
.
If the
mut
qualifier in the name of a conversion method constitutes part of the return type, it should appear as it would appear in the type. For exampleVec::as_mut_slice
returns a mut slice; it does what it says. This name is preferred overas_slice_mut
.
I recognize the stdlib is not consistent on this. See str::as_bytes_mut
, which also outputs a &mut [u8]
instead of a &mut [MaybeUninit<u8>]
while being called bytes
. If the _mut_X
order really is preferred, we should stick with it for future APIs.
The current implementation of as_bytes
(and slice_as_bytes
) is unsound in the sense that it allows undefined behaviour in unsafe code that obeys all the stated safety guidelines:
#![feature(maybe_uninit_as_bytes)]
use std::mem::MaybeUninit;
use std::cell::Cell;
fn unsoundness_demo(a: &u8, b: &Cell<u8>) {
println!("{}", a);
b.set(2);
println!("{}", a);
}
fn main() {
unsafe {
let mut u = MaybeUninit::uninit();
u.write(Cell::new(1));
unsoundness_demo(u.as_bytes()[0].assume_init_ref(), u.assume_init_ref());
}
}
unsoundness_demo
's first argument a
is a shared reference without interior mutability, yet the thing it pointed to was modified, which is undefined behaviour. However, the only unsafe
method called here is assume_init_ref
, and the code upholds the stated safety guideline for that method (that the content is fully initialised – every byte of both the MaybeUninit<Cell<u8>>
and the MaybeUninit<u8>
is set to a known value that is legal for the data type).
This problem could be fixed either via adding a new safety requirement on assume_init_ref
and friends that the MaybeUninit<T>
has not been converted from a MaybeUninit<U>
where types T
and U
differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes
if either the source or destination type has interior mutability.
@RalfJung wrote:
In principle we could have such methods for any type
U
, not justu8
(after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?
There is actually an important use case for being able to do the equivalent of as_bytes_mut
between arbitrary MaybeUninit
types. A common pattern in C is to write a function in which the caller of a function allocates storage for the result of the function call, and the function writes its result to that storage (e.g. this is used by the C standard library function snprintf
). In Rust, it is possible to do this in safe code by having the function return a reference back to the same storage (i.e. the function takes a &'a mut MaybeUninit<T>
as its argument, initialises it using write
and returns the &mut T
that was returned by write
as its own return value).
Although this works well if you know the return type already, there are cases where you'd want to do this sort of thing but don't know the type in advance. For example, I've been recently looking into "local allocators" in which allocations are made one at a time on-demand, but where deallocations are not allowed while the local allocator is alive (rather, dropping the local allocator drops all the allocations at once). This is a very efficient pattern which is commonly used in game development. In general the applications of this sort of allocator are fairly similar to the applications of Rc
(you can even use it to create structures that reference cycles by using it together with Cell
, just like you can with Rc
), except that the deallocations happen all at once rather than when no references remain – but you get a number of advantages in return (no risk of accidentally leaking memory due to a reference cycle, substantially better performance due to not needing a separate allocation for every Rc
and not needing to store or update reference counts, and the references are Copy
whereas an Rc
can't be embedded in a type that is Copy
), and one disadvantage (you can't soundly run the destructors of the types you allocated) which doesn't matter in most of the cases where you'd want to use this. Via using MaybeUninit::write()
, it is possible to implement this entirely in safe code.
If there were a safe transmute_mut
for MaybeUninit
that allowed transmutation between &mut MaybeUninit<T>
and &mut MaybeUninit<U>
whenever the source reference was sufficiently large, and sufficiently aligned, to be read as a reference to the destination type, then this sort of allocator would be much more powerful, as it could allocate data of more than one type. (Presently, it needs to initially allocate its data storage as a [MaybeUninit<T>]
, meaning that it can only allocate T
s. With a safe transmute_mut
, it could alternatively allocate its data storage as a [MaybeUninit<u8>]
and transmute slices of that into the appropriate type in order to do allocations.) Although this is currently possible to work around by using a different data storage for each possible T
, this precludes the situation where data of a type with a lifetime is allocated in a loop (with a different lifetime each time round the loop, meaning that T
is a slightly different type each time). As far as I can tell, transmuting an &'a mut MaybeUninit<T>
into an &'a mut MaybeUninit<U>
is safe whenever the reference has sufficient size and alignment, and would make this sort of allocator more powerful because it would no longer be resticted to allocating finitely many types. (Doing this soundly on shared references also requires the interior mutabilities to match, and I also can't think of a use case for it in safe Rust – it might have uses in unsafe Rust.) So a good reason to have an arbitrary transmute_mut
for MaybeUninit
would be that it allows an important memory allocation primitive to be written entirely in safe Rust.
This problem could be fixed either via adding a new safety requirement on assume_init_ref and friends that the MaybeUninit
has not been converted from a MaybeUninit where types T and U differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes if either the source or destination type has interior mutability.
IMO the fix is to say that for the duration that the reference returned by as_bytes
is life, no (interior) mutation of the underlying memory may happen.
What about pointer provenance? Converting &mut MaybeUninit<*const T>
into &mut [MaybeUninit<u8>]
looks a lot like a pointer-usize-pointer roundtrip, forbidden under strict provenance rules.
Although that conversion could be used to do a pointer-usize-pointer roundtrip, it doesn't actually violate any rules until you try to treat the resulting pointer as initialized.
A much simpler example of the same sort of thing is to convert an &mut MaybeUninit<bool>
into &mut [MaybeUninit<u8>; 1]
, and write 2 to the memory through the new reference. You end up with an &mut MaybeUninit<bool>
pointing to memory holding the value 2, which is not valid for a bool
. But that doesn't matter, because there's no requirement that uninitialized memory has a valid value for the type.
In this case, the soundness requirement would be the usual "you musn't assume a MaybeUninit
has been initialised unless it actually has a valid value for the data type". Under strict provenance, pointer values constructed without going through provenance APIs aren't valid – they can be held in a MaybeUnint<*const T>
because you can hold any random bit pattern in those, but can't be converted into a *const T
without writing a valid value to the memory first.
Feature gate:
#![feature(maybe_uninit_as_bytes)
This is a tracking issue for APIs which allow access to individual bytes of a
MaybeUninit<T>
. This is always safe even for padding bytes since the bytes are themselves represented asMaybeUninit<u8>
.Public API
Steps / History
Unresolved Questions