SUPERCILEX commented 2 years ago

Goals

Many of the new MaybeUninit APIs deal with slices or arrays. The current state of affairs is unfortunate because it leads to API duplication. An API may need to be duplicated 3 times: once on MaybeUninit and then another two times for slices and arrays. In an ideal world, the type system could express some sort of container API where if something contains a MaybeUninit then certain methods are available. Since we don't live in that world, the primary goal is to make as many MaybeUninit methods available as possible on slices and arrays without duplication.

Other goals include guiding users towards the right APIs and making unsafely more visible.

Problematic APIs

`MaybeUninit::{uninit_array,array_assume_init}`

These APIs let you create MaybeUninit arrays, but do not let you manipulate them. Any APIs on MaybeUninit would need to be duplicated on arrays or accessed via slices (requiring borrowing).

Proposal

An API that converts between [MaybeUninit<T>; N] <-> MaybeUninit<[T; N]>. Since you can get a MaybeUninit with your array inside it, anything on MaybeUninit will work for arrays.

Currently it is named transpose but there may be clearer names.

Alternatives

Implement index traits on `MaybeUninit<[T; N]>`

My two primary concerns are that this prevents the natural use of array initialization syntax and would require bounds checking when that may not be desirable.

Furthermore, I don't think this works with MaybeUninits of non-Copy types unless you call the index method directly?

`slice_assume_init` variants

Assuming a transpose-like API is not possible for slices, these methods need to be available.

Proposal

Currently they aren't inherent methods but should be.

`as_bytes` variants

Based on the discussion below we should keep these methods to circumvent unsafety around invalid pointers.

Proposal

Currently they aren't inherent methods but should be. Add documentation warning users to be extra careful about meeting the invariants of the type they are mucking around with.

`slice_as_ptr` variants

These methods are questionable because they combine the removal of bounds checking with raw pointer conversion. You may want one but not the other, and in either case, you should be required to use separate unsafes.

Proposal

Get rid of them and instead provide safe MaybeUninit pointer conversion to its underlying type (T). That is, *{const,mut} MaybeUninit<T> -> *{const,mut} T.

`write_slice` variants

Based on discussion in the tracking issue, the name of these methods has not conveyed the fact that write_cloned does not drop old items. I believe we should just use consistent names with slices and document this behavior in bold or something. And at worst, memory leaks are not unsafe in rust so eh.

Proposal

Rename to {copy,clone}_from_slice
Move to inherent methods

`uninit`

This method cannot be used directly in array initializers if T is not Copy. However, using a const gets around this.

Proposal

Add an UNINIT const that does the same thing as uninit.

Drawbacks

API duplication. This also isn't necessary with transpose, though it would look prettier.

API diff

impl<T> MaybeUninit<T> {
    // Removed method. Can be replaced with `array.transpose().assume_init()`
-    pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
-    pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
-    pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
-    pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
-    pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
-    pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
-    pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
-    pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
-    pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
-    pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
+    pub unsafe fn assume_init_ref(&self) -> &[T]
+    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
+    pub fn copy_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
+    pub fn clone_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Clone
+    pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
+    pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]
}

// Completely new methods

impl<T, const N: usize> MaybeUninit<[T; N]> {
+    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T, const N: usize> [MaybeUninit<T>; N] {
+    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

impl<T> *const MaybeUninit<T> {
+    pub const fn inner(self) -> *const T
}
impl<T> *mut MaybeUninit<T> {
+    pub const fn inner(self) -> *mut T
}

the8472 commented 2 years ago

Please list all the API changes you intend to make. And the motivation doesn't explain why this is better, why we'd want this.

SUPERCILEX commented 2 years ago

Done. The motivation is from https://github.com/rust-lang/libs-team/issues/110.

the8472 commented 2 years ago

I'm seeing additional changes on rustc PRs that all are related to MaybeUninit. I think a larger overview of what you're aiming for overall rather than piecemeal changes would be helpful.

SUPERCILEX commented 2 years ago

I'm not sure I agree since these changes can go through independently, but the big theme is "generalize":

Transpose is a superset of anything you could do with MaybeUninit::{uninit_array,array_assume_init}.
The slice_as{,_mut}_ptr methods can be replaced with a combination of array indexing (or get_unchecked if you don't want bounds checking which should be an explicit choice) and then a safe cast to the inner type.
- I'm going to deal with this last because I'm not sure I like my solution, but I also really hate the current methods because they force you to combine elided bounds checking with reference to pointer conversion.

And then unrelated:

Move a bunch of methods that couldn't be on slices to be on slices since that's possible now.

ChrisDenton commented 2 years ago

Quick summary, as far as I understand it. I'm including transpose here even though it's already merged because its use is what makes this work. Sorry if I've missed anything.

EDIT: Updated after feedback from SUPERCILEX EDIT EDIT: Removed as_bytes methods, see below EDIT EDIT EDIT: This is now outdated. See SUPERCILEX opening comment

impl<T, const N: usize> MaybeUninit<[T; N]> {
    /// MaybeUninit<[T; N]> into a [MaybeUninit<T>; N]
    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T> MaybeUninit<T> {
    // Remove `as_bytes` methods. See below for discussion.
    //pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
    //pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

// New array method
impl<T, const N: usize> [MaybeUninit<T>; N] {
    /// Transposes a [MaybeUninit<T>; N] into a MaybeUninit<[T; N]>.
    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
    pub fn write_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
    pub fn write_slice_cloned(&mut self, src: &[T]) -> &mut [T] where T: Clone
    pub unsafe fn assume_init_ref(&self) -> &[T]
    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
}

// Convert a MaybeUninit pointer to a pointer to its underlying type.
// This is not a `From` implementation because that could be surprising in unsafe code.
impl<T> *const MaybeUninit<T> {
    pub const fn inner(self) -> *const T {
        self as *const T
    }
}
impl<T> *mut MaybeUninit<T> {
    pub const fn inner(self) -> *mut T {
        self as *mut T
    }
}

SUPERCILEX commented 2 years ago

as{,_mut}_bytes should be under [MaybeUninit<T>] instead of [MaybeUninit<T>; N]. Only transpose will ever be needed under [MaybeUninit<T>; N] because you have access to all of MaybeUninit after the transpose. Note that the only reason any of the [MaybeUninit<T>] methods exist is because you can't transpose references AFAIK (&[MaybeUninit<u8>] to MaybeUninit<&[u8]> makes no sense).

impl From<*const MaybeUninit<T>> for *const T;
impl From<*mut MaybeUninit<T>> for *mut T;

It was decided that these guys should not be From impls since that's too implicit. I'll tackle slice_as{,_mut}_ptr (and the safe pointer casts) last since that will need bikeshedding + more thought.

Just a general note, I'd hesitate to say the "new" methods were "removed." While true from the API diff standpoint, they were just moved. Anyway, thank you!

ChrisDenton commented 2 years ago

Thanks, I've edited my comment to reflect that.

SUPERCILEX commented 2 years ago

Nice! Slight tweak:

+   // Removed methods. These can be replaced with `array.as_{,_mut}ptr().inner()`.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]

SUPERCILEX commented 2 years ago

Oh wait no sorry misread stuff. I think you're fixing it as we speak. Writing out reply rn.

ChrisDenton commented 2 years ago

Yeah, sorry my copy/pasting went askew and I got in a muddle fixing it. Should be sorted now.

SUPERCILEX commented 2 years ago

Still not quite cuz I goofed, should be:

impl<T> MaybeUninit<T> {
    // Renamed from `as_bytes_mut`
    pub fn as_mut_bytes(&mut self) -> &mut [MaybeUninit<u8>];

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
    //pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

ChrisDenton commented 2 years ago

Ok, I think I got there in the end! Thanks.

SUPERCILEX commented 2 years ago

Lol, thanks! Looks good.

Maybe also worth pointing out that there's discussion around the write_slice method naming: https://github.com/rust-lang/rust/issues/79995.

Seeing all of this listed out is really helpful. It's making me wonder about the as_bytes methods. They feel like a weird restricted transmute. as_bytes could become as_r and output any type R to generalize, but then we've basically re-invented transmutes. Poking around, I couldn't find usages of the non-slice as_bytes methods. The slice ones seem to be used to read data from a byte stream into a repr(C) type. I'm starting to lean towards getting rid of those methods and requiring an explict transmute. The tracking issue talks about padding bytes being safe, but that's not the big pitfall, the implicit transmute is:

#![feature(maybe_uninit_as_bytes)]
use core::mem::MaybeUninit;

#[derive(Debug)]
enum Variant { A(u64), B(String) }

fn main() {
    let mut u = MaybeUninit::new(Variant::A(42));

    // How are variants represented again? Who knows, but 69 is better than 42
    u.as_bytes_mut()[0].write(69);

    // SAFETY: look how safe this is! It's intialized, so nothing could go wrong right?
    let u = unsafe { u.assume_init() };

    // Ahem
    println!("{u:?}");
}

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I desperately need to get some other stuff done, but I'll open a PR to nuke the as_bytes methods in a bit.

thomcc commented 2 years ago

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I think it just means we need to be clearer about the invariants that assume_init() must uphold (perhaps... I actually think we're clear about this already) -- it's already true that it must not just be initialized, it must uphold the validity invariants of a T as well.

More specifically, we've been pretty clear for a while now that MaybeUninit<T> can hold initialized values aside from T (it's not just T | undef, IOW).

scottmcm commented 2 years ago

Right, assume_init means "assume initialized as a T", not just "the bits were set" -- otherwise MaybeUninit::zeroed would make no sense. And yes, MaybeUninit holds sizeof(T) bytes that may or may not be initialized. https://github.com/rust-lang/rust/pull/97712 is interesting reading related to that.

I would encourage moving the as_bytes discussions to a different issue. They're more around https://github.com/rust-lang/project-safe-transmute kinds of scenarios, in my head, rather than the uninit/assume_init things that this one seems to mostly be talking about.

Naming feedback: I'm really not convinced by the name inner for those pointer methods. That might be ok for safe stuff, where it being unclear might be fine, but like Ralf was saying in https://github.com/rust-lang/rust/pull/101179/files#r996682951, around unsafe code -- as these necessarily are, because pointers -- it's worth being more specific about what they're doing. Maybe https://github.com/rust-lang/libs-team/issues/51#issuecomment-1190661872 can provide some inspiration? Pointers now have cast, cast_const, and cast_mut, so maybe cast_init or something?

Said otherwise, I think "rework all the MaybeUninit functions" is too big a scope for the issue to feasibly get consensus on. I suspect that there's something smaller and separable but still valuable. For example, I might phrase my initial impression of a bunch of these things as being about "rather than needing different-named creation and consumption methods in MaybeUninit for a bunch of different types, there should be a couple extra transformers or methods on those other types to be able to just use uninit and assume_init".

SUPERCILEX commented 2 years ago

@thomcc Sure, I agree with you that MaybeUninit can hold "initialized" values like MaybeUninit::<NonZeroUsize>::zeroed() that violate the invariants of the type and it's therefore UB to assume_init even though the memory is initialized. That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized. Please correct me if I'm wrong as then I don't object to the as_bytes methods. However, if you really can't uninitialize MaybeUninit once it's been initialized, then the as_bytes methods are the only ones that let you safely perform uninitialization, hence my strong objection. They weaken MaybeUninit from a diode allowing initialization to a bidirectional circuit that supports both initialization and uninitialization (very sneakily too).

I would encourage moving the as_bytes discussions to a different issue.

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Naming feedback: I'm really not convinced by the name inner

Fully agree. Do we want this issue to be a catch-all MaybeUninit nightly API overhaul? If so I'll close the other FCP and work on making this issue encompass all the relevant changes + discussion.

scottmcm commented 2 years ago

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Whatever libs-api says is best is what you should do 😄

It's definitely good to consolidate the array_assume_init/slice_assume_init_mut/slice_assume_init_ref/uninit_array changes into one. But whether that's the same overhaul conversation as as_bytes isn't obvious to me.

thomcc commented 2 years ago

That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized.

It can. Consider a case like:

pub fn make_uninitialized<T>(mu: &mut MaybeUninit<T>) {
    *mu = MaybeUninit::uninit();
}

SUPERCILEX commented 2 years ago

@scottmcm Sounds good! :) I closed https://github.com/rust-lang/libs-team/issues/123.

TODO(me):

Move the Chris signatures to the top
Talk about as_bytes
Talk about potential UNINIT constant
Explain overall API duplication reduction goals
Talk about Index{,Mut} traits

Should be able to get to this in a few days.

@thomcc Good point, that lessens my concerns. That said, I remain unconvinced:

I would consider that to be replacing the value not modifying it, but I agree that from the API contract perspective that this distinction is irrelevant.
as_bytes is still the only safe method that I know of (you'll probs prove me wrong again :laughing:) that lets you transmute a type and then mess around with it. I guess you could argue that it's similar to doing a chain of raw pointer casts, but I tend to think of raw pointers as the ultimate form of unsafety. If I see a raw pointer, I automatically assume the code is doing something insane. But if I see slices, I assume that at least the types are sound.
Related to my previous point, I feel like this turns assume_init into a double unsafe. There's the unsafe for making sure that the type is initialized and its invariants are met, and then as_bytes introduces the unsafe of making sure your individual byte manipulations of the type are sound. This feels different to me than "meet the invariants" because there's no way to individually manipulate part of a struct without using raw pointers: you have to use write which is an all-or-nothing "initialize me" function.
- For example, this makes it trivially easy to forget about endianness. My argument for requiring a transmute in addition to an assume_init is that it forces you to cleanly separate documenting the safety invariants of "I initialized this" vs. "I know my data will be in big-endian and I did the requisite byte manipulations to make the type valid." Basically I'm saying that if you ignore the uninit parts, we've given people an easy way to avoid thinking about what it means to mess around with the raw bytes of their type. If would be nice if after you see a write, you know that the type will be valid because it was created from safe rust. as_bytes feels like a hidden raw pointer to me.

RalfJung commented 2 years ago

That said, I remain unconvinced:

MaybeUninit<u8> is a pretty special type, it's our version of the C++ std::byte. IMO it makes a lot of sense that MaybeUninit would have special methods for it.

But anyway, that's off-topic for this PR. I am totally on-board with overhauling the unstable parts of the MaybeUninit API. I would probably not be much involved in that overhaul though -- I am not a libs API person, once the basic language primitives are available and in principle all things can be written correctly somehow I am really bad at figuring out how to best package them up for people and love to leave that work to others. But that sounds like an ACP/RFC kind of document, not a PR. Oh I didn't realize this is a t-libs design issue. I got pinged into 2 or 3 discussions on the same topic here and completely lost track...

RalfJung commented 2 years ago

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

SUPERCILEX commented 2 years ago

C++

I'm not sure we should be using C++ as the role model. 😜 But actually, maybe what I'm advocating for then is to make the as_bytes methods unsafe. I just realized a simple transmute doesn't cut it since you need to set up the slice metadata (and we don't want people to need to deal with that). Anyway, I'll think about this more and put some thoughts in the summary.

I feel like the name should indicate somehow that this is about arrays/slices.

Why? It's defined as an inherent method on arrays... the type system should yell at you if you goof.

And yeah, happy to bikeshed on the name: how about bellybutton terminology? Innie and outie? Ok but more seriously maybe eversion?

scottmcm commented 2 years ago

Hmm, I suppose could add an inherent assume_init method on [MaybeUninit<T>; N] now that it's much easier to add inherent methods to built-in types -- that's just as easy to add as transpose was.

How would people feel about that one?

(It seems at least plausible to me, but I'm having trouble deciding whether it's weird somehow.)

RalfJung commented 2 years ago

Why? It's defined as an inherent method on arrays... the type system should yell at you if you goof.

As before, I don't think the type system is enough in unsafe code. Ideally one can tell what this does just from reading the code, and transpose is just not very specific.

@scottmcm you mean a [MaybeUninit<T>; N] -> [T; N]? Does that cover all usecases?

thomcc commented 2 years ago

But actually, maybe what I'm advocating for then is to make the as_bytes methods unsafe

What would the invariant that the unsafe code must uphold be?

SUPERCILEX commented 2 years ago

Alrighty folks, I summarized all the API change ideas. Thanks for all the feedback and help so far. :) I'll add the method outlines once we settle on something.

RalfJung commented 2 years ago

I actually quite like as_bytes since I think it helps clarify what the invariants are. The fact that this function is safe means quite a lot for other ways that MaybeUninit might be used. Removing the function doesn't make any other use of MaybeUninit sound, it just removes a very explicit bit of invariant documentation.

SUPERCILEX commented 2 years ago

it just removes a very explicit bit of invariant documentation

Yeah, I think I'd be happy if writing the bytes required unsafe, similar to how creating a raw pointer is fine but using it is not. Having that explicit invariant documentation when you're writing the bytes would be very helpful IMO. No idea how to accomplish that though.

RalfJung commented 2 years ago

What I meant is that the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

SUPERCILEX commented 2 years ago

the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

Sorry, I haven't quite been able to figure out what you mean by this. Are you saying that because writing the bytes is safe, you know you won't cause UB while writing the bytes? As opposed to pointers where writing can cause UB?

If so, I can potentially see how that's a good thing though I'm still bothered that it's pushing the unsafety of manipulating a type's raw bytes downstream. I guess what I've been saying is that I don't see much value in being able to write the bytes safely. ~Crashing while creating the type vs while using the type doesn't feel different to me.~ Actually, thinking about this some more I think what you're trying to say is that as_bytes eliminates the possibility of UBing while creating the type? Whereas pointers get the privilege of UBing after creating a garbage type AND while creating the type?

If I'm understanding your argument correctly, then I'm mostly convinced of the value of the as_bytes methods. Perhaps all the mut variants of as_bytes just need extra loud yelling in the docs that reminds people to ensure the bytes they're putting in the type match its invariants. (Though based on my sidenote below, if everyone is in agreement on which invariants must be matched where, then extra docs could be a nice reminder but probably aren't necessarily.)

As a sidenote, this thought process has changed my understanding of safety with regards to raw pointers. I previously thought writing to the pointer was the right place to document invariants about the type's bytes and how they were being met. However, it sounds like the correct place for that documentation is in the places you read the pointer. That is, writing should focus on the validity of the pointer itself whereas reading should focus on the validity of the data the pointer points to. Seeing that written out it seems somewhat obvious, but this is a newsflash to me lol. I'll reread the pointer docs because I don't remember them talking about this and I think it'd be valuable to mention.

RalfJung commented 2 years ago

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant. Any code that assumes otherwise is wrong. This needs to be documented in the English text that accompanies MaybeUninit (probably our existing docs could be improved), but I claim that having as_bytes also serves as part of that documentation.

Removing as_bytes would not change the safety invariant, and people could still soundly do the exact same thing that as_bytes does. So the only thing we achieve by removing as_bytes is making it less obvious that something like as_bytes is sound for MaybeUninit.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant. For once, that would be a breaking change. Second, given that MaybeUninit::uninit() is safe and can be assigned into any MaybeUninit, I don't think there even is a choice here. Which safety invariant are you suggesting, concretely?

What you say about raw pointers make sense. It's still worth being careful on writes though, since the read might be unsuspecting safe code reading through a reference.

SUPERCILEX commented 2 years ago

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant.

Gotya, that makes sense.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant.

I was more advocating for forcing people to document why their writes meet a type's invariants, but yeah that doesn't make much sense when you're writing uninit bytes or zeros.

since the read might be unsuspecting safe code reading through a reference.

I thought this wasn't allowed? The pointer docs say this:

The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory.

That makes it sound like you need to make a new reference after using a raw pointer. Or is it just saying you have to stack your usage (so create and drop the pointer before accessing the original reference again)?

RalfJung commented 2 years ago

I thought this wasn't allowed? The pointer docs say this:

You can always do something like

fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  let _val = *x; // but UB here!
}

So yes as long as it's properly nested, the ptr access is fine, and even the write itself is fine here, but the read can occur in safe code.

SUPERCILEX commented 2 years ago

Makes sense, thank you!

TODO(me):

Clarify ptr docs about stacked borrows.
Add section to ptr docs talking about write vs read invariants, noting reference interactions.
Update this issue to keep as_bytes and add docs with reminder about invariants.

ClydeHobart commented 2 years ago

Unrelated to the uninit byte methods, but is it reasonable to include either

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub unsafe fn array_assume_init_ref(&self) -> &[T; N];
    pub unsafe fn array_assume_init_mut(&mut self) -> &mut [T; N];
}

or

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub fn transpose_ref(&self) -> &MaybeUninit<[T; N]>;
    pub fn transpose_mut(&mut self) -> &mut MaybeUninit<[T; N]>;
}

as part of this proposal? The copy produced by transpose() could be expensive if the array is exceedingly large. The slice methods still exist, but that loses array size information which could be useful.

Edited for tone.

SUPERCILEX commented 1 year ago

@ClydeHobart I'm not quite sure I understand the use case of those methods.

The copy produced by transpose() could be expensive if the array is exceedingly large.

Are you talking about the copy due to transpose taking in a self as opposed to a &self? If so, that shouldn't be a concern as long as transpose is inlined (which should basically always happen, I'd consider it a bug if not). We could potentially add an llvm ir test to enforce this.

Everyone else: ok, I've updated the issue with the latest proposals. I'm pretty happy with everything (minus some needed bikeshedding) except for the UNINIT const which I'm still unsure about. What are the next steps?

SUPERCILEX commented 1 year ago

From @scottmcm: https://github.com/rust-lang/rust/pull/104475#issuecomment-1320814502

Under the following assumptions:

Inline const will be stabilized (or we're fine telling people they need to use that feature)
The slice methods need to stick around (or can somehow become part of the field projection rfc)

Then I think the new proposal would be the same as what we have right now except:

Keep array_assume_init (but as an inherent method aka Scott's PR)
Delete the transpose methods
Don't add an UNINIT const

Then people will be expected to create arrays the normal way and can therefore access individual elements. To perform array-wise operations, they will convert the array to a slice and use the slice methods.

Thoughts?

m-ou-se commented 1 year ago

This pattern where we want some functionality on some type X but also on [X] seems to occur in various places. For example, AtomicU8::get_mut() goes from a &mut AtomicU8 to an &mut u8, but we also want to do that for &mut [AtomicU8] to &mut [u8] (for which we currently have the unstable get_mut_slice). Similarly for Cell, where we have a 'transpose' method named as_slice_of_cells (in only one direction).

Adding functionality using on [X] as Self (like impl<T> [MaybeUninit<T>] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I'm hoping we might be able to find an ergonomic solution that works for more than just MaybeUninit. Maybe just an API convention that we consistently use, but I think we're going to need some new language feature. (Perhaps something related to https://github.com/rust-lang/rfcs/pull/3318?)

SUPERCILEX commented 1 year ago

I think you're right: we shouldn't be special casing MaybeUninit.

So that leaves two things from this ACP:

Remove slice_as_ptr. The proposed solution adds a conversion to unwrap a MaybeUninit pointer into just a T pointer. I doubt whatever general container solution we come up with would be able to handle this, so I think it's still relevant.
Make slice_as_bytes inherent. Same as above, this is a transmute, so probs not handled by the solution we come up with.

I'm just speculating here, so I'd also be ok with closing everything and saying we'll wait to find out what the general solution looks like.

nuttycom commented 1 year ago

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

IMO transpose is a better name than the operation that it specializes usually gets: https://hackage.haskell.org/package/base-4.18.0.0/docs/Prelude.html#v:sequenceA

SUPERCILEX commented 1 year ago

Adding functionality using on [X] as Self (like impl [MaybeUninit] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I think we can strike a good balance by making the important methods inherent. I think that means slice_assume_init (and mut) should be inherent while the others can remain static methods. This will hit the most common use cases and make the assume_init API look consistent. The copy slice methods are arguably like ptr::copy_nonoverlapping so that's consistent.

Thoughts?

SUPERCILEX commented 1 year ago

https://github.com/rust-lang/rust/pull/103131/files shows just how common the assume_init methods are and they end up being much more concise.

Amanieu commented 1 month ago

Could this ACP be updated to reflect the changes that have already been made to nightly and what it is still proposing to do? For example, transpose and copy_from_slice are already available.

rust-lang / libs-team

Revamp unstable MaybeUninit APIs #122

Goals

Problematic APIs

`MaybeUninit::{uninit_array,array_assume_init}`

Proposal

Alternatives

Implement index traits on `MaybeUninit<[T; N]>`

`slice_assume_init` variants

Proposal

`as_bytes` variants

Proposal

`slice_as_ptr` variants

Proposal

`write_slice` variants

Proposal

`uninit`

Proposal

Drawbacks

API diff

rust-lang / libs-team

Revamp unstable MaybeUninit APIs #122

Goals

Problematic APIs

MaybeUninit::{uninit_array,array_assume_init}

Proposal

Alternatives

Implement index traits on MaybeUninit<[T; N]>

slice_assume_init variants

Proposal

as_bytes variants

Proposal

slice_as_ptr variants

Proposal

write_slice variants

Proposal

uninit

Proposal

Drawbacks

API diff

`MaybeUninit::{uninit_array,array_assume_init}`

Implement index traits on `MaybeUninit<[T; N]>`

`slice_assume_init` variants

`as_bytes` variants

`slice_as_ptr` variants

`write_slice` variants

`uninit`