Tracking Issue for MaybeUninit methods for arrays

clarfonthey commented 2 years ago

This is a meta-tracking issue for multiple APIs that are linked across multiple issues. Right now it only includes two methods, but since there seems to be a desire to add more, this issue can be used as a placeholder for those discussions until those methods are added.

Public API

impl<T> MaybeUninit<T> {
    pub const fn uninit_array<const N: usize>() -> [Self; N];
    pub const unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N];
}

impl<T, const N: usize> MaybeUninit<[T; N]> {
    pub fn transpose(self) -> [MaybeUninit<T>; N];
}

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

Steps / History

[x] #65580
[x] #80600
[x] Make const: #96099
[x] #102023
[ ] Final comment period (FCP)
[ ] Stabilization PR

Relevant Links

Old tracking issue for array_assume_init: #80908

Unresolved Questions

Should MaybeUninit::uninit_array::<LEN>() be stabilised if it can be replaced by [const { MaybeUninit::uninit() }; LEN] ?
What other APIs should be added for arrays?
Is array_assume_init the right pattern, or should we convert from [MaybeUninit<T>; N] back to MaybeUninit<[T; N]> first?
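
For illustration, the two patterns in the last question can be compared on stable Rust. This is only a sketch: `transpose`, `init_pair`, `per_element` and `via_transpose` are hypothetical free functions written for this example, not the unstable std methods, and the const block assumes Rust 1.79 or later.

```rust
use std::mem::{self, MaybeUninit};

// Hypothetical stand-in for the unstable `[MaybeUninit<T>; N]::transpose`.
fn transpose<T, const N: usize>(array: [MaybeUninit<T>; N]) -> MaybeUninit<[T; N]> {
    // SAFETY: both types have the same size as `[T; N]`, and neither
    // requires its contents to be initialized. `array` has no drop glue,
    // so copying its bytes out does not cause a double drop.
    unsafe { mem::transmute_copy(&array) }
}

// Builds a fully initialized array that is still typed as `MaybeUninit`.
fn init_pair() -> [MaybeUninit<String>; 2] {
    let mut array = [const { MaybeUninit::<String>::uninit() }; 2];
    array[0].write(String::from("left"));
    array[1].write(String::from("right"));
    array
}

// Pattern 1: assume each element is initialized (what `array_assume_init` names).
fn per_element() -> [String; 2] {
    // SAFETY: `init_pair` wrote every element.
    init_pair().map(|slot| unsafe { slot.assume_init() })
}

// Pattern 2: transpose to `MaybeUninit<[T; N]>`, then call `assume_init` once.
fn via_transpose() -> [String; 2] {
    // SAFETY: `init_pair` wrote every element.
    unsafe { transpose(init_pair()).assume_init() }
}

fn main() {
    println!("{:?}", per_element());
    println!("{:?}", via_transpose());
}
```

Both paths produce the same array; the difference is only in which type carries the `assume_init` call.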

joboet commented 2 years ago

I'm a bit concerned about the size and inconsistency of this API, as the corresponding functions for single values are all methods, while for arrays and slices we have associated functions. For example, array_assume_init fundamentally does the same thing as assume_init, but is called in a different way, with a different name. If the API was complete (zeroed_array is currently missing, among others), we would have 8 or so functions for arrays alone (currently only two, but a lot of functionality still unnecessarily requires unsafe).

I would rather prefer a trait-based API like this:

unsafe trait Uninitialized {
    type Initialized: ?Sized;

    fn uninit() -> Self
    where Self: Sized;
    fn zeroed() -> Self
    where Self: Sized;

    unsafe fn assume_init(self) -> Self::Initialized
    where Self: Sized;
    unsafe fn assume_init_ref(&self) -> &Self::Initialized;

    ...
}

unsafe impl<T> Uninitialized for MaybeUninit<T> {...}
unsafe impl<T, const N: usize> Uninitialized for [T; N] {...}
unsafe impl<T> Uninitialized for [T] {...}

or with a marker trait like I proposed here.

Noratrieb commented 2 years ago

I don't think it's worth making a trait for this. I've had many use cases for uninit_array before, and none for a generic solution. MaybeUninit is often used with arrays, since that's where initialization tends to be the most expensive.

While it is a trivial function to write yourself, I do think that it's worth it to stabilize maybe_uninit_uninit_array. It's a very common use case, and the alternative is to write an unsettling MaybeUninit::uninit().assume_init(), which is not very nice and makes the code harder to audit.

clarfonthey commented 1 year ago

Another thing worth asking: why not have methods directly implemented on arrays? You'd still need some form of uninit_array, but instead of MaybeUninit::array_assume_init(arr), you'd just do arr.assume_init(), and the implementation is on [MaybeUninit<T>; N].

This seems possible specifically because the types exist in core, but open to other interpretations.
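
As a rough sketch of what the `arr.assume_init()` call style could look like: outside of core an inherent impl on arrays is not allowed, so this example uses an extension trait instead. The trait `ArrayAssumeInit` and the helper `build` are hypothetical names made up for this illustration, and the const block assumes Rust 1.79 or later.

```rust
use std::mem::MaybeUninit;

/// Hypothetical extension trait; not part of std.
trait ArrayAssumeInit {
    type Init;

    /// # Safety
    /// Every element of the array must be initialized.
    unsafe fn assume_init(self) -> Self::Init;
}

impl<T, const N: usize> ArrayAssumeInit for [MaybeUninit<T>; N] {
    type Init = [T; N];

    unsafe fn assume_init(self) -> [T; N] {
        // SAFETY: `[MaybeUninit<T>; N]` has the same layout as `[T; N]`, and
        // the caller guarantees every element is initialized. Reading through
        // the cast pointer moves the values out; dropping `self` afterwards
        // is a no-op because `MaybeUninit` never drops its contents.
        unsafe { (&self as *const Self).cast::<[T; N]>().read() }
    }
}

fn build() -> [Vec<u32>; 3] {
    let mut arr = [const { MaybeUninit::<Vec<u32>>::uninit() }; 3];
    for (i, slot) in arr.iter_mut().enumerate() {
        // Element `i` becomes a vector holding `i` copies of `i`.
        slot.write(vec![i as u32; i]);
    }
    // SAFETY: the loop above wrote every element.
    unsafe { arr.assume_init() }
}

fn main() {
    println!("{:?}", build());
}
```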

I also suggested perhaps just adding Index and IndexMut implementations for MaybeUninit<[T]> that return MaybeUninit<T> and MaybeUninit<[T]> references, then removing the transposed versions of [MaybeUninit<T>] methods.

Of course, the weirdness of MaybeUninit<[T]> is that the length of the slice isn't uninit, just the data in the slice itself, since the pointer metadata is totally separate. But I feel like this is only a weird quirk that quickly becomes natural after a short explanation. Plus, this applies already to other DSTs like dyn Trait, even though a lot of methods require Sized. In principle you could coerce &mut MaybeUninit<T> into &mut MaybeUninit<dyn Trait> if it were more idiomatic and that would make sense as a type, even though you couldn't do much with it.

WaffleLapkin commented 1 year ago

@clarfonthey with the current implementation, MaybeUninit<[T]> is not possible at all, MaybeUninit requires T: Sized.

SUPERCILEX commented 1 year ago

Docs patch extracted from #102023:

Subject: [PATCH] Add MaybeUninit array transpose impls

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
---
Index: library/core/src/mem/maybe_uninit.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/library/core/src/mem/maybe_uninit.rs b/library/core/src/mem/maybe_uninit.rs
--- a/library/core/src/mem/maybe_uninit.rs  (revision 8147e6e427a1b3c4aedcd9fd85bd457888f80972)
+++ b/library/core/src/mem/maybe_uninit.rs  (date 1665873934720)
@@ -117,15 +117,12 @@
 /// `MaybeUninit<T>` can be used to initialize a large array element-by-element:
 ///
 /// ```
-/// use std::mem::{self, MaybeUninit};
+/// use std::mem::MaybeUninit;
 ///
 /// let data = {
-///     // Create an uninitialized array of `MaybeUninit`. The `assume_init` is
-///     // safe because the type we are claiming to have initialized here is a
-///     // bunch of `MaybeUninit`s, which do not require initialization.
-///     let mut data: [MaybeUninit<Vec<u32>>; 1000] = unsafe {
-///         MaybeUninit::uninit().assume_init()
-///     };
+///     // Create an uninitialized array of `MaybeUninit`.
+///     let mut data: [MaybeUninit<Vec<u32>>; 1000] = MaybeUninit::uninit().transpose();
 ///
 ///     // Dropping a `MaybeUninit` does nothing, so if there is a panic during this loop,
 ///     // we have a memory leak, but there is no memory safety issue.
@@ -133,25 +130,23 @@
 ///         elem.write(vec![42]);
 ///     }
 ///
-///     // Everything is initialized. Transmute the array to the
-///     // initialized type.
-///     unsafe { mem::transmute::<_, [Vec<u32>; 1000]>(data) }
+///     // Everything is initialized. Convert the array to the initialized type.
+///     unsafe { MaybeUninit::<[Vec<_>; 1000]>::assume_init(data.transpose()) }
 /// };
 ///
-/// assert_eq!(&data[0], &[42]);
+/// assert_eq!(&data[42], &[42]);
 /// ```
 ///
 /// You can also work with partially initialized arrays, which could
 /// be found in low-level datastructures.
 ///
 /// ```
 /// use std::mem::MaybeUninit;
 /// use std::ptr;
 ///
-/// // Create an uninitialized array of `MaybeUninit`. The `assume_init` is
-/// // safe because the type we are claiming to have initialized here is a
-/// // bunch of `MaybeUninit`s, which do not require initialization.
-/// let mut data: [MaybeUninit<String>; 1000] = unsafe { MaybeUninit::uninit().assume_init() };
+/// // Create an uninitialized array of `MaybeUninit`.
+/// let mut data: [MaybeUninit<String>; 1000] = MaybeUninit::uninit().transpose();
 /// // Count the number of elements we have assigned.
 /// let mut data_len: usize = 0;
 ///

SimonSapin commented 1 year ago

Should MaybeUninit::uninit_array::<LEN>() be stabilised if it can be replaced by [const { MaybeUninit::uninit() }; LEN] ?

I’d like to argue for: yes. Stabilizing now can allow users to remove some unsafe blocks soon, whereas we don’t know how long it’s gonna be until const {…} blocks are stable and powerful enough for this use case. If and when they eventually are, we’d end up with two ways of doing the same thing but that’s not really harmful. At that point we can soft-deprecate the constructor by pointing out the new pattern in its docs. (Emitting an actual deprecation warning would not be worth the churn IMO but would be a possibility.)

What other APIs should be added for arrays?

New proposals can be made at any time. The methods already in Nightly don’t need to be blocked on exhausting the design space for other thematically-related methods.

nbdd0121 commented 1 year ago

What's the rationale behind separate feature flag for const? Is there any reason that, if we stabilise this function, we want it to be non-const?

I think we should merge the feature flags and stabilise it in one go.

safinaskar commented 1 year ago

I don't like the name of this method. transpose should be reserved for actual matrix transpose

andylizi commented 1 year ago

I don't like the name of this method. transpose should be reserved for actual matrix transpose

That ship has already sailed with Option::transpose and Result::transpose. In this case, I feel it's somewhat unlikely that people doing high-level matrix operations would need to use the low-level MaybeUninit at the same time.

Here's the original discussion for transpose naming options: https://github.com/rust-lang/rust/issues/47338#issuecomment-450529909

AlexTMjugador commented 1 year ago

Inline const expressions are said to render this method less necessary, but such expressions aren't even necessary for Copy types on stable Rust today:

use std::mem::{MaybeUninit, transmute};

fn main() {
    let mut arr = [MaybeUninit::uninit(); 4];

    for i in 0..arr.len() {
        unsafe { *arr[i].as_mut_ptr() = i as u8; }
    }

    // Prints "[0, 1, 2, 3]"
    println!("{:?}", unsafe { transmute::<_, [u8; 4]>(arr) });
}

What's then the point of this method for these cases? Does it have any codegen advantage? I checked on godbolt.org and the generated assembly for this code didn't initialize the array elements. (Edit: even after tweaking this code to initialize the array positions to std::env::args().count() so that the compiler could not get too smart I couldn't get it to emit initialization code for the array positions.)

Edit: given that for non-Copy types you would need to resort to nightly features anyway, I'd prefer to just use inline const expressions instead, but there is some disagreement about this.

Edit 2: actually, it looks like even for non-Copy types you don't need to resort to nightly features when using the inline-const crate.

Lokathor commented 1 year ago

Because it labels the action you're intending to do, without the reader having to puzzle out "oh, this time the transmute was to initialize the data within the type system".

Exactly like f32::to_bits and Option::unwrap and dozens of other small helper functions. Putting the name on the action is the value, not because there's better codegen than you could write yourself.

AlexTMjugador commented 1 year ago

Thanks - I should clarify that I was referring to uninit_array in my comment, but I indeed see some readability advantages for adding array_assume_init.

Lokathor commented 1 year ago

Ah, with uninit_array there is a small but temporary benefit other than the "name the action" argument. SimonSapin mentioned it a little bit up in the thread: Currently the array size of an output array can be inferred but the length of an array expression cannot.
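
To make the inference point concrete, here is a small sketch assuming Rust 1.79 or later for the const block. The free function `uninit_array` is a hypothetical stand-in for the unstable method, and `filled` is just an example caller.

```rust
use std::mem::MaybeUninit;

// Hypothetical stable stand-in for the unstable `MaybeUninit::uninit_array`.
fn uninit_array<T, const N: usize>() -> [MaybeUninit<T>; N] {
    [const { MaybeUninit::<T>::uninit() }; N]
}

fn filled() -> [u8; 4] {
    // `N = 4` is inferred from the type annotation, so the call site does not
    // repeat the length. A repeat expression `[expr; LEN]` has to spell the
    // length out explicitly.
    let mut buf: [MaybeUninit<u8>; 4] = uninit_array();
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u8);
    }
    // SAFETY: the loop above wrote every element.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    println!("{:?}", filled());
}
```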

clarfonthey commented 1 year ago

Another thing which is arguably inconsequential is that the intent is a bit more clear: even though I know that the compiler isn't really copying the same uninitialised data N times into the array, it feels better to explicitly say "I want an uninitialised array" rather than "I want to initialise an array with each element uninitialised."

This is actually a reason why I argue that MaybeUninit<[T; N]> being easier to use feels more natural, although due to concerns with how MaybeUninit<[T]> would work (it doesn't at all right now), this isn't really being pursued.

SimonSapin commented 1 year ago

I believe [MaybeUninit::uninit(); 4] didn't work yet when adding uninit_array, it wasn't about "more explicit".

RalfJung commented 1 year ago

It always worked for Copy types. const blocks are needed for non-Copy types.
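
A short sketch of that distinction, assuming Rust 1.79 or later for the const block; `copy_case` and `non_copy_case` are names made up for this example.

```rust
use std::mem::MaybeUninit;

fn copy_case() -> [u8; 4] {
    // `u8: Copy`, so `MaybeUninit<u8>: Copy` and a plain repeat expression works.
    let mut buf = [MaybeUninit::<u8>::uninit(); 4];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write(i as u8 * 10);
    }
    // SAFETY: every element was written above.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn non_copy_case() -> [String; 2] {
    // `String` is not `Copy`, so the repeated element must be a const block.
    let mut buf = [const { MaybeUninit::<String>::uninit() }; 2];
    buf[0].write(String::from("a"));
    buf[1].write(String::from("b"));
    // SAFETY: both elements were written above.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    println!("{:?} {:?}", copy_case(), non_copy_case());
}
```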

Jules-Bertholet commented 9 months ago

Does array_assume_init add much value over using map? Godbolt says they are optimized identically.

As for uninit_array, making MaybeUninit unconditionally Copy (or Default) would address that, though that would have its own issues (#62835).

SUPERCILEX commented 9 months ago

Ah, that's quite clean! Can't believe I didn't think of that. I haven't checked, but assuming this works in all scenarios, I'd be in favor of killing the methods in this tracking issue and the transpose methods I added in favor of const initialization (which needs to be stabilized first) and mapping initialization.

Jules-Bertholet commented 9 months ago

Just realized, uninit_array can also be written with array::from_fn.
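
Put together, the `array::from_fn` and `map` suggestions might look like the following sketch. The function `labels` is a hypothetical example, and no claim is made here about how well it optimizes.

```rust
use std::array;
use std::mem::MaybeUninit;

fn labels<const N: usize>() -> [String; N] {
    // `from_fn` calls the closure once per element, so it works for
    // non-`Copy` element types without a const block.
    let mut slots: [MaybeUninit<String>; N] = array::from_fn(|_| MaybeUninit::uninit());
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write(format!("#{i}"));
    }
    // SAFETY: the loop above wrote every element.
    slots.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    println!("{:?}", labels::<3>());
}
```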

Arnavion commented 9 months ago

Godbolt says they are optimized identically.

Replacing u64 with a struct (a more realistic thing to use with MaybeUninit) makes that false: https://rust.godbolt.org/z/o6ecsxjGa

SUPERCILEX commented 9 months ago

Godbolt says they are optimized identically.

Replacing u64 with a struct (a more realistic thing to use with MaybeUninit) makes that false: rust.godbolt.org/z/o6ecsxjGa

Dang, that's disappointing.

mcronce commented 9 months ago

Just realized, uninit_array can also be written with array::from_fn.

I'm glad you posted this, it's much nicer than the [(); N].map(|_| MaybeUninit::uninit()) that I was just writing :joy:

rdrpenguin04 commented 8 months ago

I agree that it is possible to obtain an uninitialized array other ways, but uninit_array is the most obvious way to do it, and it's the cleanest solution I've found too. Aside from that, what's blocking stabilization?

nazar-pc commented 6 months ago

I am interested in <&MaybeUninit<[T; N]>>::transpose() and <&mut MaybeUninit<[T; N]>>::transpose(). Specifically this is helpful when allocating Box<[T; N]> where [T; N] is too large to fit on the stack, it would make it possible to use with iterators.

Would it be considered related to this issue?
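
One way to approximate the by-reference transpose today is a pointer cast, sketched below. This assumes a toolchain where `Box::new_uninit` and `Box::assume_init` are stable (Rust 1.82 or later); `boxed_array_with` is a hypothetical helper, not an existing or proposed std API.

```rust
use std::mem::MaybeUninit;

fn boxed_array_with<T, const N: usize>(mut f: impl FnMut(usize) -> T) -> Box<[T; N]> {
    // Allocate the uninitialized array on the heap instead of building it
    // on the stack and moving it into a box afterwards.
    let mut boxed: Box<MaybeUninit<[T; N]>> = Box::new_uninit();

    // By-reference "transpose": view the allocation as an array of slots.
    // SAFETY: `MaybeUninit<[T; N]>` and `[MaybeUninit<T>; N]` have the same
    // layout, and the slots do not require initialized contents.
    let slots: &mut [MaybeUninit<T>; N] =
        unsafe { &mut *boxed.as_mut_ptr().cast::<[MaybeUninit<T>; N]>() };

    // With the slot view, ordinary iterator methods are available.
    // If `f` panics, already written elements are leaked, which is safe.
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write(f(i));
    }

    // SAFETY: the loop above wrote every element.
    unsafe { boxed.assume_init() }
}

fn main() {
    let big: Box<[u64; 1000]> = boxed_array_with(|i| i as u64 * 2);
    println!("{} {}", big[0], big[999]);
}
```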

dtolnay commented 3 months ago

@rust-lang/libs-api: @rfcbot fcp close

(Only in regard to MaybeUninit::uninit_array, not the other unstable APIs still tracked by this issue.)

I propose that we accept https://github.com/rust-lang/rust/pull/125082 to remove MaybeUninit::uninit_array() in favor of having callers use [MaybeUninit::uninit(); N] and [const { MaybeUninit::uninit() }; N]. (The const block is required for T: !Copy.)

When uninit_array was originally introduced, it was useful because it was a safe wrapper around an unsafe implementation: unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() }. The best equivalent safe way to express what this does used to be significantly more verbose:

use std::mem::MaybeUninit;

fn workaround<T, const N: usize>() -> [MaybeUninit<T>; N] {
    trait Workaround: Sized {
        const UNINIT: MaybeUninit<Self>;
    }
    impl<T> Workaround for T {
        const UNINIT: MaybeUninit<Self> = MaybeUninit::uninit();
    }
    [<T as Workaround>::UNINIT; N]
}

These days, with const {…} expressions stabilizing in Rust 1.79 (https://github.com/rust-lang/rust/pull/104087), the justification for a dedicated uninit_array is weaker and limited to convenience and discoverability.

My opinion aligns with @kpreid's characterization in the PR description:

The only remaining question is whether it is an important enough convenience to keep it around.

I believe it is net good to remove this function, on the principle that it is better to compose two orthogonal features (MaybeUninit and array construction) than to have a specific function for the specific combination, now that that is possible.

The counter perspective is the one in https://github.com/rust-lang/rust/pull/125082#issuecomment-2108313867.

I still prefer MaybeUninit::uninit_array() to [const { MaybeUninit::uninit() }; N], it's shorter and more readable IMHO.

I do not dispute that it is 25% fewer characters.

For the common case of copyable contents like an uninit u8 buffer, [MaybeUninit::uninit(); N] is even shorter than that.

Regarding readability, the comparison assumes one has read and retained this part of https://doc.rust-lang.org/std/mem/union.MaybeUninit.html, which is not free. Superseding a part of the extensive, ad-hoc API of MaybeUninit with better composable semantics is probably good for readability.

For discoverability, I'd expect [MaybeUninit::uninit(); N] is more discoverable than MaybeUninit::uninit_array(). From there, a compiler diagnostic can hint to add const if dealing with a non-Copy element type.

dtolnay commented 3 months ago

@rfcbot fcp close

rfcbot commented 3 months ago

Team member @dtolnay has proposed to close this. The next step is review by the rest of the tagged team members:

[x] @Amanieu
[ ] @BurntSushi
[x] @dtolnay
[x] @joshtriplett
[ ] @m-ou-se

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

dtolnay commented 2 months ago

I realized I'll need to file a compiler diagnostics bug. Currently on nightly:

use std::mem::MaybeUninit;

fn main() {
    let _: [MaybeUninit<String>; 2] = [MaybeUninit::uninit(); 2];
}

error[E0277]: the trait bound `String: Copy` is not satisfied
 --> src/main.rs:4:40
  |
4 |     let _: [MaybeUninit<String>; 2] = [MaybeUninit::uninit(); 2];
  |                                        ^^^^^^^^^^^^^^^^^^^^^ the trait `Copy` is not implemented for `String`, which is required by `MaybeUninit<String>: Copy`
  |
  = note: required for `MaybeUninit<String>` to implement `Copy`
  = note: the `Copy` trait is required because this value will be copied for each element of the array
  = help: create an inline `const` block, see RFC #2920 <https://github.com/rust-lang/rfcs/pull/2920> for more information
help: consider creating a new `const` item and initializing it with the result of the function call to be used in the repeat position
  |
4 ~     const ARRAY_REPEAT_VALUE: MaybeUninit<String> = MaybeUninit::uninit();
5 ~     let _: [MaybeUninit<String>; 2] = [ARRAY_REPEAT_VALUE; 2];
  |

We should change this to suggest [const { MaybeUninit::uninit() }; 2], instead of the current suggestion which is:

const ARRAY_REPEAT_VALUE: MaybeUninit<String> = MaybeUninit::uninit();
let _: [MaybeUninit<String>; 2] = [ARRAY_REPEAT_VALUE; 2];

rfcbot commented 2 months ago

:bell: This is now entering its final comment period, as per the review above. :bell:

knickish commented 2 months ago

Is the intention that a new tracking issue be opened for the remaining methods?

kpreid commented 2 months ago

A FCP to close doesn't mean that the issue has to be actually closed after it completes. dtolnay wrote

(Only in regard to MaybeUninit::uninit_array, not the other unstable APIs still tracked by this issue.)

rfcbot commented 2 months ago

The final comment period, with a disposition to close, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

Ciel-MC commented 1 month ago

This should be closed since inline const blocks have already landed in stable, right? Or is that only part of this issue? I was just going through MaybeUninit and saw that the new uninit array method is still on nightly.

RalfJung commented 1 month ago

See here.

uninit_array should be removed, but the issue still tracks other methods.
