Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type"

Centril commented 6 years ago

NEW TRACKING ISSUE = https://github.com/rust-lang/rust/issues/63566

This is a tracking issue for the RFC "Deprecate uninitialized in favor of a new MaybeUninit type" (rust-lang/rfcs#1892).

Steps:

[x] Implement the RFC (cc @rust-lang/libs)
[x] Adjust documentation (in https://github.com/rust-lang/rust/pull/60445)
[x] Stabilization PR (in https://github.com/rust-lang/rust/pull/60445)

Unresolved questions:

~~Should we have a safe setter that returns an &mut T?~~
~~Should we rename MaybeUninit? (https://github.com/rust-lang/rust/pull/56138)~~
~~Should we rename into_inner? Should it be more like take instead and take &mut self?~~
~~Should MaybeUninit<T> be Copy for T: Copy?~~
Should we allow calling get_ref and get_mut (but not reading from the returned references) before data got initialized? (AKA: "Are references to uninitialized data insta-UB, or only UB when being read from?") Should we rename it similar to into_inner?
~~Can we make into_inner (or whatever it ends up being called) panic when T is uninhabited, like mem::uninitialized does currently?~~ (done)
It seems we want to not deprecate mem::zeroed. We should, however, remember to update its documentation together with MaybeUninit's, and make sure people are aware that calling it is insta-UB if all-0-bits does not satisfy the type's validity invariant.
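For illustration, a minimal sketch using today's stable std::mem API (not the API names used in this thread): zeroing is fine for types whose all-zero bit pattern is valid, such as integers, and for Option<&T>, whose None the std docs guarantee is represented by a null pointer. The same call for &u8 or NonZeroU8 would be insta-UB, so it is deliberately not shown. The function names are hypothetical.

```rust
use std::mem::{self, MaybeUninit};

/// All-zero bits are a valid (u32, i64), so zeroing is sound here.
fn zeroed_pair() -> (u32, i64) {
    // SAFETY: 0 is a valid value for both integer fields.
    unsafe { MaybeUninit::<(u32, i64)>::zeroed().assume_init() }
}

/// A null pointer is the documented representation of None for Option<&T>.
fn zeroed_option_ref() -> Option<&'static u8> {
    // SAFETY: all-zero bits are a valid Option<&u8>, namely None.
    unsafe { mem::zeroed() }
}
```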

eddyb commented 6 years ago

cc @RalfJung

japaric commented 6 years ago

[ ] Implement the RFC

I can help implement the RFC.

RalfJung commented 6 years ago

Awesome, I can help reviewing :)

japaric commented 6 years ago

I'd like some clarification on this part of the RFC:

Make calling uninitialized on an empty type trigger a runtime panic which also prints the deprecation message.

Should only mem::uninitialized::<!>() panic? Or should this also cover structs (and maybe enums?) that contain the empty type (e.g. (!, u8))?

RalfJung commented 6 years ago

AFAIK we only do the really harmful code generation for !. Most other uses of mem::uninitialized are just as incorrect, but the compiler does not happen to exploit them.

So I'd do it for ! only, but also for mem::zeroed. (I forgot to amend that part when I added zeroed to the RFC, it seems.)

eddyb commented 6 years ago

We could start off by making this: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/intrinsic.rs#L184-L198

check whether fn_ty.ret.layout.abi is Abi::Uninhabited and at the very least emit a trap, e.g.: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/operand.rs#L400-L403

Once you've seen the trap (i.e. intrinsics::abort) in action, you can see if there's any nice way of triggering a panic. It'll be tricky because of unwinding; we'll need to special-case them here: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/block.rs#L445-L447

To actually panic, you'd need something like this: https://github.com/rust-lang/rust/blob/8928de74394f320d1109da6731b12638a2167945/src/librustc_codegen_llvm/mir/block.rs#L360-L407 (you can ignore the EvalErrorKind::BoundsCheck arm)

japaric commented 6 years ago

@eddyb Thanks for the pointers.

I'm now fixing (several) deprecation warnings and I feel (very) tempted to just run sed -i s/mem::uninitialized()/mem::MaybeUninit::uninitialized().into_inner()/g but I guess that would miss the point ... Or is that OK if I know that the value is a concrete (Copy) type? e.g. let x: [u8; 1024] = mem::uninitialized();.

RalfJung commented 6 years ago

That would exactly miss the point, yeah.^^

At least for now, I would like to consider mem::MaybeUninit::uninitialized().into_inner() UB for all non-union types. Notice that Copy is certainly not sufficient; both bool and &'static i32 are Copy and your snippet is intended to be insta-UB for them. We may want an exception for "types where all bit patterns are okay" (integer types, essentially), but I would be opposed to making such an exception because undef is not a normal bit pattern. That's why the RFC says you need to fully initialize before calling into_inner.

It also says that for get_mut, but the RFC discussion brought up a desire by some folks to relax the restriction here. That's an option I could live with. But not for into_inner.

I'm afraid all these uses of uninitialized will have to be more carefully reviewed, and in fact this was one of the intents of the RFC. We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.
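A sketch of the careful replacement for let x: [u8; 1024] = mem::uninitialized();, written against today's stable API (uninit/write/assume_init-style code rather than the uninitialized/into_inner names used in this thread): every element is written before the array is treated as initialized. The function name and the (i % 256) fill pattern are arbitrary placeholders.

```rust
use std::mem::MaybeUninit;

/// Fill a 1024-byte buffer element by element before treating it as initialized.
fn filled_buffer() -> [u8; 1024] {
    // MaybeUninit<u8> is Copy, so the array-repeat expression is allowed.
    let mut buf: [MaybeUninit<u8>; 1024] = [MaybeUninit::uninit(); 1024];
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write((i % 256) as u8);
    }
    // SAFETY: every element was written in the loop above, and
    // [MaybeUninit<u8>; 1024] has the same layout as [u8; 1024].
    unsafe { std::mem::transmute::<[MaybeUninit<u8>; 1024], [u8; 1024]>(buf) }
}
```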

Centril commented 6 years ago

We'd like the wider ecosystem to be more careful here, if everyone just uses into_inner immediately then the RFC was worthless.

This gives me an idea... perhaps we should lint (group: "correctness") for this sort of code? cc @oli-obk

SimonSapin commented 6 years ago

I'm now fixing (several) deprecation warnings

We should only ship Nightly with those warnings once the recommended replacement is available at least on Stable. See similar discussion at https://github.com/rust-lang/rust/pull/52994#issuecomment-411413493

cramertj commented 6 years ago

@RalfJung

We may want an exception for "types where all bit patterns are okay" (integer types, essentially)

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

I would be opposed to making such an exception because undef is not a normal bit pattern.

Yeah, this is an aspect in which mem::zeroed() is significantly different from mem::uninitialized().

gnzlbg commented 6 years ago

@cramertj

You've participated in discussion about this before, but I'll post here to circulate more widely: this is already something we have many existing use-cases for in Fuchsia, and we have a trait for this (FromBytes) and a derive macro for these types. There was also an internals Pre-RFC for adding these to the standard library (cc @gnzlbg @joshlf).

Those discussions were about ways of allowing safe memcpys across types, but I think that's pretty much orthogonal to whether the memory being copied is initialized or not - if you put uninitialized memory in, you get uninitialized memory out.

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

IIRC, nobody there suggested or discussed any approach in which you could put uninitialized memory in and get initialized memory out, so I don't follow what those discussions have to do with this one. To me they are completely orthogonal.

joshlf commented 6 years ago

To drive the point home a bit more, LLVM defines uninitialized data as Poison, which is distinct from "some arbitrary but valid bit pattern." Branching based on a Poison value or using it to compute an address which is then dereferenced is UB. So, unfortunately, "types where all bit patterns are okay" are still not safe to construct because using them without separately initializing them will be UB.

cramertj commented 6 years ago

Right, sorry, I should have clarified what I meant. I was trying to say that "types where all bit patterns are okay" is already something that we're interested in defining for other reasons. Like @RalfJung said above,

I would be opposed to making such an exception because undef is not a normal bit pattern.

joshlf commented 6 years ago

Thank god there are people who can read, because apparently I can't...

RalfJung commented 6 years ago

Right, so what I meant to say is: We definitely have types where all initialized bit patterns are okay -- all the i* and u* types, raw pointers, I think f* as well and then tuples/structs only consisting of such types.

What is an open question is under which circumstances which of these types are allowed to be uninitialized, i.e., poison. My own preferred answer is "never".

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit<u8> should be fine.
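A small sketch of what reading padding bytes as MaybeUninit<u8> can look like, assuming a hypothetical #[repr(C)] struct Pair and today's stable MaybeUninit API: the whole value, padding included, is transmuted into bytes, and only offsets known to hold field data are later assumed initialized.

```rust
use std::mem::{size_of, MaybeUninit};

#[repr(C)]
struct Pair {
    a: u8,  // offset 0
    b: u32, // offset 4; bytes 1..4 are padding
}

/// View a Pair as raw bytes without asserting that the padding is initialized.
fn to_maybe_bytes(p: Pair) -> [MaybeUninit<u8>; size_of::<Pair>()] {
    // SAFETY: any byte, including uninitialized padding, is a valid MaybeUninit<u8>,
    // and the sizes match.
    unsafe { std::mem::transmute::<Pair, [MaybeUninit<u8>; size_of::<Pair>()]>(p) }
}
```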

gnzlbg commented 6 years ago

The consensus also was that it would be unsound for any approach discussed to allow reading padding bytes, which are a form of uninitialized memory, in safe Rust. That is if you put initialized memory in, you can't get uninitialized memory out.

Reading padding bytes as MaybeUninit<u8> should be fine.

The discussion in a nutshell was about providing a trait, Compatible<T>, with a safe method fn safe_transmute(self) -> T that "reinterprets"/"memcpys" the bits of self into a T. The guarantee of this method is that if self is properly initialized, so is the resulting T. It was proposed for the compiler to fill in transitive implementations automatically, e.g., if there is an impl Compatible<V> for U and an impl Compatible<W> for V, then there is an impl Compatible<W> for U (either because it was provided manually, or because the compiler auto-generates it; how this could be implemented was completely handwaved).

It was proposed that it should be unsafe to implement the trait: if you implement it for a T that has padding bytes where Self has fields, then everything is fine at least until you try to use the T and your program behavior ends up depending on the contents of the uninitialized memory.

I have no idea what any of this has to do with MaybeUninit<u8>, maybe you could elaborate on that?

The only thing I can imagine is that we could add a blanket impl: unsafe impl<T> Compatible<[MaybeUninit<u8>; size_of::<T>()]> for T { ... } since transmuting any type into a [MaybeUninit<u8>; N] of its size is safe for all types. I don't know how useful such an impl would be, given that MaybeUninit is a union, and whoever uses the [MaybeUninit<u8>; N] has no idea whether a particular element of the array is initialized or not.
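The trait described above could be sketched roughly as follows. This is a hypothetical reconstruction: the names Compatible and safe_transmute come from the comment, while the default method body, the size assertion and the two impls are assumptions. It does not attempt the automatic transitive impls.

```rust
use std::mem::{size_of, ManuallyDrop};

/// Hypothetical trait from the discussion. Implementing it is a promise that
/// every initialized bit pattern of Self is a valid T of the same size.
unsafe trait Compatible<T>: Sized {
    fn safe_transmute(self) -> T {
        assert_eq!(size_of::<Self>(), size_of::<T>(), "sizes must match");
        // Do not run Self's destructor; the bits now belong to the T.
        let this = ManuallyDrop::new(self);
        // SAFETY: the implementor guarantees compatible bit patterns. The read is
        // unaligned because T may need stricter alignment than Self.
        unsafe { std::ptr::read_unaligned(&*this as *const Self as *const T) }
    }
}

// Any 4 initialized bytes form a valid u32, and vice versa.
unsafe impl Compatible<u32> for [u8; 4] {}
unsafe impl Compatible<[u8; 4]> for u32 {}
```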

RalfJung commented 6 years ago

@gnzlbg back then you were talking about FromBits<T> for [u8]. That is where I say we have to use [MaybeUninit<u8>] instead.

joshlf commented 6 years ago

I discussed this proposal with @nikomatsakis at RustConf, and he encouraged me to go forward with an RFC. I was going to do it in a few weeks, but if there's interest, I can try getting one done this weekend. Would that be useful for this discussion?

RalfJung commented 6 years ago

@joshlf which proposal are you talking about?

gnzlbg commented 6 years ago

@RalfJung

@gnzlbg back then you were talking about FromBits<T> for [u8]. That is where I say we have to use [MaybeUninit<u8>] instead.

Gotcha, fully agree here. Had completely forgotten that we also wanted to do that 😆

joshlf commented 6 years ago

@joshlf which proposal are you talking about?

A FromBits/IntoBits proposal. TLDR: T: FromBits<U> means that any bit pattern which is a valid U corresponds to a valid T. U: IntoBits<T> means the same thing. The compiler automatically infers both for all pairs of types given certain rules, and this unlocks lots of fun goodness that currently requires unsafe. There's a draft of this RFC here that I wrote a while back, but I intend to change large parts of it, so don't take that text as anything more than a rough guide.

RalfJung commented 6 years ago

@joshlf I think such a pair of traits would more build on top of this discussion than be part of it. AFAIK we have two open questions in terms of validity:

Does it recurse below references? I more and more strongly think it should not, as we see more examples. So likely we should adapt the MaybeUninit::get_mut docs accordingly (it is not actually UB to use that before completing initialization, but it is UB to dereference it before completing initialization). However, we first have to make that decision for validity, and I am not sure what the right venue is for that. Probably a dedicated RFC?
Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUninit<u8>::uninitialized().into_inner() insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allow poison/undef to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determine this).
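Under today's stable API, one way to sidestep the open question about references to uninitialized data is to never create one: write through the raw pointer returned by as_mut_ptr, and only then call assume_init (the stabilized name for into_inner). A minimal sketch, with a hypothetical function name:

```rust
use std::mem::MaybeUninit;

fn init_bool() -> bool {
    let mut b = MaybeUninit::<bool>::uninit();
    // Write through a raw pointer: no &mut bool to uninitialized memory is created.
    unsafe { b.as_mut_ptr().write(true) };
    // SAFETY: the write above fully initialized the value.
    unsafe { b.assume_init() }
}
```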

gnzlbg commented 6 years ago

Does it recurse below references?

@RalfJung can you show an example of what you mean with "recursing below references"?

Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUninit<u8>::uninitialized().into_inner() insta-UB?

What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?

I feel like if I can't match on the value without introducing UB, then we have re-invented mem::uninitialized. If I can match on the value and the same branch is always taken across all architectures, opt-levels, etc. we have re-invented mem::zeroed (and are kind of making the use of the MaybeUninit type a bit moot). If the program behavior isn't deterministic, and changes with optimization levels, across architectures, depending on external factors (like whether the OS gave the process zeroed pages), etc., then I feel like we would be introducing a huge footgun into the language.

joshlf commented 6 years ago

Does a u8 (and other integer types, floating point, raw pointer) have to be initialized, i.e., is MaybeUninit<u8>::uninitialized().into_inner() insta-UB? I think so, but mostly based on a gut feeling that we want to keep the places where we allow poison/undef to a minimum. However, I could be persuaded otherwise if there are plenty of uses of this pattern (and I hope to use miri to help determine this).

FWIW, two of the benefits of this not being UB are that a) it lines up with what LLVM does and, b) it allows more flexibility wrt optimizations. It also seems more consistent with your recent proposal for defining safety at use time, not at construction time.

What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?

I feel like if I can't match on the value without introducing UB, then we have re-invented mem::uninitialized. If I can match on the value and the same branch is always taken across all architectures, opt-levels, etc. we have re-invented mem::zeroed (and are kind of making the use of the MaybeUninit type a bit moot). If the program behavior isn't deterministic, and changes with optimization levels, across architectures, depending on external factors (like whether the OS gave the process zeroed pages), etc., then I feel like we would be introducing a huge footgun into the language.

Why would you want to be able to match on something that's uninitialized? Defining it as UB to branch or index based on uninitialized values affords LLVM more room to optimize, so I don't think tying its hands more is a good idea, especially if there's not a compelling use case.

gnzlbg commented 6 years ago

Why would you want to be able to match on something that's uninitialized?

I didn't say I wanted to, I stated that if this cannot be done, I don't understand the difference between MaybeUninit<u8>::uninitialized().into_inner() and just mem::uninitialized().

RalfJung commented 6 years ago

@RalfJung can you show an example of what you mean with "recursing below references"?

Essentially, the question is whether we allow the following:

```rust
let mut b = MaybeUninit::<bool>::uninitialized();
let bref = b.get_mut(); // insta-UB?
```

If we decide that a reference is valid only if it points to something valid (that's what I mean by "recursing below references"), this code is UB.

What happens if it isn't instant UB? What can I do with that value? Can I match on it? If so, is the program behavior deterministic?

You cannot inspect an uninitialized u8 in any way. match can do many things, both binding names and actually testing for equality; the former is okay but the latter not. But you can write it back to memory.

Essentially, this is what miri currently implements.

I feel like if I can't match on the value without introducing UB, then we have re-invented mem::uninitialized.

Why that? The biggest problem with mem::uninitialized was around types that have restrictions for what their valid values are. We could decide that u8 has no such restrictions, so mem::uninitialized() was okay for u8. It was just almost impossible to use correctly in generic code, so it is better to entirely get rid of it. Either way, it is still not okay to pass an uninitialized u8 to safe code, but it might be okay to carefully use it in unsafe code.

You cannot "match" on a &mut pointing to invalid data either. IOW, I think that the bool example I gave above is fine, but the following is certainly not:

```rust
let mut b = MaybeUninit::<bool>::uninitialized();
let bref = b.get_mut();
match bref {
  &mut b => {} // insta-UB! We have a bad bool in scope.
}
```

This is using match to do a normal pointer dereference.

FWIW, two of the benefits of this not being UB are that a) it lines up with what LLVM does and, b) it allows more flexibility wrt optimizations. It also seems more consistent with your recent proposal for defining safety at use time, not at construction time.

Which optimizations would this allow? Notice that LLVM does optimizations on essentially untyped code, so none of this is a concern there. We are talking only about MIR optimizations here.

I am essentially coming from the perspective that we should allow as little as possible until we have a clear use. We can always allow more stuff later, but not the other way around. That said, some good uses of byte slices that can hold any data have come up recently, which might be enough of an argument to do this at least for u* and i*.

gnzlbg commented 6 years ago

If we decide that a reference is valid only if it points to something valid (that's what I mean by "recursing below references"), this code is UB.

Gotcha.

The biggest problem with mem::uninitialized was around types that have restrictions for what their valid values are.

mem::uninitialized also has the problem that you pointed out above: that creating a reference to an uninitialized value might be undefined behavior (or not). So is the following UB?

```rust
let mut b = MaybeUninit::<u8>::uninitialized().into_inner();
let bref = &mut b; // Insta UB ?
```

I thought that one of the reasons for introducing MaybeUninit was to avoid this problem by always having the union initialized (e.g. to unit), which allows you to take a reference to it, and mutate its contents, by e.g. setting the active field to the u8 and giving it a value via ptr::write without introducing UB.

So this is why I am a bit confused. I don't see how into_inner is any better than:

```rust
let mut b: u8 = uninitialized();
let bref = &mut b; // Insta UB ?
```

Both look like undefined behavior time bombs to me.
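To make the union idea concrete, here is a simplified stand-in, hypothetically named MyMaybeUninit to avoid clashing with std's type (the real definition may differ). Because the union starts out with its () field active, a raw pointer to the value field can be taken and written through with ptr::write, and nothing is read until assume_init.

```rust
use std::mem::ManuallyDrop;

/// Simplified stand-in for the union described in the RFC.
union MyMaybeUninit<T> {
    uninit: (),
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    const fn uninit() -> Self {
        MyMaybeUninit { uninit: () }
    }

    fn as_mut_ptr(&mut self) -> *mut T {
        // Taking the field's address does not read it; ManuallyDrop<T> is
        // repr(transparent), so the cast to *mut T is fine.
        #[allow(unused_unsafe)]
        unsafe {
            std::ptr::addr_of_mut!(self.value) as *mut T
        }
    }

    /// Caller must have fully initialized the value first.
    unsafe fn assume_init(self) -> T {
        unsafe { ManuallyDrop::into_inner(self.value) }
    }
}
```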

joshlf commented 6 years ago

Which optimizations would this allow? Notice that LLVM does optimizations on essentially untyped code, so none of this is a concern there. We are talking only about MIR optimizations here.

If we say that undefined memory has some value, and thus you're allowed to branch on it according to the Rust semantics, then we can't lower it to LLVM's version of undefined, because it'd be unsound.

I am essentially coming from the perspective that we should allow as little as possible until we have a clear use. We can always allow more stuff later, but not the other way around.

That's fair.

That said, some good uses of byte slices that can hold any data have come up recently, which might be enough of an argument to do this at least for u* and i*.

Do any of these use cases include having byte slices which hold uninitialized values?

sfackler commented 6 years ago

One place that an uninitialized-but-not-poison &mut [u8] could be valuable is for Read::read - we'd like to be able to avoid needing to zero the buffer just because some weird Read impl could read out of it rather than just writing into it.
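For context, a sketch of the safe baseline being referred to: the buffer is zeroed up front, so even a misbehaving Read impl that reads from it only ever sees initialized bytes. The function name read_some is hypothetical.

```rust
use std::io::{self, Read};

/// Read up to cap bytes, paying for zero-initialization of the whole buffer.
fn read_some<R: Read>(reader: &mut R, cap: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; cap]; // the zeroing we would like to avoid
    let n = reader.read(&mut buf)?;
    buf.truncate(n); // keep only the bytes the reader actually produced
    Ok(buf)
}
```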

joshlf commented 6 years ago

One place that an uninitialized-but-not-poison &mut [u8] could be valuable is for Read::read - we'd like to be able to avoid needing to zero the buffer just because some weird Read impl could read out of it rather than just writing into it.

I see, so the idea is that MaybeUninit would represent a type which is initialized, but with undefined contents, while other types of uninitialized data (e.g., padding fields) would still be fully uninitialized in the LLVM poison sense?

sfackler commented 6 years ago

I don't think it'd need to apply to MaybeUninit generally. There could in theory be some API to "freeze" the contents from undefined to defined-but-arbitrary.

RalfJung commented 6 years ago

If we say that undefined memory has some value, and thus you're allowed to branch on it according to the Rust semantics, then we can't lower it to LLVM's version of undefined, because it'd be unsound.

That was never the proposal. It is and will remain UB to branch on poison.

The question is whether it is UB to merely "have" a poison in a local u8.

Do any of these use cases include having byte slices which hold uninitialized values?

Slices are like references, so &mut [u8] of uninitialized data is fine as long as it is only written into (assuming that is the solution we take for reference validity).
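A sketch of the "only written into" pattern, expressed with today's stable API by using &mut [MaybeUninit<u8>] instead of &mut [u8], so that no reference to uninitialized u8 is ever formed. fill is a hypothetical helper:

```rust
use std::mem::MaybeUninit;

/// Write every slot, then hand back the now-initialized bytes.
fn fill(dst: &mut [MaybeUninit<u8>], val: u8) -> &mut [u8] {
    for slot in dst.iter_mut() {
        slot.write(val);
    }
    // SAFETY: all elements were just written, and MaybeUninit<u8> has u8's layout.
    unsafe { std::slice::from_raw_parts_mut(dst.as_mut_ptr() as *mut u8, dst.len()) }
}
```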

@sfackler

One place that an uninitialized-but-not-poison &mut [u8] could be valuable is for Read::read - we'd like to be able to avoid needing to zero the buffer just because some weird Read impl could read out of it rather than just writing into it.

Well, without &out you will only ever be able to do that if you know the impl. The question is not whether safe code has to handle poison in u8 (it does not, that is not an okay use of safe code!), the question is whether unsafe code may carefully handle it this way. (See that blog post that I wanted to write today about the distinction between safety invariants and validity invariants...)

Kixunil commented 6 years ago

Maybe I'm late, but I'd suggest changing the signature of the set() method to return &mut T. This way, it'd be possible to write completely safe code working with MaybeUninit (at least in some situations).

```rust
fn init(dest: &mut MaybeUninit<u8>) -> &mut u8 {
    dest.set(produce_value())
}
```

This is practically a static guarantee that init() will either initialize the value or diverge. (If it tried to return something else, the lifetime would be wrong and &'static mut u8 is impossible in safe code.) Maybe it could be used as a part of placer API in the future.
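This shape did make it into the stabilized API under a different name: MaybeUninit::write (stable since Rust 1.55) takes the value and returns &mut T. A sketch of the example above against it, with produce_value as a hypothetical stand-in:

```rust
use std::mem::MaybeUninit;

/// Hypothetical value producer from the comment.
fn produce_value() -> u8 {
    42
}

/// Either initializes *dest and returns a reference into it, or diverges.
fn init(dest: &mut MaybeUninit<u8>) -> &mut u8 {
    dest.write(produce_value())
}
```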

RalfJung commented 6 years ago

@Kixunil It has been that way before, and I agree it is nice. I just find the name set confusing for a function that returns something.

comex commented 6 years ago

@Kixunil

This is practically a static guarantee that init() will either initialize the value or diverge. (If it tried to return something else, the lifetime would be wrong and &'static mut u8 is impossible in safe code.)

Not quite; you can get one with Box::leak.

In a codebase I wrote recently, I came up with a similar scheme; it's a bit more complicated, but does provide a true static guarantee that the provided reference was initialized. Instead of

```rust
fn init(dest: &mut MaybeUninit<u8>) -> &mut u8
```

I have

```rust
fn init<'a>(dest: Uninitialized<'a, u8>) -> DidInit<'a, u8>
```

The trick is that Uninitialized and DidInit are both invariant on their lifetime parameters, so there's no way to reuse a DidInit with a different lifetime parameter, even e.g. 'static.

DidInit impls Deref and DerefMut, so safe code can use it as a reference, like in your example. But the guarantee that it was actually the original passed-in reference that got initialized, not some random other reference, is helpful for unsafe code. It means you can define initializers structurally:

```rust
struct Foo {
    a: i32,
    b: u8,
}

fn init_foo<'a>(dest: Uninitialized<'a, Foo>,
                init_a: impl for<'x> FnOnce(Uninitialized<'x, i32>) -> DidInit<'x, i32>,
                init_b: impl for<'x> FnOnce(Uninitialized<'x, u8>) -> DidInit<'x, u8>)
                -> DidInit<'a, Foo> {
    let ptr: *mut Foo = dest.ptr;
    unsafe {
        init_a(Uninitialized::new(&mut (*ptr).a));
        init_b(Uninitialized::new(&mut (*ptr).b));
        dest.did_init()
    }
}
```

This function initializes a pointer to struct Foo by initializing each of its fields in turn, using the user-provided initialization callbacks. It requires that the callbacks return DidInits, but doesn't care about their values; the fact that they exist is enough. Once all fields have been initialized, it knows that the entire Foo is valid – so it calls did_init() on the Uninitialized<'a, Foo>, which is an unsafe method that just casts it to the corresponding DidInit type, which init_foo then returns.

I also have a macro that automates the process of writing such functions, and the real version is a bit more careful about destructors and panics (though it needs improvement).

Anyway, I wonder if something like this could be implemented in the standard library.

Playground link

(Note: DidInit<'a, T> is actually a type alias for &'a mut _DidInitMarker<'a, T>, to avoid lifetime issues with DerefMut.)
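The playground code is not reproduced here, but a minimal sketch of the branding idea might look like this. It is an approximation, not comex's code: DidInit is a struct rather than the reference alias mentioned above, the method set and init_with helper are assumed, and destructors and panics are ignored. The invariant PhantomData brand is what prevents a callback from returning a DidInit for any slot other than the one it received.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;
use std::ops::Deref;

/// fn(&'a ()) -> &'a () is invariant in 'a, which pins the brand.
type Brand<'a> = PhantomData<fn(&'a ()) -> &'a ()>;

/// A slot that has not been written yet (hypothetical type).
pub struct Uninitialized<'a, T> {
    slot: &'a mut MaybeUninit<T>,
    _brand: Brand<'a>,
}

/// Proof that the slot branded with 'a has been written.
pub struct DidInit<'a, T> {
    value: &'a mut T,
    _brand: Brand<'a>,
}

impl<'a, T> Uninitialized<'a, T> {
    /// Unsafe: a safe constructor would let a callback brand an unrelated
    /// (e.g. leaked 'static) slot with its own 'x and return that instead.
    pub unsafe fn new(slot: &'a mut MaybeUninit<T>) -> Self {
        Uninitialized { slot, _brand: PhantomData }
    }

    /// The only safe way to obtain a DidInit<'a, T>.
    pub fn write(self, val: T) -> DidInit<'a, T> {
        let Uninitialized { slot, .. } = self;
        DidInit { value: slot.write(val), _brand: PhantomData }
    }
}

impl<'a, T> Deref for DidInit<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.value
    }
}

/// Runs a callback that must hand back proof for the very slot it was given.
pub fn init_with<'a, T>(
    slot: &'a mut MaybeUninit<T>,
    f: impl for<'x> FnOnce(Uninitialized<'x, T>) -> DidInit<'x, T>,
) -> DidInit<'a, T> {
    // SAFETY: slot is an exclusive borrow and is branded exactly once here.
    let u = unsafe { Uninitialized::new(slot) };
    f(u)
}
```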

comex commented 6 years ago

By the way, whereas the above-linked approach ignores destructors, a slightly different approach would be to make DidInit<'a, T> responsible for running T's destructor. In this case it would have to be a struct, not an alias; and it could only hand out references to T that live as long as the DidInit itself, not for all of 'a (since otherwise you could continue accessing it after destruction).

cramertj commented 6 years ago

+1 for including a method to give the behavior I had previously asked for in set, but I'm fine with it being available via another name.

RalfJung commented 6 years ago

Any good ideas for what that name could be? set_and_as_mut?^^

Havvy commented 6 years ago

set_and_borrow_mut?

cramertj commented 6 years ago

insert/insert_mut? The Entry type has a somewhat similar or_insert method (but OccupiedEntry also has insert which returns the old value, so that's not similar at all).

Is there a really compelling reason for having two separate methods? It seems simple enough to ignore the return value, and I'd imagine the function would be marked as #[inline] so I wouldn't expect any real runtime cost.

RalfJung commented 6 years ago

Is there a really compelling reason for having two separate methods? It seems simple enough to ignore the return value

I guess the only reason is that seeing set return something is rather surprising.

Pzixel commented 6 years ago

Maybe I'm missing something, but what could save us from having an invalid value? I mean if we

```rust
let mut foo: MaybeUninit<T> = MaybeUninit {
    uninit: (),
};
let mut foo_ref = &mut foo as *mut MaybeUninit<T>;

unsafe {
    some_native_function(&mut (*foo_ref).value, val);
}
```

what if some_native_function is no-op and doesn't actually init the value? Is it still UB? How could it be handled?

RalfJung commented 6 years ago

@Pzixel this is all covered by the API documentation for MaybeUninit.

If some_native_function is a NOP, nothing happens; if you then later use foo_ref.value (or rather do foo_ref.as_mut() as you can only use the public API), that is UB because the function may only be called once everything is initialized.

MaybeUninit does not prevent having invalid values -- if it could, it would be safe, but that's not possible. However, it makes working with invalid values less of a footgun because now the information that the value might be invalid is encoded in the type, for both the compiler and the programmer to see.
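A sketch of handling the case where the callee may not initialize the value: the callee reports whether it wrote, and assume_init is called only on success. maybe_fill is a hypothetical stand-in for some_native_function, and the should_write flag only exists to simulate both outcomes.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a native out-parameter function that may not write.
unsafe fn maybe_fill(out: *mut u32, should_write: bool) -> bool {
    if should_write {
        unsafe { out.write(7) };
    }
    should_write
}

fn try_get(should_write: bool) -> Option<u32> {
    let mut slot = MaybeUninit::<u32>::uninit();
    let ok = unsafe { maybe_fill(slot.as_mut_ptr(), should_write) };
    // Only assume_init when the callee reported that it wrote the value.
    if ok {
        Some(unsafe { slot.assume_init() })
    } else {
        None
    }
}
```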

jethrogb commented 6 years ago

I wanted to document an IRC conversation I had with @sfackler regarding a hypothetical issue that could arise in the future.

The main question is whether mem::zeroed is a valid in-memory representation for the current implementation proposal for MaybeUninit<NonZeroU8>. In my thinking, in the “uninit” state the value is only padding, which the compiler can use for any purpose, and in the "value" state, all possible values except mem::zeroed are valid (because of NonZero).

A future type layout system with more advanced enum discriminant packing (than we have now) might then store a discriminant in the padding of the "uninit" state/zeroed memory in the "value" state. In that hypothetical system, the size of Option<MaybeUninit<NonZeroU8>> is 1, whereas it currently is 2. Furthermore, in that hypothetical system, Some(MaybeUninit::uninitialized()) would be indistinguishable from None. I think we can probably fix this by changing the implementation of MaybeUninit (but not its public API) once we do move to such a system.

RalfJung commented 6 years ago

I see no difference between NonZeroU8 and &'static i32 in this regard. Both of these are types where "0" is not valid. So for both of these, MaybeUninit<T>::zeroed().into_inner() is insta-UB.

Whether Option<Union> can do layout optimizations depends on what validity for a union is. This is not decided yet for all cases, but there is general agreement that for unions that have a variant of type (), any bit-pattern is valid and hence no layout optimizations are possible. This covers MaybeUninit. So Option<MaybeUninit<NonZeroU8>> will never have size 1.
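The layout consequence can be checked on current rustc. Option<NonZeroU8> is guaranteed to be 1 byte; the 2-byte result for Option<MaybeUninit<NonZeroU8>> reflects today's compiler (the std MaybeUninit docs show the same effect with Option<MaybeUninit<bool>>). The helper name layout_sizes is hypothetical.

```rust
use std::mem::{size_of, MaybeUninit};
use std::num::NonZeroU8;

/// Sizes of NonZeroU8 wrapped in Option, in MaybeUninit, and in both.
fn layout_sizes() -> [usize; 3] {
    [
        size_of::<Option<NonZeroU8>>(),              // niche at 0 is used
        size_of::<MaybeUninit<NonZeroU8>>(),         // same size as NonZeroU8
        size_of::<Option<MaybeUninit<NonZeroU8>>>(), // no niche, extra tag byte
    ]
}
```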

jethrogb commented 6 years ago

there is general agreement that for unions that have a variant of type (), any bit-pattern is valid and hence no layout optimizations are possible.

Is this a special case for “unions that have a variant of type ()”? Does the stabilization of this feature implicitly stabilize that part of the Rust ABI? What about a union containing struct UnitType; or struct NewType(());? What about struct Padded (below)? What about a union containing struct Padded?

```rust
#[repr(C, align(4))]
struct Padded {
    a: NonZeroU8,
    b: (),
    c: NonZeroU16
}
```
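For reference, how repr(C) lays out Padded today, computed with std::mem::offset_of! (stable since Rust 1.77): a sits at offset 0, the zero-sized b at offset 1, c at offset 2, so byte 1 is padding. The helper name padded_layout is hypothetical.

```rust
use std::mem::{align_of, offset_of, size_of};
use std::num::{NonZeroU16, NonZeroU8};

#[repr(C, align(4))]
#[allow(dead_code)]
struct Padded {
    a: NonZeroU8,
    b: (),
    c: NonZeroU16,
}

/// (size, alignment, [offset of a, offset of b, offset of c])
fn padded_layout() -> (usize, usize, [usize; 3]) {
    (
        size_of::<Padded>(),
        align_of::<Padded>(),
        [offset_of!(Padded, a), offset_of!(Padded, b), offset_of!(Padded, c)],
    )
}
```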

RalfJung commented 6 years ago

My wording was awfully specific because this is literally the only thing that I am pretty sure we have general agreement on. :) I think we would want to make this dependent on the size only (i.e., all ZSTs would get this), but actually I think this variant shouldn't even be needed and unions will just never get layout optimizations by default (but eventually users may be able to opt-in using attributes). But that is just my opinion.

We will have a proper discussion to gauge the current consensus and maybe get agreement on more things in one of the next discussions in the UCG repo, and you are welcome to join there when it happens.

Does the stabilization of this feature implicitly stabilize that part of the Rust ABI?

We are talking about validity invariants here, not data layout (which I assume you refer to when you bring up the ABI). So none of this would stabilize any ABI. These are related but distinct, and in fact there is currently an ongoing discussion on the ABI of unions.

gnzlbg commented 6 years ago

These are related but distinct, and in fact there is currently an ongoing discussion on the ABI of unions.

AFAICT that discussion is about the memory representation of unions only, and does not include how the unions are passed through function boundaries and other things that might be relevant for an ABI. I don't think the objective of the UCG repo is to create an ABI for Rust.

RalfJung commented 6 years ago

Well, the objective is to define enough things for interop with C. Things like "Rust bool and C bool are ABI-compatible".

But indeed, for repr(Rust), I think there are no plans to define a function call ABI -- but that should ideally be an explicit statement in whatever form the resulting document takes, not just an omission.


Tracking issue for RFC 1892, "Deprecate uninitialized in favor of a new MaybeUninit type" #53491
