Untagged unions (tracking issue for RFC 1444)

nikomatsakis commented 8 years ago

Tracking issue for rust-lang/rfcs#1444.

Unresolved questions:

[x] Does assigning directly to a union field trigger a drop of the previous contents?
[x] When moving out of one field of a union, are the others considered invalidated? (1, 2, 3, 4)
- Answered by https://github.com/rust-lang/rfcs/pull/2514.
[ ] Under what conditions can you implement Copy for a union? For example, what if some variants are of non-Copy type? All variants?
[ ] What interaction is there between unions and enum layout optimizations? (https://github.com/rust-lang/rust/issues/36394)

Open issues of high import:

[x] https://github.com/rust-lang/rust/issues/47412 -- MIR-based unsafety checker sometimes accepts unsafe accesses to union fields in presence of uninhabited fields

sfackler commented 8 years ago

I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the Box::new(1) run in this example?

union Foo {
    f: i32,
    g: Box<i32>,
}

let mut f = Foo { g: Box::new(1) };
f.g = Box::new(2);

solson commented 8 years ago

@sfackler My current understanding is that f.g = Box::new(2) will run the destructor but f = Foo { g: Box::new(2) } would not. That is, assigning to a Box<i32> lvalue will cause a drop like always, but assigning to a Foo lvalue will not.

sfackler commented 8 years ago

So an assignment to a variant is like an assertion that the field was previously "valid"?

solson commented 8 years ago

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though. I see it as an instantiation of the general rule that writing to a Drop lvalue causes a destructor call.

ohAitch commented 8 years ago

Should a &mut union with Drop variants be a lint?

On Friday, 8 April 2016, Scott Olson notifications@github.com wrote:

@sfackler https://github.com/sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-207634431

joshtriplett commented 8 years ago

On April 8, 2016 3:36:22 PM PDT, Scott Olson notifications@github.com wrote:

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though.

I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.

sfackler commented 8 years ago

Yeah, that approach seems a bit less dangerous to me as well.

solson commented 8 years ago

Not dropping when assigning to a union field would make f.g = Box::new(2) act differently from let p = &mut f.g; *p = Box::new(2), because you can't make the latter case not drop. I think my approach is less surprising.

It's not a new problem, either; unsafe programmers already have to deal with other situations where foo = bar is UB if foo is uninitialized and Drop.

joshtriplett commented 8 years ago

I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.

retep998 commented 8 years ago

I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.

ohAitch commented 8 years ago

I don't intend to use mutable references to unions, and probably just "weirdly-tagged" ones with Into

On Friday, 8 April 2016, Peter Atashian notifications@github.com wrote:

I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-207653168

nikomatsakis commented 8 years ago

Seems like this is a good issue to raise up as an unresolved question. I'm not sure yet which approach I prefer.

joshtriplett commented 8 years ago

@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)

solson commented 8 years ago

And I'd like to reiterate that unsafe programmers must already generally know that a = b means drop_in_place(&mut a); ptr::write(&mut a, b) to write safe code. Not dropping union fields would be one more exception to learn, not one less.

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I support having a default warning against Drop variants in unions that people have to #[allow(..)] since this is a fairly non-obvious detail.

nikomatsakis commented 8 years ago

@tsion this is not true for a = b and maybe only sometimes true for a.x = b but it is certainly true for *a = b. This uncertainty is what made me hesitant about it. For example, this compiles:

fn main() {
  let mut x: (i32, i32);
  x.0 = 2;
  x.1 = 3;
}

(though trying to print x later fails, but I consider that a bug)

solson commented 8 years ago

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for a.x = b?

Say, if x.0 had a type with a destructor, surely that destructor is called:

fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically know to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called before writing new value
}

arielb1 commented 8 years ago

Maybe just lint against that kind of write?

nikomatsakis commented 8 years ago

My point is only that = does not always run the destructor; it uses some knowledge about whether the target is known to be initialized.

On Tue, Apr 12, 2016 at 04:10:39PM -0700, Scott Olson wrote:

@nikomatsakis That example new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for 'a.x = b'?

Say, if x.0 had a type with a destructor, surely that destructor is called:
fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically know to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called
}

arielb1 commented 8 years ago

@nikomatsakis

It runs the destructor if the drop flag is set.

But I think that kind of write is confusing anyway, so why not just forbid it? You can always do *(&mut u.var) = val.

solson commented 8 years ago

My point is only that = does not always run the destructor; it uses some knowledge about whether the target is known to be initialized.

@nikomatsakis I already mentioned that:

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.

arielb1 commented 8 years ago

@tsion

Drop flags are only semi-dynamic - after zeroing drop is gone, they are a part of codegen. I say we forbid that kind of write because it does more confusion than good.

ghost commented 8 years ago

Should Drop types even be allowed in unions? If I'm understanding things correctly, the main reason to have unions in Rust is to interface with C code that has unions, and C doesn't even have destructors. For all other purposes, it seems that it's better to just use an enum in Rust code.

Amanieu commented 8 years ago

There is a valid use case for using a union to implement a NoDrop type which inhibits drop.

joshtriplett commented 8 years ago

As well as invoking such code manually via drop_in_place or similar.

RumataEstor commented 8 years ago

To me dropping a field value while writing to it is definitely wrong because the previous option type is undefined.

Would it be possible to prohibit field setters but require full union replacement? In this case if the union implements Drop full union drop would be called for the value replaced as expected.

joshtriplett commented 8 years ago

I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.

Stebalien commented 8 years ago

For the sake of discussion, I intend to expose mutable references to fields in unions and put arbitrary (possibly Drop) types into them. Basically, I would like to use unions to write custom space-efficient enums. For example,

union SlotInner<V> {
    next_empty: usize, /* index of next empty slot */
    value: V,
}

struct Slot<V> {
    inner: SlotInner<V>,
    version: u64 /* even version -> is_empty */
}

joshtriplett commented 8 years ago

@nikomatsakis I'd like to propose a concrete answer to the question currently listed as unresolved here.

To avoid unnecessarily complex semantics, assigning to a union field should act like assigning to a struct field, which means dropping the old contents. It's easy enough to avoid this if you know about it, by assigning to the whole union instead. This is still slightly surprising behavior, but having a union field that implements Drop at all will produce a warning, and the text of that warning can explicitly mention this as a caveat.

Would it make sense to provide an RFC pull request amending RFC1444 to document this behavior?

aturon commented 8 years ago

@joshtriplett Since @nikomatsakis is away on vacation, I'll answer: I think it's great form to file an amendment RFC for resolving questions like this. We'd often fast-track such RFC PRs when appropriate.

joshtriplett commented 8 years ago

@aturon Thanks. I've filed the new RFC PR https://github.com/rust-lang/rfcs/issues/1663 with these clarifications.to RFC1444, to resolve this issue.

Stebalien commented 8 years ago

_{_{(@aturon you can check-off that unresolved question now.)}}

petrochenkov commented 8 years ago

I have some preliminary implementation in https://github.com/petrochenkov/rust/tree/union.

Status: Implemented (modulo bugs), PR submitted (https://github.com/rust-lang/rust/pull/36016).

joshtriplett commented 8 years ago

@petrochenkov Awesome! Looks great so far.

petrochenkov commented 8 years ago

I'm not quite sure how to treat unions with non-Copy fields in move checker. Suppose u is an initialized value of union U { a: A, b: B } and now we move out of one of the fields:

1) A: !Copy, B: !Copy, move_out_of(u.a) This is simple, u.b is also put into uninitialized state. Sanity check: union U { a: T, b: T } should behave exactly like struct S { a: T } + field alias.

2) A: Copy, B: !Copy, move_out_of(u.a) Supposedly u.b should still be initialized, because move_out_of(u.a) is simply a memcpy and doesn't change u.b in any way.

2) A: !Copy, B: Copy, move_out_of(u.a) This is the strangest case; supposedly u.b should also be put into uninitialized state despite being Copy. Copy values can be uninitialized (e.g. let a: u8;), but changing their state from initialized to uninitialized is something new, AFAIK.

petrochenkov commented 8 years ago

@retep998 I know this is completely irrelevant to FFI needs :) The good news is that it's not a blocker, I'm going to implement whatever behavior is simpler and submit PR this weekend.

nikomatsakis commented 8 years ago

@petrochenkov my instinct is that unions are a "bit-bucket", essentially. You are responsible for tracking whether the data is initialized or not and what it's true type is. This is very similar to the referent of a raw pointer.

This is why we can't drop the data for you, and also why any access to the fields is unsafe (even if, say, there is only one variant).

By these rules, I would expect unions to implement Copy if copy is implemented for them. Unlike structs/enums, however, there would be no internal sanity checks: you can always implement copy for a union type if you like.

nikomatsakis commented 8 years ago

Let me give some examples to clarify:

union Foo { ... } // contents don't matter

This union is affine, because Copy has not been implemented.

union Bar { x: Rc<String> }
impl Copy for Bar { }
impl Clone for Bar { fn clone(&self) -> Self { *self } }

This union type Bar is copy, because Copy has been implemented.

Note that if Bar were a struct, it would be an error to implement Copy because of the type of the field x.

Huh, I guess I'm not actually answering your question though, now that I re-read it. =)

nikomatsakis commented 8 years ago

OK, so, I realize I was not answering your question at all. So let me try again. Following the "bit bucket" principle, I would still expect that we can move out from a union at will. But of course another option would be to treat it like we treat a *mut T, and require you to use ptr::read to move out.

EDIT: I'm not actually entire sure why we would prohibit such moves. It might have had to do w/ moving drop -- or perhaps just because it's easy to make a mistake and it seems better to make "moves out" more explicit? I am having trouble remembering the history here.

petrochenkov commented 8 years ago

@nikomatsakis

my instinct is that unions are a "bit-bucket", essentially.

Ha, I, on the contrary, would like to give as many guarantees about union's content as we can for such a dangerous construct.

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).

All the borrow/move rules in the current implementation support this guarantee, simultaneously this is the most conservative interpretation, that allows us to go either the "safe" way (for example, allowing safe access to unions with equally typed fields, this can be useful) or the "bit bucket" way in the future, when more experience with Rust unions is gathered.

Actually, I'd like to make it even more conservative as described in https://github.com/rust-lang/rust/pull/36016#issuecomment-242810887

nikomatsakis commented 8 years ago

@petrochenkov

The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).

Note that unsafe code is always involved, when working with a union, since every access to a field is unsafe.

The way I think of it is, I think, similar. Basically a union is like an enum but it can be in more than one variant simultaneously. The set of valid variants is not known to the compiler at any point, though sometimes we can figure out that the set is empty (i.e., the enum is uninitialized).

So I see any use of some_union.field as basically an implicit (and unsafe) assertion that the set of valid variants currently includes field. This seems compatible with how the borrow-checker integration works; if you borrow field x and then try to use y, you are getting an error because you are basically saying that the data is simultaneously x and y (and it is borrowed). (In contrast, with a regular enum, it is not possible to inhabit more than one variant at a time, and you can see this in how the borrowck rules play out).

Anyway, the point is, when we "move" from one field of a union, the question at hand I guess is whether we can deduce that this implies that interpreting the value as the other variants is no longer valid. I think it'd be not so hard to argue either way, though. I consider this a grey zone.

The danger of being conservative is that we might well rule out unsafe code that would otherwise make sense and be valid. But I'm ok with starting out tighter and deciding whether to loosen later.

We should discuss the matter of what conditions are needed to implement Copy on a union -- also, we should make sure we have a complete list of these grey areas listed above to make sure we address and document before stabilization!

petrochenkov commented 8 years ago

Basically a union is like an enum but it can be in more than one variant simultaneously.

One argument against the "more than one variant" interpretation is how unions behave in constant expressions - for these unions we always know the single active variant and also can't access inactive variants because transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator). My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant (more restrictive definition) or rather with union's fragment assignment history (more vague, but more useful).

we should make sure we have a complete list of these grey areas

I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.

solson commented 8 years ago

transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator)

@petrochenkov This is one of the goals of my Miri project. Miri can already do transmutes and various raw pointer shenanigans. It would be a small amount of work to make Miri handle unions (nothing new on the raw memory handling side).

And @eddyb is pushing to replace rustc constant evaluation with a version of Miri.

nikomatsakis commented 8 years ago

@petrochenkov

One argument against the "more than one variant" interpretation is how unions behave in constant expressions...

How to best support the use of unions in constants is an interesting question, but I see no problem with restricting constant expressions to a subset of runtime behavior (this is what we always do, anyhow). That is, just because we may not be able to fully support some particular transmute at compilation time doesn't mean it's illegal at runtime.

My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant

Hmm, I'm trying to think how this is different from saying that the union belongs to all of those variants simultaneously. I don't really see a difference yet. :)

I feel like this interpretation has odd interactions with moves in general. For example, if the data is "really" an X, and you interpret it as a Y, but Y is affine, then is it still an X?

Regardless, I think it's fine that having a move of any field consume the entire union can be seen as consistent with any of these interpretations. For example, in the "set of variants" approach, the idea is just that moving the value deinitializes all existing variants (and of course the variant you used must be one of the valid set). In your version, it would seem to "transmute" into that variant (and consume the original).

I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.

Such confidence! You're going to try ;)

Care to shed a few more details on what concrete changes you have in mind?

petrochenkov commented 8 years ago

Care to shed a few more details on what concrete changes you have in mind?

More detailed description of the implementation (i.e. better documentation), some small extensions (like empty unions and .. in union patterns), two main (contradicting) alternatives of union evolution - more unsafe and less restrictive "scratch space" interpretation and more safe and more restrictive "enum with unknown discriminant" interpretation - and their consequences for move/initialization checker, Copy impls, unsafety of field access, etc.

petrochenkov commented 8 years ago

It would also be useful to define when accessing an inactive union field is UB, e.g.

union U { a: u8, b: () }
let u = U { b: () };
let a = u.a; // most probably an UB, equivalent to reading from `mem::uninitialized()`

but this is an infinitely tricky area.

ohAitch commented 8 years ago

Sounds likely, cross-field semantics are basically a pointer cast right? (() as *u8)

On Thursday, 1 September 2016, Vadim Petrochenkov notifications@github.com wrote:

It would also be useful to define when accessing an inactive union field is UB, e.g.

union U { a: u8, b: () } let u = U { b: () }; let a = u.a; // most probably an UB, equivalent to reading from mem::uninitialized()

but this is an infinitely tricky area.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-244154751, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhi68qRITTFW5iJn6omZQQBQgzweNks5qlw4qgaJpZM4IDXsj .

ohAitch commented 8 years ago

Isn't field access always unsafe?

On Thursday, 1 September 2016, Vadim Petrochenkov notifications@github.com wrote:

Care to shed a few more details on what concrete changes you have in mind?

More detailed description of the implementation (i.e. better documentation), some small extensions (like empty unions and .. in union patterns), two main (contradicting) alternatives of union evolution - more unsafe and less restrictive "scratch space" interpretation and more safe and more restrictive "enum with unknown discriminant" interpretation - and their consequences for move/initialization checker, Copy impls, unsafety of field access, etc.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-244151164, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhuHStN8AFhR3KYDU27U29MiMpN5Bks5qlws9gaJpZM4IDXsj .

petrochenkov commented 8 years ago

Isn't field access always unsafe?

It can be made safe sometimes, e.g.

assignment to trivially-destructible union fields is safe.
any access to fields of a union U { f1: T, f2: T, ..., fN: T } (i.e. all fields have the same type) is safe in "enum with unknown discriminant" interpretation.

cuviper commented 8 years ago

It seems better not to apply special conditions to this, from a user point of view. Just call it unsafe, always.

joshtriplett commented 8 years ago

Currently testing out the support for unions in the latest rustc from git. Everything I've tried works perfectly.

rust-lang / rust

Untagged unions (tracking issue for RFC 1444) #32836