Closed nikomatsakis closed 5 years ago
I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the Box::new(1)
run in this example?
union Foo {
f: i32,
g: Box<i32>,
}
let mut f = Foo { g: Box::new(1) };
f.g = Box::new(2);
@sfackler My current understanding is that f.g = Box::new(2)
will run the destructor but f = Foo { g: Box::new(2) }
would not. That is, assigning to a Box<i32>
lvalue will cause a drop like always, but assigning to a Foo
lvalue will not.
So an assignment to a variant is like an assertion that the field was previously "valid"?
@sfackler For Drop
types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo
constructor form or ptr::write
. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though. I see it as an instantiation of the general rule that writing to a Drop
lvalue causes a destructor call.
Should a &mut union with Drop variants be a lint?
On Friday, 8 April 2016, Scott Olson notifications@github.com wrote:
@sfackler https://github.com/sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though.
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-207634431
On April 8, 2016 3:36:22 PM PDT, Scott Olson notifications@github.com wrote:
@sfackler For
Drop
types, yeah, that's my understanding. If they weren't previously valid you need to use theFoo
constructor form orptr::write
. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though.
I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.
Yeah, that approach seems a bit less dangerous to me as well.
Not dropping when assigning to a union field would make f.g = Box::new(2)
act differently from let p = &mut f.g; *p = Box::new(2)
, because you can't make the latter case not drop. I think my approach is less surprising.
It's not a new problem, either; unsafe
programmers already have to deal with other situations where foo = bar
is UB if foo
is uninitialized and Drop
.
I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.
I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.
I don't intend to use mutable references to unions, and probably
just "weirdly-tagged" ones with Into
On Friday, 8 April 2016, Peter Atashian notifications@github.com wrote:
I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-207653168
Seems like this is a good issue to raise up as an unresolved question. I'm not sure yet which approach I prefer.
@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)
And I'd like to reiterate that unsafe
programmers must already generally know that a = b
means drop_in_place(&mut a); ptr::write(&mut a, b)
to write safe code. Not dropping union fields would be one more exception to learn, not one less.
(NB: the drop doesn't happen when a
is statically known to already be uninitialized, like let a; a = b;
.)
But I support having a default warning against Drop
variants in unions that people have to #[allow(..)]
since this is a fairly non-obvious detail.
@tsion this is not true for a = b
and maybe only sometimes true for a.x = b
but it is certainly true for *a = b
. This uncertainty is what made me hesitant about it. For example, this compiles:
fn main() {
let mut x: (i32, i32);
x.0 = 2;
x.1 = 3;
}
(though trying to print x
later fails, but I consider that a bug)
@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.
But I'm not sure I see the relevance of that example. Why is what I said not true for a = b
and only sometimes for a.x = b
?
Say, if x.0
had a type with a destructor, surely that destructor is called:
fn main() {
let mut x: (Box<i32>, i32);
x.0 = Box::new(2); // x.0 statically know to be uninit, destructor not called
x.0 = Box::new(3); // x.0 destructor is called before writing new value
}
Maybe just lint against that kind of write?
My point is only that =
does not always run the destructor; it
uses some knowledge about whether the target is known to be
initialized.
On Tue, Apr 12, 2016 at 04:10:39PM -0700, Scott Olson wrote:
@nikomatsakis That example new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.
But I'm not sure I see the relevance of that example. Why is what I said not true for
a = b
and only sometimes for 'a.x = b'?Say, if
x.0
had a type with a destructor, surely that destructor is called:fn main() { let mut x: (Box<i32>, i32); x.0 = Box::new(2); // x.0 statically know to be uninit, destructor not called x.0 = Box::new(3); // x.0 destructor is called }
@nikomatsakis
It runs the destructor if the drop flag is set.
But I think that kind of write is confusing anyway, so why not just forbid it? You can always do *(&mut u.var) = val
.
My point is only that
=
does not always run the destructor; it uses some knowledge about whether the target is known to be initialized.
@nikomatsakis I already mentioned that:
(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)
But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.
@tsion
Drop flags are only semi-dynamic - after zeroing drop is gone, they are a part of codegen. I say we forbid that kind of write because it does more confusion than good.
Should Drop
types even be allowed in unions? If I'm understanding things correctly, the main reason to have unions in Rust is to interface with C code that has unions, and C doesn't even have destructors. For all other purposes, it seems that it's better to just use an enum
in Rust code.
There is a valid use case for using a union to implement a NoDrop
type which inhibits drop.
As well as invoking such code manually via drop_in_place
or similar.
To me dropping a field value while writing to it is definitely wrong because the previous option type is undefined.
Would it be possible to prohibit field setters but require full union replacement? In this case if the union implements Drop full union drop would be called for the value replaced as expected.
I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.
For the sake of discussion, I intend to expose mutable references to fields in unions and put arbitrary (possibly Drop
) types into them. Basically, I would like to use unions to write custom space-efficient enums. For example,
union SlotInner<V> {
next_empty: usize, /* index of next empty slot */
value: V,
}
struct Slot<V> {
inner: SlotInner<V>,
version: u64 /* even version -> is_empty */
}
@nikomatsakis I'd like to propose a concrete answer to the question currently listed as unresolved here.
To avoid unnecessarily complex semantics, assigning to a union field should act like assigning to a struct field, which means dropping the old contents. It's easy enough to avoid this if you know about it, by assigning to the whole union instead. This is still slightly surprising behavior, but having a union field that implements Drop
at all will produce a warning, and the text of that warning can explicitly mention this as a caveat.
Would it make sense to provide an RFC pull request amending RFC1444 to document this behavior?
@joshtriplett Since @nikomatsakis is away on vacation, I'll answer: I think it's great form to file an amendment RFC for resolving questions like this. We'd often fast-track such RFC PRs when appropriate.
@aturon Thanks. I've filed the new RFC PR https://github.com/rust-lang/rfcs/issues/1663 with these clarifications.to RFC1444, to resolve this issue.
(@aturon you can check-off that unresolved question now.)
I have some preliminary implementation in https://github.com/petrochenkov/rust/tree/union.
Status: Implemented (modulo bugs), PR submitted (https://github.com/rust-lang/rust/pull/36016).
@petrochenkov Awesome! Looks great so far.
I'm not quite sure how to treat unions with non-Copy
fields in move checker.
Suppose u
is an initialized value of union U { a: A, b: B }
and now we move out of one of the fields:
1) A: !Copy, B: !Copy, move_out_of(u.a)
This is simple, u.b
is also put into uninitialized state.
Sanity check: union U { a: T, b: T }
should behave exactly like struct S { a: T }
+ field alias.
2) A: Copy, B: !Copy, move_out_of(u.a)
Supposedly u.b
should still be initialized, because move_out_of(u.a)
is simply a memcpy
and doesn't change u.b
in any way.
2) A: !Copy, B: Copy, move_out_of(u.a)
This is the strangest case; supposedly u.b
should also be put into uninitialized state despite being Copy
. Copy
values can be uninitialized (e.g. let a: u8;
), but changing their state from initialized to uninitialized is something new, AFAIK.
@retep998 I know this is completely irrelevant to FFI needs :) The good news is that it's not a blocker, I'm going to implement whatever behavior is simpler and submit PR this weekend.
@petrochenkov my instinct is that unions are a "bit-bucket", essentially. You are responsible for tracking whether the data is initialized or not and what it's true type is. This is very similar to the referent of a raw pointer.
This is why we can't drop the data for you, and also why any access to the fields is unsafe (even if, say, there is only one variant).
By these rules, I would expect unions to implement Copy
if copy is implemented for them. Unlike structs/enums, however, there would be no internal sanity checks: you can always implement copy for a union type if you like.
Let me give some examples to clarify:
union Foo { ... } // contents don't matter
This union is affine, because Copy
has not been implemented.
union Bar { x: Rc<String> }
impl Copy for Bar { }
impl Clone for Bar { fn clone(&self) -> Self { *self } }
This union type Bar
is copy, because Copy
has been implemented.
Note that if Bar
were a struct, it would be an error to implement Copy
because of the type of the field x
.
Huh, I guess I'm not actually answering your question though, now that I re-read it. =)
OK, so, I realize I was not answering your question at all. So let me try again. Following the "bit bucket" principle, I would still expect that we can move out from a union at will. But of course another option would be to treat it like we treat a *mut T
, and require you to use ptr::read
to move out.
EDIT: I'm not actually entire sure why we would prohibit such moves. It might have had to do w/ moving drop -- or perhaps just because it's easy to make a mistake and it seems better to make "moves out" more explicit? I am having trouble remembering the history here.
@nikomatsakis
my instinct is that unions are a "bit-bucket", essentially.
Ha, I, on the contrary, would like to give as many guarantees about union's content as we can for such a dangerous construct.
The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).
All the borrow/move rules in the current implementation support this guarantee, simultaneously this is the most conservative interpretation, that allows us to go either the "safe" way (for example, allowing safe access to unions with equally typed fields, this can be useful) or the "bit bucket" way in the future, when more experience with Rust unions is gathered.
Actually, I'd like to make it even more conservative as described in https://github.com/rust-lang/rust/pull/36016#issuecomment-242810887
@petrochenkov
The interpretation is that union is an enum for which we don't know the discriminant, i.e. we can guarantee that at any moment of time at least one of union's variants has valid value (unless unsafe code is involved).
Note that unsafe code is always involved, when working with a union, since every access to a field is unsafe.
The way I think of it is, I think, similar. Basically a union is like an enum but it can be in more than one variant simultaneously. The set of valid variants is not known to the compiler at any point, though sometimes we can figure out that the set is empty (i.e., the enum is uninitialized).
So I see any use of some_union.field
as basically an implicit (and unsafe) assertion that the set of valid variants currently includes field
. This seems compatible with how the borrow-checker integration works; if you borrow field x
and then try to use y
, you are getting an error because you are basically saying that the data is simultaneously x
and y
(and it is borrowed). (In contrast, with a regular enum, it is not possible to inhabit more than one variant at a time, and you can see this in how the borrowck rules play out).
Anyway, the point is, when we "move" from one field of a union, the question at hand I guess is whether we can deduce that this implies that interpreting the value as the other variants is no longer valid. I think it'd be not so hard to argue either way, though. I consider this a grey zone.
The danger of being conservative is that we might well rule out unsafe code that would otherwise make sense and be valid. But I'm ok with starting out tighter and deciding whether to loosen later.
We should discuss the matter of what conditions are needed to implement Copy
on a union -- also, we should make sure we have a complete list of these grey areas listed above to make sure we address and document before stabilization!
Basically a union is like an enum but it can be in more than one variant simultaneously.
One argument against the "more than one variant" interpretation is how unions behave in constant expressions - for these unions we always know the single active variant and also can't access inactive variants because transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator). My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant (more restrictive definition) or rather with union's fragment assignment history (more vague, but more useful).
we should make sure we have a complete list of these grey areas
I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.
transmuting at compile time is generally bad (unless we are trying to turn the compiler into some kind of partial target emulator)
@petrochenkov This is one of the goals of my Miri project. Miri can already do transmutes and various raw pointer shenanigans. It would be a small amount of work to make Miri handle unions (nothing new on the raw memory handling side).
And @eddyb is pushing to replace rustc constant evaluation with a version of Miri.
@petrochenkov
One argument against the "more than one variant" interpretation is how unions behave in constant expressions...
How to best support the use of unions in constants is an interesting question, but I see no problem with restricting constant expressions to a subset of runtime behavior (this is what we always do, anyhow). That is, just because we may not be able to fully support some particular transmute at compilation time doesn't mean it's illegal at runtime.
My interpretation is that at runtime inactive variants are still inactive but can be accessed if they are layout compatible with the union's active variant
Hmm, I'm trying to think how this is different from saying that the union belongs to all of those variants simultaneously. I don't really see a difference yet. :)
I feel like this interpretation has odd interactions with moves in general. For example, if the data is "really" an X, and you interpret it as a Y, but Y is affine, then is it still an X?
Regardless, I think it's fine that having a move of any field consume the entire union can be seen as consistent with any of these interpretations. For example, in the "set of variants" approach, the idea is just that moving the value deinitializes all existing variants (and of course the variant you used must be one of the valid set). In your version, it would seem to "transmute" into that variant (and consume the original).
I'm going to amend the union RFC in some not-so-remote future! The "enum" interpretation has pretty fun consequences.
Such confidence! You're going to try ;)
Care to shed a few more details on what concrete changes you have in mind?
Care to shed a few more details on what concrete changes you have in mind?
More detailed description of the implementation (i.e. better documentation), some small extensions (like empty unions and ..
in union patterns), two main (contradicting) alternatives of union evolution - more unsafe and less restrictive "scratch space" interpretation and more safe and more restrictive "enum with unknown discriminant" interpretation - and their consequences for move/initialization checker, Copy
impls, unsafe
ty of field access, etc.
It would also be useful to define when accessing an inactive union field is UB, e.g.
union U { a: u8, b: () }
let u = U { b: () };
let a = u.a; // most probably an UB, equivalent to reading from `mem::uninitialized()`
but this is an infinitely tricky area.
Sounds likely, cross-field semantics are basically a pointer cast right? (() as *u8)
On Thursday, 1 September 2016, Vadim Petrochenkov notifications@github.com wrote:
It would also be useful to define when accessing an inactive union field is UB, e.g.
union U { a: u8, b: () } let u = U { b: () }; let a = u.a; // most probably an UB, equivalent to reading from
mem::uninitialized()
but this is an infinitely tricky area.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-244154751, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhi68qRITTFW5iJn6omZQQBQgzweNks5qlw4qgaJpZM4IDXsj .
Isn't field access always unsafe?
On Thursday, 1 September 2016, Vadim Petrochenkov notifications@github.com wrote:
Care to shed a few more details on what concrete changes you have in mind?
More detailed description of the implementation (i.e. better documentation), some small extensions (like empty unions and .. in union patterns), two main (contradicting) alternatives of union evolution - more unsafe and less restrictive "scratch space" interpretation and more safe and more restrictive "enum with unknown discriminant" interpretation - and their consequences for move/initialization checker, Copy impls, unsafety of field access, etc.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rust-lang/rust/issues/32836#issuecomment-244151164, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxXhuHStN8AFhR3KYDU27U29MiMpN5Bks5qlws9gaJpZM4IDXsj .
Isn't field access always unsafe?
It can be made safe sometimes, e.g.
union U { f1: T, f2: T, ..., fN: T }
(i.e. all fields have the same type) is safe in "enum with unknown discriminant" interpretation.It seems better not to apply special conditions to this, from a user point of view. Just call it unsafe, always.
Currently testing out the support for unions in the latest rustc from git. Everything I've tried works perfectly.
Tracking issue for rust-lang/rfcs#1444.
Unresolved questions:
Copy
for a union? For example, what if some variants are of non-Copy type? All variants?Open issues of high import: