rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.53k stars 12.61k forks source link

ICE when assigning to a union with fields implementing Drop #41073

Closed Thinkofname closed 4 years ago

Thinkofname commented 7 years ago
#![feature(untagged_unions)]

union Test {
    a: A,
    b: B
}

#[derive(Debug)]
struct A(i32);
impl Drop for A {
    fn drop(&mut self) { println!("A"); }
}

#[derive(Debug)]
struct B(f32);
impl Drop for B {
    fn drop(&mut self) { println!("B"); }
}

fn main() {
    let mut test = Test { a: A(3) };
    println!("{:?}", unsafe { test.b });
    unsafe { test.b = B(0.5); }
}
error: internal compiler error: /checkout/src/librustc_borrowck/borrowck/mir/elaborate_drops.rs:313: drop of untracked, uninitialized value bb8, lv (_1.1: B) (Parent(Some(mp1)))
  --> <anon>:23:14
   |
23 |     unsafe { test.b = B(0.5); }
   |      

Full backtrace can be seen on the playground

https://play.rust-lang.org/?gist=bc93f5b222b09b31d3fd286cfc10858a&version=nightly&backtrace=2

petrochenkov commented 7 years ago

cc https://github.com/rust-lang/rust/issues/36246 @arielb1

arielb1 commented 7 years ago

Should this drop test.b or not? I can't figure that out from the RFC.

petrochenkov commented 7 years ago

That println is distracting, so I reworded the example a bit:

{
    let mut test = Test { a: A(3) }; // test.a is initialized -> test.b is initialized, test is initialized 
    {
        let tmp_b = test.b; // test.b is uninitialized (moved out) -> test.a is uninitialized, test is uninitialized
                            // tmp_b is initialized 
    } // tmp_b is dropped, B is printed
    test.b = B(0.5); // test.b is initialized -> test.a is initialized, test is initialized
                     // test.b is not dropped on assignment because it was uninitialized 
} // test is dropped (noop)
arielb1 commented 7 years ago

And if you move out a subfield?

#![feature(untagged_unions)]
union U {
    x: (Box<u32>, Box<u32>),
    y: Vec<u32>,
    raw: [u64; 4]
}

fn main() {
    let mut u = U { raw: [0; 4] };
    unsafe {
        drop(u.x.1);
        if true {
            u.y = Vec::new(); // what does this drop?
        } else if true {
            u.x = (Box::new(0), Box::new(1));
        } else {
            u.x.0 = Box::new(0); // and this?
        }
    }
}
petrochenkov commented 7 years ago

@arielb1

#![feature(untagged_unions)]
union U {
    x: (Box<u32>, Box<u32>),
    y: Vec<u32>,
    raw: [u64; 4]
}

fn main() {
    // init u.raw -> init (u.x, u.y) -> init u
    let mut u = U { raw: [0; 4] };
    unsafe {
        // uninit u.x.1 -> uninit u.x -> uninit (u.y, u.raw) -> uninit u
        // u.x.0 is still initialized
        drop(u.x.1);
        if true {
            // init u.y -> init (u.x, u.raw) -> init u
            // u.y was uninitialized, not dropped on assignment
            u.y = Vec::new(); // what does this drop?
        } else if true {
            // init u.x -> init (u.y, u.raw) -> init u
            // u.x was uninitialized, not dropped on assignment
            u.x = (Box::new(0), Box::new(1));
        } else {
            // assign u.x.0
            // u.x.0 was initialized, dropped on assignment
            u.x.0 = Box::new(0); // and this?
        }
    }
}

I tried to describe the scheme in https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state and https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#move-and-initialization-checking

arielb1 commented 7 years ago

@petrochenkov

That... sounds like something that would be fun to implement.

arielb1 commented 7 years ago

I think people will be confused by these run-time semantics. Should we prohibit replacing into partially-initialized unions?

nikomatsakis commented 6 years ago

Quite likely a duplicate of https://github.com/rust-lang/rust/issues/48962

nikomatsakis commented 6 years ago

cc @retep007 -- can you quickly test if this is a duplicate of #48962 ?

hrvolapeter commented 6 years ago

@nikomatsakis It seems not to be related to #48962 . I could take a look at this

hrvolapeter commented 6 years ago

When I tried to minimise example I found out that if either of println!("{:?}", unsafe { test.b });, unsafe { test.b = B(0.5); } is removed it does not ICE. Any ideas?

nikomatsakis commented 6 years ago

What is the backtrace where the ICE occurs?

nikomatsakis commented 6 years ago

Though reading the back-and-forth between @petrochenkov and @arielb1, it sounds like there is some confusion also as to what we should do upon assignment to a union field. I guess this has to do with whether to drop the pre-existing data or not (in part because the compiler doesn't have visibility into whether or not the union is initialized). I don't know that we can blame this ICE on NLL per se.

arielb1 commented 6 years ago

@nikomatsakis

This is not an NLL problem - it occurs even without #[feature(nll)], but rather is a problem with the unstable #[feature(untagged_unions)]. The problem is that the spec for when union fields are valid had never been implemented in dropck. Reading the old RFC 1444 move semantics @petrochenkov linked to (which are what is currently implemented in AST borrowck - I'm not sure this is specified clearly enough in MIR borrowck) again, I find it implementable enough (although I haven't looked at all edge cases).

Note that the move semantics were removed from the newest version of the union spec.

However, I'm not sure that implementing the specification as written is the best idea - these move semantics interact in a very weird and inconsistent-seeming way with assignments.

union U {
    a: (Box<u32>, Box<u32>),
    b: (Box<u32>, Box<u32>),
}

let mut u = U { a: (Box::new(0), Box::new(1)) };
// u.a & u.b are both initialized
u.b.0 = Box::new(2); // this would call a destructor
drop(u.a.0);

// u.a.0 is uninitialized, u.b is also now uninitialized, 
println!("{:?}", u.a.1); // prints `1`
u.a.1 = Box::new(3); // this would call a destructor

// u.b is *still* uninitialized, same as before.
println!("{:?}", u.b.1); //~ ERROR use of moved value
u.b.1 = Box::new(4); // this would *not* call a destructor, and would leak a `Box`

I feel that implementing this "as specified" will lead to a situation where it will be hard to predict whether writing to a union field will call a destructor or not.

I can think of 2 ways of fixing this inconsistency:

  1. Making the inconsistent cases a compile-time error. To do this, we could probably find a precise notion of "inconsistent cases" that works, while not preventing useful examples. I'm not sure what are the use-cases of unions of fields with destructors, so maybe just flat-out prohibiting assignments to subfields of unions with destructors would work.
    • Even if the notion of "inconsistent cases" here is hard for users to predict, this can only lead to hard-to-predict compiler errors, which are better than hard-to-predict runtime errors.
  2. Having a more predictable option for this case. For example, always call the destructor on every assignment, even if the field was moved out (this can lead to double frees).
arielb1 commented 6 years ago

cc #32836 - this has to resolved one way or another before untagged_unions is fully stabilized.

joshtriplett commented 6 years ago

@arielb1 I'm looking at the example you gave, and that seems like unsafe code doing something wrong, by accessing u.b after manipulating u.a. I do agree that it seems error-prone, though, and it seems quite possible that people would make invalid assumptions about what they can do safely based on how fields of the same type overlap.

That said: this seems like it only applies to union fields that implement Drop. We've stabilized union fields without Drop in 1.19, and stabilized ManuallyDrop (which has only one variant). We don't have stable unions with Drop fields, as far as I know. So, this doesn't seem like it affects anything we have stabilized, right?

petrochenkov commented 6 years ago

@arielb1

I can think of 2 ways of fixing this inconsistency:

I'd rather prefer something like 1 with more errors to something like 2 with double frees, to keep "union newtypes" with a single field like NoDrop completely safe and to keep the "enum interpretation" of unions in general.

petrochenkov commented 6 years ago

FWIW, I still see the behavior of code in https://github.com/rust-lang/rust/issues/41073#issuecomment-380248006 as logical and "following from the first principles", but I agree that nobody would want to deal with such puzzles in practice, so I'm all for generating more errors for sanity's sake even if they are "artificial" in some sense.

RalfJung commented 6 years ago

to keep "union newtypes" with a single field like NoDrop completely safe and to keep the "enum interpretation" of unions in general.

I don't think union "newtypes" with a single field should behave like newtypes at all. Unions are inherently unsafe and only when accessed can we make any assumptions about what they contain. I think an "enum interpretation" is incorrect.

Unions were added primarily for FFI purposes, and in C they most certainly do not have an "enum interpretation". For example, there are libraries that use unions as a way to be extensible -- as the library evolves, new union variants are added, and this is binary compatible with old clients that know about fewer variants. (IIRC it was @joshtriplett who brought this up.) A single-variant union just means that this is the only variant we know of, but unless we actually use that variant, we should not assume that this is the variant we are in.

TBH the most reasonable behavior for assigning to union fields that I can come up with is to never drop anything. That's at least predictable, and assignment is unsafe after all. I think any scheme that attempts to track union initialization state is bound to backfire. And I am certainly extremely surprised that initializing a field makes its siblings initialized. I would have expected the exact opposite---initializing a union field deinitializes its siblings. Why does the RFC make this strange choice? That's even more surprising if I consider that you want to follow an "enum interpretation", where certainly initializing one variant deinitializes the others.

One point I am missing from the RFC: What is the initialization state when a union goes into scope, e.g., when it is passed to a function?

petrochenkov commented 6 years ago

@RalfJung

And I am certainly extremely surprised that initializing a field makes its siblings initialized. I would have expected the exact opposite---initializing a union field deinitializes its siblings

Well, to support transmuting with unions

union U { f: f32, i: i32 }
let u = U { f: 1.0 }; // Initialize one field, another becomes initialized too
let i = unsafe { u.i }; // OK, `i` became initialized together with `f`, so we don't get an error here

https://github.com/rust-lang/rfcs/pull/1897 contains a number of examples that serve as motivation for the currently implemented rules, but I never got enough time to push it over finish line. It's sad that discussion of union semantics is now spread over various random issues and mostly ignores that RFC.

What is the initialization state when a union goes into scope, e.g., when it is passed to a function?

To pass something (not necessarily a union) to a function you have to borrow or move/copy it and you can't borrow/move/copy something uninitialized, so all function parameters are in initialized state initially.

(EDIT: here and above I'm talking about initialization state as it's tracked by the compiler statically (with move checker) and dynamically (with drop flags), not stuff like let x = mem::uninitialized() of course).

RalfJung commented 6 years ago

To pass something (not necessarily a union) to a function you have to borrow or move/copy it and you can't borrow/move/copy something uninitialized, so all function parameters are in initialized state initially.

Ah, that's a good point. I hadn't thought of this because I did not really accept the "writing to a union field initializes the siblings" rule, but with that rule, this actually makes sense.

However, doesn't that mean if I have two types that implement Drop, say

union Foo {
  x: String,
  y: Box<i32>
}

and then I do something like

let u = Foo { x: "abc".to_owned() };
u.y = Box::new(42);

then everything explodes because this will cause drop to run on u.y prior to the assignment?

The more I look at this, the more I think that maybe having Drop fields in unions at all is just a mistake. People should write

union Foo {
  x: ManuallyDrop<String>,
  y: ManuallyDrop<Box<i32>>
}

It's sad that discussion of union semantics is now spread over various random issues and mostly ignores that RFC.

Yeah, it's a huge mess :/ Also, I wasn't even ware of the extent of this RFC. Clearly I didn't do my research. Sorry for that! I hope to find some time to read it soon, that's lots of text.

petrochenkov commented 6 years ago

everything explodes because this will cause drop to run on u.y prior to the assignment?

Yes, that's why assignments to Drop fields require an unsafe block so you can explicitly promise to the compiler that dropping that field won't cause any bad consequences. (So the author of unsafe { ... } would be responsible for the explosion in this case.)

u.y = Box::new(42); //~ ERROR assignment to non-`Copy` union field requires unsafe function or block
RalfJung commented 6 years ago

Yes, that's why assignments to Drop fields require an unsafe block so you can explicitly promise to the compiler that dropping that field won't cause any bad consequences.

Fair enough. This is still a footgun though, e.g. all this may happen in an unsafe fn so the author doesn't even realize that the assignment is unsafe. TBH this reminds me of C making integer addition unsafe: There should be some context-free syntactic way to easily tell that an operation is unsafe. People thinking "unions are for FFI so they work like they do in C" will have a hard time, and probably write buggy code.

joshtriplett commented 6 years ago

I do think that unions with Drop fields are a massive footgun regardless. They do require unsafe code, at least.

RalfJung commented 6 years ago

I am reminded of what I recently said in the context of references to fields of a packed struct:

I also feel this violates the usual pattern in Rust that you should not make unsafe operations accidentally. The only other non-function-call operations we have that are unsafe involve dereferencing a raw pointer, and everyone kind-of knows why that is unsafe (though they may forget about alignment) [and I should have added reading a union field here]. That the mere act of taking a reference assigning to a field can be UB will probably be surprising to many. If they are doing this e.g. in an unsafe fn, they will never even see the warning/error, no matter how much we improve it.

Notice that assigning to a union field is always safe in C; only reading it can yield UB.

petrochenkov commented 6 years ago

@RalfJung

Unions were added primarily for FFI purposes, and in C they most certainly do not have an "enum interpretation".

Unions in (standard) C/C++ have much stronger enum interpretation than in Rust, C and C++ have a notion of "active field" of a union and you can't even access non-active fields (i.e. transmute with unions) in general.

(Of course, C/C++ have much looser static analysis so you can create and pass around uninitialized unions, but accessing them is still UB.)

RalfJung commented 6 years ago

C and C++ have a notion of "active field" of a union and you can't even access non-active fields (i.e. transmute with unions) in general.

AFAIK the following is legal:

   union int_or_short { int x; short y; } u = { .x = 3 };
   printf("%d\n", u.y);

You cannot access non-active fields through pointers, but you are generally allowed to access it "through the union type". (The standard is somewhat ambiguous about this, but Wikipedia agrees and GCC explicitly says it is okay.)

Above, we were talking about accessing the union directly, i.e. through the union type, in which case C/C++ does not apply restrictions based on the "active variant".

The restriction for accessing the "wrong" field through a pointer is necessary to make type-based alias analysis work. I didn't check the old standards, but I'd guess before the aliasing restrictions were introduced, unions were purely access-based as well.

petrochenkov commented 6 years ago

@RalfJung Yeah, that's why I said "in general", there are limited ways to do type punning through unions in standard C (depending on version, IIRC) but not in C++. My point was that the "base" model in those languages is the enum one, limited type punning is an extension out of necessity.

https://github.com/rust-lang/rfcs/pull/1897 proposes roughly the same model, i.e. "enum" interpretation is the base, so there's always an active field that's guaranteed to be valid, but you can also access inactive fields at your own risk as an extension.

RalfJung commented 6 years ago

My point was that the "base" model in those languages is the enum one, limited type punning is an extension out of necessity.

I actually disagree. I think the enum one exists solely to enable type-based alias analysis (which we decisively do not want in Rust); if it weren't for that, C would be just fine without any notion of an "active variant".

rust-lang/rfcs#1897 proposes roughly the same model, i.e. "enum" interpretation is the base, so there's always an active field that's guaranteed to be valid, but you can also access inactive fields at your own risk as an extension.

I'm not at all happy about this proposal. This extra hidden state makes things much more complicated. I think a model where unions are just bytes, and then on every read/write we decide what to do with these bytes, is much simpler.

For example, imagine you'd want to extend miri to actually be able to fully check violations of these rules for accessing union fields. I think that is a very desirable feature to have---it's not just practically useful (we could use miri to test for UB violations in unsafe code), it also means that it is even possible to have such a checker. IMO the fact that such a checker is likely impossible for C is one of the biggest faults of the language, and a mistake I'd like to not repeat in Rust. How would such a miri extension work in an "active field" situation? It seems it'd have to somehow track this extra state on the side, and be very careful about updating that state when pointers are used to access the union. It all seems rather messy to me. In contrast, if every access is just based on the bytes in memory and the type and offset of the field that is accessed, then we already have the full checker implemented in miri -- this is literally how every other access in Rust works as well. I am strongly opposed to unions introducing vast amounts of complexity in this machinery.

Given that we don't want TBAA, I see no reason to adapt an "enum interpretation". It's a much more complicated model, and I don't see any tangible benefits spelled out in the RFC.

pnkfelix commented 5 years ago

Now that RFC 2514 has been accepted (tracking issue #55149), the plan of record is to not allow unions to have a field with drop glue.

(Thus, the main issue here, once #55149 is implemented, will be to just make sure we signal a proper error in this case, rather than ICE.)