rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.72k stars 12.64k forks source link

Goal: Accept partial initialization + use of records created via such #54987

Open pnkfelix opened 6 years ago

pnkfelix commented 6 years ago

Spawned off of #21232

In the long-term, we want to accept code like this:

struct S { x: u32, y: u32 }
fn main() { let mut s: S; s.x = 10; s.y = 30; a_use_of(&s); another_use_of(s); }
fn a_use_of(_s: &S) { /* ... */ }
fn another_use_of(_s: S) { /* ... */ }

We probably also want to start accepting this too:

struct S { x: u32, y: u32 }
fn main() { let mut s: S; s.x = 10; a_use_of_field(s.x); }
fn a_use_of_field(_x: u32) { /* ... */ }

(But that second example is more debatable. I don't think there's any debate about the first example.)


See #54986 for the short-term goal of rejecting all partial initializations until we can actually use the results you get from partial initialization.

leonardo-m commented 6 years ago
fn main() { let mut s: S; s.x = 10; a_use_of_field(s.x); }

If there's an invariant of the struct fields you can't use one struct field before all of them are initialized.

pnkfelix commented 6 years ago

I think @leonardo-m 's previous comment is meant to be an argument against accepting the code snippet they quoted.

Here are my main responses to that:

pnkfelix commented 6 years ago

tagging as NLL-deferred because we know we aren't going to try to implement this as part of the 2018 edition.

pnkfelix commented 5 years ago

@Centril has an in-progress draft of an RFC here: https://github.com/Centril/rfcs/pull/16

pnkfelix commented 5 years ago

Re-triaging for #56754. P-low since any change here deserves a (currently "unwritten" or at least unposted) RFC.

jethrogb commented 5 years ago

There are some concerns about the current state of this in #60450

tmandry commented 5 years ago

How should we handle yielding from a generator while a local variable is partially initialized? e.g.

let mut gen = || {
  let mut x: (i32, i32);
  x.0 = 11;
  yield;
  x.1 = 22;
}

I'd prefer that this wasn't allowed when generators stabilize. It would simplify some layout code and potentially enable more optimizations. It's possible we decide to allow it eventually, but I'd rather not be locked into that decision from the start.

Note that I'm fine with allowing this in cases where the local is not stored in the generator (i.e. it is never used after the yield). This is only useful in the case of the second example from the issue description.

cc #60889

RalfJung commented 5 years ago

See https://github.com/rust-lang/rust/pull/63230#issuecomment-517952035 for some discussion of this subject, and in particular the interaction with "uninhabited structs are ZST" (which seems trivial to do if we entirely rule out partial initialization).

So, not allowing partial initialization is beneficial not just for generators, but also for type layout in general.

hanna-kruppe commented 5 years ago

As @RalfJung himself noted in the linked discussion, the proposal seems at least very difficult for anything that works with "generic MIR + substs" rather than monomorphizing, chiefly miri. One could imagine performing the equivalent work every time a function is entered, but the cost is likely considerable.

A similar but even more serious problem I see is about unsafe code that performs partial or piecewise initalization manually. For example, adapting the usual (String, !) example (using https://github.com/rust-lang/rfcs/pull/2582 syntax to avoid UB):

fn construct_pair_inplace(
    out: &mut MaybeUninit<(String, !)>,
    mk_string: impl FnOnce() -> String,
    mk_void: impl FnOnce() -> !,
) {
    let ptr = out.as_mut_ptr();
    ptr::write(&raw mut (*ptr).0, mk_string());
    ptr::write(&raw mut (*ptr).1, mk_void());
}

There's no writes to a "sibling of an uninhabited field" visible here, just taking pointers and offsetting them and passing them to other functions (and if you want to consider ptr::write an intrinsic, just imagine it being hidden behind more function boundaries). Those pointers have to point to something large enough to hold the inhabited String, and considering everything else that may happen with the pointers (e.g., returning them up the stack slot, stashing them in a long-lived data structure) they can't just point to made-up local temporaries.

RalfJung commented 5 years ago

@rkruppe so under a proposal that makes (String, !) a ZST, that code would be UB. Do you think that is a problem on its own? I think I'd be fine with it.

I am not sure if "the proposal" you refer to is "make structs with uninhabited fields ZST", or "make structs with uninhabited fields ZST and still allow partial initialization in safe code".

hanna-kruppe commented 5 years ago

Yes, I think it's a problem, otherwise I wouldn't have brought it up. Well, not literally that one piece of code, more the general pattern of piecewise initalization via out pointer (whether MaybeUninit is involved or not). It's is an extremely reasonable thing to do, and it's especially attractive while we still don't have reliable NRVO/emplacement, which doesn't seem to change any time soon.

If that kind of code suddenly becomes UB as soon as any component of the overall aggregate being initialized is uninhabited, then that would be pretty uncool. Everyone writing such code would need to remember to add subtle additional checks1 (even more subtle than the special casing already required for ZSTs in some code, as uninhabited types are even rarer and more obscure), and if they care about still executing the side effects of the initialization of the inhabited parts, they'll need a lot of extra complexity for that too. I don't think that'll work out, instead we'll probably get a pile of unsound code that will rarely get in contact with uninhabited types but blow up every time it does.

Keep in mind that it'll rarely be as obviously involving an uninhabited type as in the above (kind of minimal) example, more commonly it will involve generics or a large (possibly auto-generated) struct/enum. Essentially, "what if one of these types is uninhabited" will have to get a dedicated slot on unsafe code audit checklists.

Edit: 1or rule out that any uninhabited types can be involved, but this seems difficult to ensure in most code for reasons outlined above, and in any case still adds an extra burden to those writing that code.

I am not sure if "the proposal" you refer to is "make structs with uninhabited fields ZST", or "make structs with uninhabited fields ZST and still allow partial initialization in safe code".

Sorry for being unclear, I meant the latter w.r.t. complexity imposed on miri (and other consumers of generic MIR).

RalfJung commented 5 years ago

Sorry for being unclear, I meant the latter w.r.t. complexity imposed on miri (and other consumers of generic MIR).

Now I am confused because the rest of your text applies equally to the former.^^

I think for unsafe code it is okay to put the burden on that code to show that it is not initializing something uninhabited. But maybe I am underestimating the complexity of that.

Looking at the generalized version of your example:

fn construct_pair_inplace<T, U>(
    out: &mut MaybeUninit<(T, U)>,
    mk0: impl FnOnce() -> T,
    mk1: impl FnOnce() -> U,
) {
    let ptr = out.as_mut_ptr();
    ptr::write(&raw mut (*ptr).0, mk0());
    ptr::write(&raw mut (*ptr).1, mk1());
}

I think saying this is UB if either T or U are uninhabited is fine. If we don't care where inside this function the UB arises, the only difference from today is that right now this is not UB if U = ! and mk0 diverges or panics.

hanna-kruppe commented 5 years ago

Now I am confused because the rest of your text applies equally to the former.^^

Ugh, sorry for all the confusion. I was trying to say "the problem for miri applies only to one of these proposals, the other one regarding unsafe code applies to everything", but code patterns like let x: (String, !) = (..., panic!()); are allowed anyway and generate exactly the kind of MIR that's problematic for miri. So scratch that, my position is both of these problems apply to any proposal that makes uninhabited aggregates ZSTs.

I think saying this is UB if either T or U are uninhabited is fine. If we don't care where inside this function the UB arises, the only difference from today is that right now this is not UB if U = ! and mk0 diverges or panics.

(I have misgivings about using this very simplified, if generic, example as running example, but let's go with it for now.)

One question to answer is, how can someone rewrite that function to be sound? There's a few ways I can think of, but none of them are good. One could simply bail out if the overall type is a ZST:

if size_of_val(out) == 0 {
    return;
}

but this omits side effects from mk0 and mk1, and also affects inhabited ZSTs. The latter can be fixed with a new intrinsic, I think, but even then it makes the function actually return when it really shouldn't (because logically it's trying to initialize something that can't be initialized). To do better, we'll have to proceed and check field-by-field, like so:

if is_uninhabited {
    mk0(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk0());
}
if is_uninhabited {
    mk1(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk1());
}
// ... and so on for all other fields

(I can imagine a variant that doesn't try to preserve all side effects and just preserves divergence behavior, but it'll still need an amount of code linear in the number of fields to figure out which part it needs to call to diverge, assuming the exact kind of divergence matters.)

This is... not great. And it's just what programmers would write if they knew to do that. As I said before, I think this is an obscure obligation that most programmers will not get right, or even think to try to handle. It's exactly the kind of pitfalls that makes it unreasonably difficult to write sound unsafe code IMO. I am really unsure whether being able to make uninhabited aggregates into ZSTs is worth this. Does anyone know of real instances of performance/memory problems caused by them not being ZSTs? (Keep in mind that only product types are affected, Result<T, !> can still be the same size as T regardless.)

RalfJung commented 5 years ago

code patterns like let x: (String, !) = (..., panic!());

That's a very good point! MIR construction would have to be very smart about this.

@tmandry but isn't this also a problem for generators? What happens with something like

let x: (String, !) = (String::new(), { yield; panic!() });

In MIR, this looks a lot like partial initialization. So won't it have all the same problems?

One question to answer is, how can someone rewrite that function to be sound?

So you are saying this function actually is sound under our current semantics? Wow, I think you are right. And it wouldn't be sound any more with new semantics using the following safe client:

let mut x = MaybeUninit::uninit();
construct_pair_inplace(&mut x,
    || 0i32,
    || panic!(),
);

I think you convinced me that we cannot make these types ZST.

hanna-kruppe commented 5 years ago

That's a very good point! MIR construction would have to be very smart about this.

Worse, since this too can be made generic, there's really nothing MIR construction can do about it (unless we give up on placement entirely and always evaluate subexpressions into isolated temporaries first, but that's unacceptable IMO). It has to be "fixed up" at monomorphization time, with all the problems that entails.

tmandry commented 5 years ago

@tmandry but isn't this also a problem for generators? What happens with something like

let x: (String, !) = (String::new(), { yield; panic!() });

In MIR, this looks a lot like partial initialization. So won't it have all the same problems?

Today it's not a problem, because of how we emit the MIR (we assign each piece to a temporary, and then build the aggregate all at once). I asked @eddyb if depending on this was too brittle, and they said yes, it might change in the future.

So it seems likely that #63230 was just a stopgap and we'll have to be smarter and/or more conservative in the future about the possibility of partial initialization in generators.

Aaron1011 commented 5 years ago

It has to be "fixed up" at monomorphization time, with all the problems that entails.

@rkruppe: Can you elaborate on that? The way I see it, this shouldn't be too complicated. Going back to your example user-written code:

if is_uninhabited {
    mk0(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk0());
}
if is_uninhabited {
    mk1(); // eval into a temporary for side effects and/or divergence
} else {
    ptr::write(..., mk1());
}

this is just want we'd want monomorphization/miri to do. Setting aside partial initialization for the moment, we know that the uninhabited type we're 'constructing' can never escape the function (as any code after the creationof uninhabited type is unreachable).

For example, when monomorphizing or interpreting:

let x: (String, !) = (String::new(), { yield; panic!() });

We can simply write the String to a temporary. This wouldn't require us to make any changes to the MIR itself - that is, we could choose to assign directly to the (possibly uninhabited field), and detect this when the concrete type is known.

While this would result in some additional complexity in codegen and Miri, it would both simply and optimize actual Rust code. 'Uninhabited types are zero size' is a much nicer rule than 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable).'

RalfJung commented 5 years ago

@tmandry

we'll have to be smarter and/or more conservative in the future about the possibility of partial initialization in generators.

Is there any place we are tracking this specifically for generators? Or is the issue here sufficient?

@Aaron1011

The way I see it, this shouldn't be too complicated.

What you are proposing sounds extremely complicated to me.^^

If that chain of ifs is your proposal for how the MIR would look like, one big issue with that is that it duplicates the initializer expression. Sure, here it is just mk0(), but it can be arbitrarily complicated, and we probably don't want to blow up the MIR by duplicating them.

(FWIW, the test could be just "if size of struct greater 0", because if this is a ZST struct then "copying" the result of the computation into the struct is redundant in any case. We don't need a new "is uninhabited" intrinsic.)

While this would result in some additional complexity in codegen and Miri,

Given that this affects MIR semantics, I think it would also result in additional complexity in any other MIR consumer -- borrowcheck and MIR optimizations, specifically. And making the life of these passes as easy as we can is basically the only reason MIR exists at all. So if they all become too complicated through things like this, MIR failed at its job.

'Uninhabited types are zero size' is a much nicer rule than 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable).'

'The size of a struct is the sum of the sizes of its fields and padding' is also a much nicer rule than 'The size of a struct is the sum of the sizes of its fields and padding, except under certain circumstances (despite still being partially initializable).'

I would not say 'Uninhabited types are zero size, except under certain circumstances (despite still being unconstructable)' is a rule we currently have, so it seems silly to compare with that. There just is no connection between inhabitedness and size, end of story. So we are trading the 'size of a struct is the sum' rule against the 'uninhabited types are ZST' rule.

And note that even with your proposal 'uninhabited types are ZST' will likely not be true. For example, there are proposals to make &! intrinsically uninhabited so that we can just remove &self methods on ! (and more generally all uninhabited types) from the generated code -- but &! can never be a ZST, references are layout-compatible with pointers. So the rule would be more like 'many uninhabited types are ZST, but not all of them'.

tmandry commented 5 years ago

Is there any place we are tracking this specifically for generators? Or is the issue here sufficient?

Opened #63616.