rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.35k stars 12.72k forks source link

NLL: refine liveness with "maybe initialized" analysis #45665

Closed nikomatsakis closed 6 years ago

nikomatsakis commented 7 years ago

As part of https://github.com/rust-lang/rust/pull/45538, we are incorporating knowledge of which variables are live (i.e., may be used later). We also distinguish "drop-live", which means that they may be dropped later.

However, in MIR, we sometimes generate drops that we can see statically will be a no-op. For example, consider this program:

// compile-flags:-Znll
fn main() {
    let mut v = 1;
    let p = Wrap { value: &v }; // freezes `v` so long as `p` is in use
    ignore(p); // moves `p`
    v += 1; // Errors, even in NLL! (At least as implemented now.)
} // <-- p is dropped here, but we **know** this to be a no-op

fn ignore<T>(x: T) { }

struct Wrap<T> { value: T }

impl<T> Drop for Wrap<T> {
    fn drop(&mut self) { }
}

The problem is that p is considered "drop-live" at the point where v += 1, because there is a DROP in the MIR, even though we know that p must have been moved (and hence will not be dropped) at that point.

One way to account for this is to do a maybe initialized analysis. This is a forwards analysis that tells you, at any given point, whether a given value could possibly be initialized that this point. The idea then would be to change the code which invokes the add_drop_live_constraint function to take the "maybe initialized" state into account. In particular, if the local variable that is being dropped cannot be initialized (i.e., maybe initialized is false), then we can ignore the fact that it is drop live and just not invoke add_drop_live_constraint with its type.

We are already computing a "maybe initialized" in the borrow checker (the variable flow_inits). However, this analysis works a bit differently than the NLL code. For example, it is computed at the level of fragments, which are paths like a.b.c. This is done because it is possible that a was initialized, but some subpart of it (e.g., a.b.c) was moved, while the rest remains intact.

Honestly, all the data structures are a bit out of cache for me, but the high-level idea then is that, when we know that a variable X is drop-live, we want to iterate over all of the subpaths of a that may be initialized (which the existing data structures should be able to tell us) and then invoke the existing add_drop_live_constraint function on the types of those fragments.

cc @arielb1 @pnkfelix -- any notes you care to leave on how to find the maybe initialized fragments given a mir::Local would be appreciated. Else I will dig into the data structures at some point.

arielb1 commented 7 years ago

We should have the right sort of data in flow_inits, and I don't see why can't we have liveness fragment-by-fragment: structs with destructors can't be partially initialized anyway - see nikomatsakis/nll-rfc#42.

nikomatsakis commented 7 years ago

Let me leave some notes on how things work now. I've been digging about in the code a bit to get a better understanding of what's going to be required here. The NLL code approaches things just a hair differently from the existing borrowck data-flow operator, but I think the two should match up well enough.

First, a bit of background. In order to track which paths are moved etc, we currently construct up a tree of paths. Each such path is called a MovePath, and they are indexed by the MoveData struct. Typically though we don't pass around a MovePath directly, [but rather a MovePathIndex.

The move paths are arranged in a tree, so e.g. if you have a move path a, as well as a.b and a.c, you can walk from a to a.b and a.c -- basically, each move data has a link to its child (first_child) and then that child has links to its siblings (next_sibling). The helper has_any_child_of, for example, shows how to iterate through all the descendants of a node using these fields. (We'll need a similar helper, I think, but implemented on FlowInProgress<MaybeInitializedLvals<'b, 'gcx, 'tcx>> instead of FlowInProgress<MaybeUninitializedLvals<'b, 'gcx, 'tcx>> -- we should be able to share this code.)

It is instructive to look at where has_any_child_of is used today, since it is doing a similar thing. The check_if_path_is_moved function is invoked when there is an access to a path like a.b.c -- it checks whether that path is accessible or not. So, for example, it would be illegal to do some_function(a.b.c) if a.b.c.d has been moved, since a.b.c is not a "complete" value (it's also illegal to do some_function(a.b.c) if a or b has been moved).

In order to detect the case where a.b.c or some subpath of has been moved, the code works in two steps:

https://github.com/rust-lang/rust/blob/9f3b09116b742b2606dc5f36f9145e0c89e4010b/src/librustc_mir/borrow_check.rs#L737-L743

First, it looks for a move-path for a.b.c, on line 738. There may be no such path, if the function only moves a or a.b (but never directly accesses the field a.b.c). If there is such a path, then on line 739 it looks for all maybe-uninitialized children of that path using the has_any_child_of function we saw earlier.

nikomatsakis commented 7 years ago

Initial mentoring instructions: step 1

First, we are going to have to do a bit of surgery to the current control-flow, I think. Right now, the borrowck begins by invoking gather_moves as the first step, and then invokes the compute_regions call later on. This is actually a mistake, I believe, and we should be invoking compute_regions earlier -- we don't want to be doing the move analysis on the original MIR and then editing it, since the types and things will be changing, which could mess up some of the hashing etc.

Probably, also, it would be useful to make borrow_check into a directory, move the existing code into borrow_check/mod.rs, and move nll.rs from transform::nll into borrow_check::nll. That might be a good first PR.

Next, I think what we really want is to split up the compute_regions code into two steps: the "renumbering", which is what mutates the MIR, and then everything else. That way, the overall flow of borrowck can be:

OK, ran out of time, will write more later!

spastorino commented 7 years ago

I'm taking this one :smile:

nikomatsakis commented 6 years ago

@Nashenas88 wound up doing this, but it's done. There's follow-up work to be done, I'll file a separate issue for that.