nikomatsakis / borrowck

Modeling NLL and the Rust borrowck
Apache License 2.0
75 stars 11 forks source link

free region support #12

Open nikomatsakis opened 7 years ago

nikomatsakis commented 7 years ago

There is preliminary support for "free regions" in master now but it's kind of broken. The rough idea is to extend the control-flow graph to include synthetic nodes that contain the "end" of the various free regions. But there are some flaws with the current implementation. This issue exists to take some notes.

If there are multiple free regions, we are not handling their relationships together correctly. We want to create sort of "pessimistic" edges. This is a bit trickier, I think, then I first considered. Here is an example:

'a: 'b, 'b: 'a, 'c: 'b, 'd: 'a
// [ End(a, b) ] -> [End(c)] -> [End(static)]
//                     v ^        ^
//               -> [End(d)] -----+

Key here is that 'a and 'b get merged (because they outlive one another). 'c and 'd do not -- so we want to ensure that either End(c) or End(d) could occur first. These graphs are in some sense not ideal, because they include infeasible paths (i.e., d might never end), but I think it's ok for our purposes, since we care about possible paths that do exist (we don't have any intersection analyses, right?).

Also, we need to ensure that borrowing data of lifetime 'b for lifetime 'a will result in an error (presuming no ordering between the two). I had imagined this working as a result of the inference edges that result. I think that will still work, but not the way that free regions are currently implemented (since they map to lifetime variables). In particular, we need to add "verification constraints" that force (e.g., in example above) 'b to not contain the end region nodes for anything but 'a (since 'b: 'a) and the control-flow graph.

arielb1 commented 7 years ago

How do we translate something like <'a, 'b, 'c> 'a: 'b? The ordering 'a ends -> 'c ends is legal, the ordering 'c ends -> 'b ends is legal, but their combination 'a ends -> 'c ends -> 'b ends is not.

arielb1 commented 7 years ago

I think the best way to solve this is to A) treat every free lifetime to the combination of all CFG nodes in the current function + an "skolemized continuation lifetime" that represents its lifetime through the continuation. B) when 1 skolemized continuation lifetime would be required to outlive another (wlog, at the "point at infinity" which is the merge of the unwind and return points, which is always reachable because we allow every back-edge or potentially-nonterminating expression to spontaneously unwind), then either B.1) We have a constraint that the lifetimes are related in the right way - in that case, just allow it. B.2) We don't have such a constraint - then they might not be related in the way we want, and that's an error we can report to the user.

I think the "abstract lifetime" is a better way of thinking about skolemized lifetimes than trying to shoehorn them into the CFG.

nikomatsakis commented 7 years ago

@arielb1

I think the "abstract lifetime" is a better way of thinking about skolemized lifetimes than trying to shoehorn them into the CFG.

Hmm, so I also find this appealing, and in the past I went down this road for a bit. I remember getting concerned that sometimes it would interact poorly, causing excessively long borrows. I am having a bit of trouble recovering the precise case I was concerned about. I know that it had something to do with the get-default pattern, which is roughly this:

START { 
  // borrow here
  goto SOME NONE;
}
SOME {
  // return borrow here
  goto END
}
NONE {
  // mutate here, return something else
  goto END
}
END {
  // actual return
}

Here, the key point is that we start borrow (in START) that is conditionally returned. If we call the lifetime of that borrow 'b, and we call the lifetime of the returned result 'r, then this means you have a constraint like:

('b: 'r) @ SOME/0

The way the RFC describes it, this means that we trace (and include) all points reachable from SOME/0 that are also included in 'r. In this example so far, that seems ok -- presumably 'r is considered to include the entire CFG.

Maybe, to make this more concrete, we would say that regions are defined by (P, E), where P are points in the CFG, and E are "end points" for skolemized regions, which I will write as End('r). So, then the skolemized region 'r would be {START,SOME,NONE,END,End('r)}, more or less, and hence 'b would wind up as {START,SOME,END,End('r)}. The key point is that 'b (the borrow) still doesn't have to include NONE. This is actually a slightly different formulation than the approach I was working on before. Maybe it works better.

So, if we did something like that, how does the outlives relation work?

I think, within the CFG, you copy points over "as today", and the "skolemized end" points are not relevant to that bit. However, if the "source" (sub) region ('r, above) includes a skolemized end End('x) (it does, in this example), then the outlives relation holds only if the "target" (sup) region ('b, above) includes either that skolemized end point End('x), or some other skolemized end point End('y) where 'y: 'x is declared. (Put another way, we can kind of "elaborate" the end points to include the end points of things that are outlived.)

Does this make sense so far? Seem kind of like what you were thinking?

nikomatsakis commented 7 years ago

OK, I did a quick sort of prototype for the approach I was describing on this branch https://github.com/nikomatsakis/nll/tree/issue-12-free-regions. I've not had much time to play around with it yet, and it's missing some bits, but the key idea is that:

I am not sure if this is correct, particularly the final propagation rule, but it gives us something to play with and think about more thoroughly. One thing I want to revisit in particular is this example that @arielb1 gave earlier (to make sure it behaves as I would like it to):

fn foo<'a>(d: &'a mut Result<fn(), usize>) {
    fn innocent() {}
    *d = Ok(innocent);
    send_to_other_thread(&mut *d);
    loop {
        *d = Err(0x6861636b65642100);
    }
}

I think this will work out fine -- the borrow for send_to_other_thread must last for 'a, which includes the whole CFG, and hence will be still in scope at *d = Err(...) (this is related though to https://github.com/nikomatsakis/nll-rfc/issues/35).

arielb1 commented 7 years ago

I think "post-end skolemized variables" is a better name than "end points" - they aren't just points (if you "inlined" a function, they would turn into the subgraph of the CFG that contains the lifetime of the region after the function return) but can have arbitrary relationships between them.

When you have A: B @ P, if we find that the exit from the CFG is reachable from P via B, we add any skolemized end-points in B to A.

That sounds sound. If control flow can never reach the exit, all free regions are guaranteed to be live, and therefore equal.

RalfJung commented 7 years ago

Just a clarification question here, do you use "skolemized" as a synonym here for "free lifetimes", a.k.a. "the lifetimes quantified over in the function type"? (That does not at all match the way I would use "skolemized", so I am somewhat confused. ;) )

arielb1 commented 7 years ago

I'm using "skolemized" for what rustc calls "skolemized" actually closer to "Herbrandized"actually that depends on your POV - to mean free lifetimes that are supposed to be "closed over" by a universal quantifier we're trying to prove, and therefore can't acquire any "incidental" constraints (as opposed to lifetimes that are supposed to be "closed over" by universal quantifiers we're assuming/existential quantifiers we're proving, which can and do acquire any set of consistent constraints).

nikomatsakis commented 7 years ago

Heh. To be honest, I've been wondering myself why I'm using the term skolemized and not "free" all of a sudden. I guess I just think it's a fun word to say and not wholly inappropriate. =) Anyway, for the purpose of this discussion, I am referring to the named lifetime parameters that are in scope within a function.

(It is, also, probably useful to distinguish the skolemized lifetimes that we create when solving a higher-ranked trait bound, as I imagine we'll wind up treating them somewhat differently internally, even if they seem conceptually quite similar (albeit with no known relation to the CFG).)

RalfJung commented 7 years ago

as opposed to lifetimes that are supposed to be "closed over" by universal quantifiers we're assuming/existential quantifiers we're proving, which can and do acquire any set of consistent constraints

I see. In Coq lingo, I would call the free/skolemized lifetimes just "lifetime variables (in the current context)" -- things that are set by the context and that we do not have control over -- while the lifetimes we still have to pick are "lifetime evars". Anyway, now I understand what you are talking about :)

nikomatsakis commented 7 years ago

@arielb1 I agree that the "end regions" are not really points, I just found that the most convenient way to hack up the basic idea. I think it'd be clearer to treat them differently in the code (and description).