rust-lang / polonius

Defines the Rust borrow checker.
Apache License 2.0
1.35k stars 73 forks source link

possible refinement to avoid tracking subset relations #107

Open nikomatsakis opened 5 years ago

nikomatsakis commented 5 years ago

Current status: As it stands, this idea is flawed. We're leaving the issue open because there might be a kernel of an idea here, but it'll take some refinement to find it.


As discussed on Zulip, @aatxe and I realized that we can refine the polonius analysis to be more precise and (potentially) faster.

The key idea is that we do not need to propagate the subset relation forward through time (i.e., across the control-flow graph). The important thing is for us to distinguish instantaneous flow of data from persistent flow. In our current setup, persistent flow corresponds to invariance -- basically cycles in the subset relation. (I think this is an interesting observation and points to there being some kind of refinement possible to be more precise, but that's neither here nor there for the moment).

As an example, the following assignment represents an instantaneous flow of data:

let p: &'p u32 = ...;
let q: &'q u32 = p;

This would create a 'p: 'q subset relation (read as 'p flows into 'q). But the point is that this occurs once. If 'p later comes to refer to new loans somehow, that doesn't affect 'q.

In contrast, with a mutable reference, you do want the effects to propagate:

let v: Vec<&'v u32> = vec![];
let p: &mut Vec<&'p u32> = &mut v;
p.push(...); // any changes to 'p here *should* affect 'v

The rough idea then is to say that when you have equality of two regions (in this case, of 'v and 'p) polonius should propagate that forward, but not a mere subset relation. Another way to think of it is that v and p ought to have been given the same region in the first place, in order for the type check to succeed:

let v: Vec<&'v u32> = vec![];
let p: &mut Vec<&'v u32> = &mut v;
p.push(...);

Anyway, we ought to be able to prototype this like so:

subset(R1, R2, P) :- subset_base(R1, R2, P).
subset(R1, R3, P) :- subset(R1, R2, P), subset_base(R2, R3, P).
equal(R1, R2) :-
  subset(R1, R2, P), // this is the transitive relation
  subset(R2, R1, P).
contains(R1, L1, P) :-
  equals(R1, R2),
  contains(R2, L1, P).

contains(R1, L1, P) :- borrow_region(R1, L1, P).

contains(R2, L1, P) :- 
  contains(R1, L1, P),
  subset(R1, R2, P).

I .. think that's it?

nikomatsakis commented 5 years ago

I realized an interesting example last night -- it may be that this analysis is itself not as precise as it could be. The following example (playground) is rejected by today's NLL and I think would be rejected by the formulation I described above, but is accepted by today's polonius. In order to accept it, we would have to make equal be tied to program points -- the tricky part here is that down one branch of the if, we equate the regions in p and a, and down the other half, we equate the regions in p and b. When we introduce a borrow into p, if equate is not flow-sensitive, it affects both a and b when it really only needs to affect one or the other.

#![allow(warnings)]

fn main() {
    let mut a: Vec<&u32> = vec![];
    let mut b: Vec<&u32> = vec![];
    let mut c: Vec<&u32> = vec![];
    let mut p: &mut Vec<&u32> = &mut c;
    let mut i = 22;
    if foo() {
        p = &mut a;
        p.push(&i);
        drop(a);
        i += 1; // `i` is no longer borrowed, ok
        drop(b);
    } else {
        p = &mut b;
        p.push(&i);
        drop(b);
        i += 1; // `i` is no longer borrowed, ok
        drop(a);
    }
}

fn foo() -> bool {
    true
}
nikomatsakis commented 4 years ago

We've done some more digging into this and it appears that the idea, as described, is somewhat flawed.

First of all, I want to link to this playground which contains the original motivational example (or a variation of it). This was buried in the Zulip thread above.

In any case, we ran into a problem with the promoted bounds example. Digging into the problem in this Zulip thread, we found that the problem is kind of "foundational" in the way that this issue is described. In particular, there are cases where we do have to propagate "subset" relations forward, because they establish important relationships between otherwise independent types and not just "instantaneous effects".

Here is a variant of the "promoted bounds" example (playground) that I found identified the problem even more crisply:

fn shorten_lifetime<'a, 'b, 'min>(a: &'a i32, b: &'b i32) -> &'min i32 { ... }

fn main() {
    let promoted_fn_item_ref: fn(_, _) -> _ = shorten_lifetime;

    let a = &5;
    let ptr = {
        let l = 3;
        let b = &l; //~ ERROR does not live long enough
        let c = promoted_fn_item_ref(a, b);
        c
    };

    drop(ptr);
}

So what is happening here? The interesting part is this first line:

    let promoted_fn_item_ref: fn(_, _) -> _ = shorten_lifetime;

What is going to happen here is that we are going to assign promoted_fn_item_ref a type like fn(?A, ?B) -> ?C. And then we are going to, based on the signature of shorten_lifetime, infer those types to be &'a i32, &'b i32, and &'c i32, respectively. Crucially, we are also going to establish a subset relationship such that 'a: 'c and 'b: 'c (actually we'll do that somewhat indirectly, with a bunch of annoying and hard to track intermediate origins, but it should occur regardless). These relationships come from the where clauses on shorten_lifetime, in particular.

What's important is that the fn() type itself doesn't carry any relationships between its arguments, those relationships are all carried through the where clauses (and the subset relations) that we instantiated in the first line.

In the equality model, those subset relationships would be "instantaneously" enforced on the first line, but that's not good enough, they have to persist until the function is called. On the first line, 'a, 'b and 'c are all an empty set of loans, but they will get populated later on.

You could construct a similar example, I imagine, using a data structure. The idea would be that you have some kind of Vec<A, B> such that A values can be "transformed" somehow to B values, and there is a "transformer" between them, but we establish the transformer early on and then rely on it later on.