rust-lang / polonius

Defines the Rust borrow checker.
Apache License 2.0
1.35k stars 73 forks source link

subset errors #153

Open nikomatsakis opened 4 years ago

nikomatsakis commented 4 years ago

So @lqd and I had a big discussion today and resolved the major open questions with respect to reporting "subset errors". In short, we decided to adopt the approach laid out in the original smallcultfollowing blog post, but with a bunch of other details. =) This issue tracks the implementation of that work and explains the rationale.

The problem

Rust functions contain "placeholder origins" like 'a and 'b in this example:

fn foo<'a, 'b>(x: &'a u32) -> &'b u32 { x }

We want this code to be an error. But the error is not due to an illegal access error -- it is because data with the origin 'a is returned with the origin 'b, and there is no 'a: 'b where clause.

Alternative A: Use subsets and declare errors

The original blog post proposed an approach to solving this problem where we looking for a subst relation subset('a, 'b, _) that involves two placeholder origins 'a and 'b, and then compare that against a set of "known subsets" (derived from the where clauses). If we find a subset relation that is not known to be valid, that's an error. This can (ultra naively) be expressed with a single rule:

subset_error(Origin1, Origin2) :-
    subset(Origin1, Origin2, _),
    placeholder_region(Origin1),
    placeholder_region(Origin2),
    !known_subset(Origin1, Origin2).

Alternative B

The main competitor that we were considering was the idea of using 'placeholder loans'. In this approach, for each placeholder origin, we have a corresponding placeholder loan. We can then compute the subset and contains relations as normal, and declare an error if a placeholder loan finds its way into a placeholder origin where it doesn't belong (i.e., in the above example, the placeholder origin for 'b would contain the loan for 'a). So the error rule would like something like this:

subset_error(Origin1, Origin2) :-
    contains(Origin2, Loan1, _),
    placeholder_origin(Origin1, Loan1),
    placeholder_origin(Origin2, Loan2),
    !known_subset(Origin1, Origin2).

but we also need some rules like

contains(Origin1, Loan1, Node) :-
    cfg_node(Node),
    placeholder_origin(Origin1, Loan1).

Consideration: Elegance

Naturally we want to find relatively minimal rules that are clean and easy to understand. The subset approach largely wins on this approach, but as we'll see later there are a few complications.

Consideration: Origins are just sets of loans, right?

One of the original motivations for pursuing the "placeholder loan" approach was that we wanted to be able to treat origins as if they were just sets of loans and didn't have an identity of their own. However, in practice, we can't quite do this anyway, because we have to track subset relations forward through the graph (see https://github.com/rust-lang/polonius/issues/107 for related discussion about that). So it seems like either way origins have some identity of their own in the polonius framework.

Consideration: Location Insensitive analysis

One of the things we were considering is the ability to optimize the naive and optimized analysis based on a preliminary location insensitive run. This location insensitive run will produce results that can be used to limit the transitive closure computations and make everything more performant.

nikomatsakis commented 4 years ago

Design we landed on

We ultimately landed on using the Approach A, as described here. First, the compiler produces the following input sets for polonius:

placeholder_origin(Origin)
known_subset(Origin1, Origin2) // for placeholders only

To optimize the naive rules using the outputs from the LI analysis, however, we can limit

nikomatsakis commented 4 years ago

Location insensitive (LI)

The location insensitive analysis as currently specified doesn't compute the transitive subset relation, only the transitive contains relation. So we add a new transitive analysis for subset_placeholder:

subset_placeholder(Origin1, Origin2) :-
    placeholder_origin(Origin1),
    subset_base(Origin1, Origin2).
subset_placeholder(Origin1, Origin3) :-
    subset_placeholder(Origin1, Origin2),
    subset_base(Origin2, Origin3).

then we can compute errors using subset_placeholder and the original rule given in Approach A.

This yields the following results:

nikomatsakis commented 4 years ago

Naive, plus Naive optimized with Location Insensitive results

The naive analysis can be implemented using the rule originally given.

But we can also optimize the naive analysis based on the LI results:

nikomatsakis commented 4 years ago

Optimized

The "optimized" analysis avoids computing the transitive subset and contains relationships in full. Instead, it computes them only when necessary. In particular, when an origin goes dead, it will remove that origin, but also compute the transitive closure of subset relations and things that origin may have been involved in, so that if you have O1: O2: O3 and O2 goes dead, we can add a subset(O1, O3) tuple (we will also add any loans found in O2 to O3 at this time). This means that the optimized errors never have a complete picture of which loans belong to which origins, and we also don't know the full set of "superset origins".

This works out well for illegal access errors: for that purpose, we only care which loans are live, we don't care which origin they are found in. So we don't need to know those full sets. Also, this can be optimized using LI results in the same fashion as naive.

However, for subset errors, this is a problem. We can't just lean on the subset relation because it is incomplete. (Similarly, if we were using Alternative B, we couldn't just rely on the contains relation to contain the placeholder loans.)

Therefore, we choose to build up a separate, transitive relation that propagates placeholder origins forward:

subset_placeholder(Origin1, Origin2, Node) :-
    placeholder_origin(Origin1),
    subset(Origin1, Origin2, Node).
subset_placeholder(Origin1, Origin3, Node) :-
    subset_placeholder(Origin1, Origin2),
    subset(Origin2, Origin3, Node).

For efficiency, this relation can be filtered as well. We only need to consider the placeholder origins in O_Error, and we only need to propagate them into origins that are members of O_Required.