Open nikomatsakis opened 4 years ago
We ultimately landed on using the Approach A, as described here. First, the compiler produces the following input sets for polonius:
placeholder_origin(Origin)
known_subset(Origin1, Origin2) // for placeholders only
To optimize the naive rules using the outputs from the LI analysis, however, we can limit
contains
only propagate those loans in L_Error
subset(O1, O2)
to only propagate if
O1
and O2
are in O_Requited
The location insensitive analysis as currently specified doesn't compute the transitive subset
relation, only the transitive contains
relation. So we add a new transitive analysis for subset_placeholder
:
subset_placeholder(Origin1, Origin2) :-
placeholder_origin(Origin1),
subset_base(Origin1, Origin2).
subset_placeholder(Origin1, Origin3) :-
subset_placeholder(Origin1, Origin2),
subset_base(Origin2, Origin3).
then we can compute errors using subset_placeholder
and the original rule given in Approach A.
This yields the following results:
L_Error
that are involved in (potential) illegal access errorsO_Error
that are involved in (potential) subset errors (each O
where subset_error(O, _)
)O_Required
which:
L_Error
(using contains
)O_Error
regionsO_Error
(using subset_placeholder
)The naive analysis can be implemented using the rule originally given.
But we can also optimize the naive analysis based on the LI results:
contains
for loans in L_Error
subset(O1, O2, N)
if O1
and O2
are both in O_Required
The "optimized" analysis avoids computing the transitive subset and contains relationships in full. Instead, it computes them only when necessary. In particular, when an origin goes dead, it will remove that origin, but also compute the transitive closure of subset relations and things that origin may have been involved in, so that if you have O1: O2: O3
and O2
goes dead, we can add a subset(O1, O3)
tuple (we will also add any loans found in O2
to O3
at this time). This means that the optimized errors never have a complete picture of which loans belong to which origins, and we also don't know the full set of "superset origins".
This works out well for illegal access errors: for that purpose, we only care which loans are live, we don't care which origin they are found in. So we don't need to know those full sets. Also, this can be optimized using LI results in the same fashion as naive.
However, for subset errors, this is a problem. We can't just lean on the subset
relation because it is incomplete. (Similarly, if we were using Alternative B, we couldn't just rely on the contains
relation to contain the placeholder loans.)
Therefore, we choose to build up a separate, transitive relation that propagates placeholder origins forward:
subset_placeholder(Origin1, Origin2, Node) :-
placeholder_origin(Origin1),
subset(Origin1, Origin2, Node).
subset_placeholder(Origin1, Origin3, Node) :-
subset_placeholder(Origin1, Origin2),
subset(Origin2, Origin3, Node).
For efficiency, this relation can be filtered as well. We only need to consider the placeholder origins in O_Error
, and we only need to propagate them into origins that are members of O_Required
.
So @lqd and I had a big discussion today and resolved the major open questions with respect to reporting "subset errors". In short, we decided to adopt the approach laid out in the original smallcultfollowing blog post, but with a bunch of other details. =) This issue tracks the implementation of that work and explains the rationale.
The problem
Rust functions contain "placeholder origins" like
'a
and'b
in this example:We want this code to be an error. But the error is not due to an illegal access error -- it is because data with the origin
'a
is returned with the origin'b
, and there is no'a: 'b
where clause.Alternative A: Use subsets and declare errors
The original blog post proposed an approach to solving this problem where we looking for a subst relation
subset('a, 'b, _)
that involves two placeholder origins'a
and'b
, and then compare that against a set of "known subsets" (derived from the where clauses). If we find a subset relation that is not known to be valid, that's an error. This can (ultra naively) be expressed with a single rule:Alternative B
The main competitor that we were considering was the idea of using 'placeholder loans'. In this approach, for each placeholder origin, we have a corresponding placeholder loan. We can then compute the
subset
andcontains
relations as normal, and declare an error if a placeholder loan finds its way into a placeholder origin where it doesn't belong (i.e., in the above example, the placeholder origin for'b
would contain the loan for'a
). So the error rule would like something like this:but we also need some rules like
Consideration: Elegance
Naturally we want to find relatively minimal rules that are clean and easy to understand. The subset approach largely wins on this approach, but as we'll see later there are a few complications.
Consideration: Origins are just sets of loans, right?
One of the original motivations for pursuing the "placeholder loan" approach was that we wanted to be able to treat origins as if they were just sets of loans and didn't have an identity of their own. However, in practice, we can't quite do this anyway, because we have to track
subset
relations forward through the graph (see https://github.com/rust-lang/polonius/issues/107 for related discussion about that). So it seems like either way origins have some identity of their own in the polonius framework.Consideration: Location Insensitive analysis
One of the things we were considering is the ability to optimize the naive and optimized analysis based on a preliminary location insensitive run. This location insensitive run will produce results that can be used to limit the transitive closure computations and make everything more performant.