rust-lang / datafrog

A lightweight Datalog engine in Rust
Apache License 2.0
796 stars 43 forks source link

Disjunctive (OR) *filters* #38

Open ecstatic-morse opened 3 years ago

ecstatic-morse commented 3 years ago

@lqd opened rust-lang/polonius#157 a while ago, which solves the Location::All problem in what I think is the "correct" way. Essentially, it transforms all occurrences of origin_live_at(O, P) in rule bodies into (origin_live_at(O, P) OR placeholder(O)). In other words, placeholder regions are live at all points in the CFG.

Unfortunately it's actually kind of difficult to express that simple idea as a datafrog program with the current API. The PR manually expanded each rule body into two (one for each side of the OR), but this leads to code that is really difficult to maintain. Another option would be to create an intermediate Relation, origin_live_at_or_placeholder(O, P) defined below, to hold the disjunction. That would waste memory, however, and we would also need a new is_point relation that holds all possible points in the CFG.

origin_live_at_or_placeholder(O, P) :- origin_live_at(O, P).
origin_live_at_or_placeholder(O, P) :- placeholder(O), is_point(P).

Ideally, we would express this disjunction directly as a leaper. This is possible, but is more work than you might expect. An impl that's generic over Leapers won't work, since (among other things) there's no way to implement a disjunctive intersect efficiently with the current Leaper interface. I think you could do something like the following, but you'd need to handle heterogeneous Value types as well:

struct<A, B> LeaperOr(A, B);

/* This is the combination of concrete leapers we need, though ideally all combinations would be implemented.*/
impl<...> Leaper<...> for LeaperOr<ExtendWith<...>, FilterWith<...>> {
    /* ... */
}

Obviously, this doesn't scale, but maybe it's okay to implement just what we need for rust-lang/polonius#157? The alternative is to adjust the Leaper interface so that it composes better, but I don't see a straightforward way to do this without resorting to boxed trait objects, which is not an option since they would be dispatched in a tight loop (GATs + RPIT in trait fns would solve this, however).

ecstatic-morse commented 3 years ago

@regexident Is this something that interests you? I think it's feasible, but definitely not trivial. I'm willing to help, of course. We should wait for @lqd to confirm that rust-lang/polonius#157 is still the way forward, and that this would actually help them.

lqd commented 3 years ago

We unfortunately didn't manage to talk about rust-lang/polonius#157 at the last sprint, but hopefully we may be able to at the next one. (It's also still WIP in that it also lacks the required antijoins that are currently implicit: there are more details in the PR description).

It looks like the correct way forward to me too. The maintainability of these manual disjunctive queries is its main drawback, and that would be greatly improved if datafrog offered an API like this issue suggests.

(Just for clarification to @ecstatic-morse, rust-lang/polonius#157 doesn't fix the "Location::All problem" even though it's of course related to the domain of "things that are true at all points of the CFG". The PR would fix a small inefficiency in the liveness of placeholders by removing the materialization with disjunctive queries. The Location:All problem -- whose name is not at all descriptive enough as you'll see -- is in rustc fact generation and unrelated to placeholders or liveness: when type ascription is involved, the outlives relationships are emitted at all points in the CFG. Sometimes this OOMs for big CFGs with many subsets, sometimes this just causes many duplicates to be computed and propagated in the TC, just to be deduped at the next CFG point.)

ecstatic-morse commented 3 years ago

Ah, okay. I guess I conflated "placeholder origin is live at all points" with "placeholder origin subset constraint must hold at all points". It seemed like the first would help the second, since subset constraints propagate when an origin is live, but we already make placeholder origins live everywhere (by generating a tuple for every point in the control-flow graph) and that hasn't solved rust-lang/polonius#24. My bad.

ecstatic-morse commented 3 years ago

Ugh. In addition to being wrong about the motivation for rust-lang/polonius#157, I was also wrong about what kind of leaper we need to express its rules for placeholder liveness. When we have a rule like:

live_to_dying_regions(origin1, origin2, point1, point2) :-
  subset(origin1, origin2, point1), % main input 
  cfg_edge(point1, point2), % Leaper 1
  (origin_live_on_entry(origin1, point2); placeholder_origin(origin1)), % Leaper 2
  !origin_live_on_entry(origin2, point2). % Leaper 3

Leaper 2 (the disjunction) cannot propose any values because point2 is only bound on one side. In the current framework, it would have to be a ValueFilter, and Leaper 1 would be exclusively responsible for proposing values. This is fine, since cfg_edge should almost always be a better choice to propose than origin_live_on_entry.

So basically, all that's needed is to implement an adapter that can take two ValueFilters and OR them together. The Leaper interface is still not suitable for this, but the idea should be pretty easy to implement manually. At some point we'll want to implement caching like in #35 as well.

ecstatic-morse commented 3 years ago

Hey @regexident, let me know if you want to work on this. We will need the ability to compose ValueFilters as described in https://github.com/rust-lang/datafrog/issues/38#issuecomment-904060973 going forward, but it'll take me another week or two to get around to it. I'm happy to write some corrected mentoring instructions if you're interested. This change shouldn't be too hard, but you'll need to do some refactoring to make caching work.

regexident commented 3 years ago

Hey @ecstatic-morse,

sorry for not getting back to you in a more timely way. I'd love to give it a shot! I would probably need some hand-holding/mentoring on the semantics side of things though as datafrog is my very first contact with Datalog.

ecstatic-morse commented 2 years ago

No worries. I'm not a model for availability myself.

Let's talk on Zulip.