uwescience / raco

Compilation and rule-based optimization framework for relational algebra. Raco is the language, optimization, and query translation layer for the Myria project.
Other
72 stars 19 forks source link

ideas for a new Rules abstraction? #339

Open dhalperi opened 9 years ago

dhalperi commented 9 years ago

tag @7andrew7 @bmyerz

I know we have all had some issues dealing with the way rules are run in the past. In particular, I think we have been frustrated over the lack of control -- rules are always run recursively, and top down. Since rules are objects rather than factory functions, we can't use state in a rule itself. Etc.

I was thinking it might soon be time to try to abstract the rules better. Some candidate steps that seem likely good to me:

billhowe commented 9 years ago

Seems GREAT to refactor rules!!

I worry about giving individual rules control over how they are run, however. Seems hard to reason about the overall semantics. But I imagine there are use cases that motivate it -- can we enumerate a few of those as a first step?

On Wednesday, September 17, 2014, Daniel Halperin notifications@github.com wrote:

tag @7andrew7 https://github.com/7andrew7 @bmyerz https://github.com/bmyerz

I know we have all had some issues dealing with the way rules are run in the past. In particular, I think we have been frustrated over the lack of control -- rules are always run recursively, and top down. Since rules are objects rather than factory functions, we can't use state in a rule itself. Etc.

I was thinking it might soon be time to try to abstract the rules better. Some candidate steps that seem likely good to me:

  • add a .get(**kwargs) method to each rule, so that rule behavior can be controlled at invocation time.
  • enable rules to control their execution. E.g., rules can extend abstract classes that imply how they are run: PreOrderRecursiveRule, a PostOrderRecursiveRule, a RunOnceRule (that presumably controls tree traversal internally) etc.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/339.

dhalperi commented 9 years ago
  1. My "PushApply/RemoveUnusedColumns/RemoveNoOpApply" rules are paired weirdly. I really want to implement both rules in a single pass. This requires me to work from the top down in a stateful way. I would also like to be able to execute code on the way back up in a stateful way so that, e.g., I can minimize the number of "extra" Apply operators.
  2. Some rules just make much more sense when applied bottom up. E.g., a rule to fill in cardinality estimates or schemes. When we do these things top down we do a ton of redundant work.
dhalperi commented 9 years ago
  1. Andrew had to pull some crazy shenanigans with PushSelects to avoid infinite loops. He will have to fill in details.

(oops, that was supposed to be a 3, but github is too smart for its own good).

dhalperi commented 9 years ago

The bottom line is that top-down, unconditional recursion is problematic.

bmyerz commented 9 years ago

Simple problem related to infinite loops: when you have if Bar then Foo(Bar) rule, you get an infinite loop.

My way of avoiding it is encode the logic "Don't apply me to my output" as a flag on the new Bar.

Maybe we can do better. I like @dhalperi 's idea of encapsulating some of these behaviors in base or mixin classes for rules.

bmyerz commented 9 years ago

Ltac is a good (as in widely adopted) language for expressing rewrites of goals of Coq proofs. Good intro to Ltac tactics is here http://adam.chlipala.net/cpdt/html/Match.html.

Some takeaways (I think the analogies for us are tactic=rule and goal=relational_plan)

billhowe commented 9 years ago

Got it!

On Wednesday, September 17, 2014, Daniel Halperin notifications@github.com wrote:

1.

My "PushApply/RemoveUnusedColumns/RemoveNoOpApply" rules are paired weirdly. I really want to implement both rules in a single pass. This requires me to work from the top down in a stateful way. I would also like to be able to execute code on the way back up in a stateful way so that, e.g., I can minimize the number of "extra" Apply operators. 2.

Some rules just make much more sense when applied bottom up. E.g., a rule to fill in cardinality estimates or schemes. When we do these things top down we do a ton of redundant work.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/339#issuecomment-55961670.

bmyerz commented 9 years ago

This is a fundamental paper on solving the phase ordering problem in program optimization from Zach's thesis advisor. I haven't thought enough about the particular problems stated above to know if it is applicable, but it seems like good related reading.

Composing Dataflow Analyses and Transformations

tl;dr is Two conventional solutions to the phase ordering problem are 1) building super analyses out of multiple passes (e.g., "I really want to implement both rules in a single pass") and 2) fixpoint iteration of rule application (which we currently rely on).

The paper demonstrates an interface for defining optimizations to make them more modular (specifically merge the pattern recognition with the transformation). Roughly, it works by letting an optimization make optimistic assumptions. If later optimizations invalidate the assumptions, then transformations can be rolled back.

billhowe commented 9 years ago

That seems SUPER relevant!

On Mon, Dec 22, 2014 at 4:54 PM, Brandon Myers notifications@github.com wrote:

This is a fundamental paper on solving the phase ordering problem in program optimization from Zach's thesis advisor. I haven't thought enough about the particular problems stated above to know if it is applicable, but it seems like good related reading.

Composing Dataflow Analyses and Transformations

tl;dr is Two conventional solutions to the phase ordering problem are 1) building super analyses out of multiple passes (e.g., "I really want to implement both rules in a single pass") and 2) fixpoint iteration of rule application (which we currently rely on).

The paper demonstrates an interface for defining optimizations to make them more modular. Roughly, it works by letting an optimization make optimistic assumptions. If later optimizations invalidate the assumptions, then transformations can be rolled back.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/339#issuecomment-67910641.

Bill Howe Associate Director and Senior Data Science Fellow, UW eScience Institute Affiliate Faculty, Computer Science & Engineering University of Washington To acknowledge eScience: "Supported in part by the University of Washington eScience Institute"

bmyerz commented 8 years ago

Equality Saturation avoids the need to decide on the order rewrites are applied. I have a demo of it for the Raco algebra.

billhowe commented 8 years ago

Wow!! I will run this tonight....

On Thursday, February 4, 2016, Brandon Myers notifications@github.com wrote:

Equality Saturation http://www.cs.cornell.edu/%7Eross/publications/eqsat/ avoids the need to decide on the order rewrites are applied. I have a demo of it for the Raco algebra https://github.com/uwdb/PolyPEG/tree/master/clothespin.

— Reply to this email directly or view it on GitHub https://github.com/uwescience/raco/issues/339#issuecomment-180154504.