Closed johnynek closed 7 years ago
👍 looks good to me.
about the question @ttim posed: why do we have Rule
on N
rather than Literal
. The reason is that it is much easier to write rules in terms of the underlying nodes.
See the summingbird rules: https://github.com/twitter/summingbird/blob/develop/summingbird-core/src/main/scala/com/twitter/summingbird/planner/DagOptimizer.scala#L354
There are pretty easy to write and read on the original nodes, but on Literal
they would be almost impossible since you need to match on the functions that are embedded in the Unary
or Binary
operations.
@johnynek I'm not strongly against, I see a point to merge it as is and do some follow-ups.
About Rule
s on N[_]
- I'm totally agree it's simpler, what I don't understand (and that's where my mental model breaks) - if you expressed change of dag structure on the level of N[_]
how this can be reflected on dag itself?
This moves the summingbird graph code into a zero dependency module in scalding.
This will allow us to later delete it from summingbird and depend on it here if we choose. Also it allows us to use the same graph code to do transformations and optimizations of the TypedPipe graphs.
After this is merged, my next step is to use this to make the new cascading backend better (look for fan-out nodes that the user did not use
fork
on, so that we avoid recomputation. We can also use this code to apply some optimization rules (e.g. pushing upfilterKeys
on TypedPipe into the previous step, which we can't currently do).