twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

re-enable Dag Optimizer to strip names #637

Closed: johnynek closed this 8 years ago

johnynek commented 8 years ago

This reverts #610, but with simpler and much better-tested code (that is, actual tests, including tests that exposed bugs).

This also adds more tests, e.g. for #633.

This does not yet enable turning on all the optimizations together with naming, because you have to be more careful about "irreducibles", which can change under some optimizations (notably, composing functions). But it is enough to remove the code duplication between StripNameNodes and the DagOptimizer, which can do the same thing in a type-safe and general way.
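To make the idea concrete, here is a minimal, language-neutral sketch in Python. Summingbird itself is Scala, and its DagOptimizer works on a shared DAG with typed rules; the classes `Source`, `Map` and `Name`, the helper `apply_rule`, and the rules `strip_names` and `compose_maps` below are hypothetical stand-ins, not Summingbird's API, and the graph is treated as a tree for brevity. It shows two things: stripping names is just one generic rewrite rule applied until it stops matching, so no dedicated name-stripping pass is needed; and while a `Name` sits between two maps, map composition cannot fire, since a single fused map would have nowhere to keep both names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical node types standing in for a producer graph (not Summingbird's API).
@dataclass(frozen=True)
class Source:
    items: tuple

@dataclass(frozen=True)
class Map:
    fn: Callable
    parent: object

@dataclass(frozen=True)
class Name:
    name: str
    parent: object

# A rule returns a rewritten node, or None if it does not match.
Rule = Callable[[object], Optional[object]]

def apply_rule(node, rule: Rule):
    """Rewrite parents first, then re-apply the rule here until it stops matching.

    Terminates for rules that strictly shrink the graph, as both rules below do.
    """
    if isinstance(node, Map):
        node = Map(node.fn, apply_rule(node.parent, rule))
    elif isinstance(node, Name):
        node = Name(node.name, apply_rule(node.parent, rule))
    out = rule(node)
    return node if out is None else apply_rule(out, rule)

def strip_names(node):
    # Name(n, p) -> p: names carry no meaning at execution time.
    return node.parent if isinstance(node, Name) else None

def compose_maps(node):
    # Map(g, Map(f, p)) -> Map(g . f, p); a Name in between blocks the match.
    if isinstance(node, Map) and isinstance(node.parent, Map):
        f, g = node.parent.fn, node.fn
        return Map(lambda x: g(f(x)), node.parent.parent)
    return None

def run(node):
    """Tiny interpreter: evaluate the graph to a list."""
    if isinstance(node, Source):
        return list(node.items)
    if isinstance(node, Map):
        return [node.fn(x) for x in run(node.parent)]
    return run(node.parent)  # a Name is a no-op when executing

def names(node):
    """Collect names from the source outward."""
    if isinstance(node, Source):
        return []
    own = [node.name] if isinstance(node, Name) else []
    return names(node.parent) + own

def count_maps(node):
    if isinstance(node, Source):
        return 0
    return count_maps(node.parent) + (1 if isinstance(node, Map) else 0)

# Usage: two named maps over a source.
job = Name("double", Map(lambda x: x * 2,
                         Name("inc", Map(lambda x: x + 1, Source((1, 2, 3))))))
blocked = apply_rule(job, compose_maps)     # names in the way: still 2 maps
stripped = apply_rule(job, strip_names)     # same maps, no names
fused = apply_rule(stripped, compose_maps)  # now the maps fuse into 1
print(run(job), names(job), count_maps(blocked), count_maps(fused))
```

In this sketch, composing maps on `job` directly leaves both maps in place; only after the names are stripped can they fuse. Keeping names correct under rewrites like this is the follow-up work the PR defers.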

I'd like to get this merged, then tackle keeping names correct under general optimizations, which I think is possible. If we can get optimizations and names working correctly together, implementing platforms will be dramatically simpler.

Of course, names could potentially control optimizations (since named options can include options for optimizations), but we can deal with that down the line (probably by passing the options into the DagOptimizer).

johnynek commented 8 years ago

The Storm tests are so damn flaky. We really need to solve #605.

ianoc commented 8 years ago

LGTM, merge when green