Closed: johnynek closed this 5 years ago.
@ttim can you take a look at this?
@dieu In our internal test, this saved 30% of CPU time and shuffle bytes. The job clearly contained two instances of this needless diamond, which motivated the change, but there may be a significant number of jobs at Twitter that would also benefit from this optimization.
@johnynek wow, that's awesome, but we'll need to encourage @ttim to cut another release :)
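One pattern we commonly see on an internal system is a diamond-shaped graph: a single pipe, `middle`, forks into several branches, each branch applies only map-like operations, and the branches are then merged back together.
Currently, planners treat this literally, as a fork of `middle` whose branches are each mapped separately and then merged.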
However, the map-only operations between the fork and the merge could be fused into a single flatMap over `middle` that emits every branch's output for each input record.
That is exactly what the optimization rule in this PR does.
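To make the rewrite concrete, here is a minimal Python sketch under stated assumptions. It uses a toy plan representation: the `Source`, `Map`, `FlatMap`, and `Merge` classes and the `run` and `diamond_to_flatmap` functions are hypothetical names, not Scalding's API. The naive evaluator re-reads a source once per reference, standing in for the extra work a planner does on a fork. The sketch also assumes that merge output order doesn't matter. It covers only the simplest case, two branches with one map each, and makes no claim about how the rule in this PR generalizes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical toy plan nodes (NOT Scalding's real TypedPipe classes).
@dataclass(frozen=True)
class Source:
    data: tuple

@dataclass(frozen=True)
class Map:
    parent: Any
    fn: Callable[[Any], Any]

@dataclass(frozen=True)
class FlatMap:
    parent: Any
    fn: Callable[[Any], List[Any]]

@dataclass(frozen=True)
class Merge:
    left: Any
    right: Any

def run(node, stats: Counter) -> list:
    """Naively evaluate a plan; every path that reaches a Source re-reads it."""
    if isinstance(node, Source):
        stats["source_reads"] += 1
        return list(node.data)
    if isinstance(node, Map):
        return [node.fn(x) for x in run(node.parent, stats)]
    if isinstance(node, FlatMap):
        return [y for x in run(node.parent, stats) for y in node.fn(x)]
    if isinstance(node, Merge):
        return run(node.left, stats) + run(node.right, stats)
    raise TypeError(f"unknown node: {node!r}")

def diamond_to_flatmap(node):
    """Rewrite Merge(Map(p, f), Map(p, g)) into FlatMap(p, x -> [f(x), g(x)])
    when both branches read the very same parent object (a diamond)."""
    if (isinstance(node, Merge)
            and isinstance(node.left, Map)
            and isinstance(node.right, Map)
            and node.left.parent is node.right.parent):
        f, g = node.left.fn, node.right.fn
        return FlatMap(node.left.parent, lambda x: [f(x), g(x)])
    return node

# A diamond: `middle` forks into two mapped branches that are merged back.
middle = Source((1, 2, 3))
plan = Merge(Map(middle, lambda x: x * 10), Map(middle, lambda x: -x))

before = Counter()
out_before = run(plan, before)
after = Counter()
out_after = run(diamond_to_flatmap(plan), after)

print(sorted(out_before) == sorted(out_after))          # True
print(before["source_reads"], after["source_reads"])    # 2 1
```

The fused plan produces the same records (in a different order, which is why the comparison sorts them) while reading `middle` only once. That single pass is the kind of saving the rule targets.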
cc @stephbian @ianoc