twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0

Globally Optimize Job-based TypedPipe.write calls #1791

Closed. johnynek closed this 6 years ago

johnynek commented 6 years ago

Before this change, 0.18-develop called the optimizer after each write. That not only produces a sub-optimal result, but for long jobs with many writes at different stages it also makes optimization an O(N^2) operation in the number of writes (multiplied by the cost of optimizing each set of TypedPipes, which is significant).
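To make the quadratic cost concrete, here is a rough sketch. It is a toy cost model, not Scalding code: `OptimizeCost`, `optimize`, `perWrite`, and `global` are hypothetical names, and it assumes each optimizer call re-examines every write registered so far at a cost proportional to their count.

```scala
// Toy cost model only: nothing here is part of the Scalding API.
object OptimizeCost {
  // Hypothetical stand-in for the optimizer: the cost of one call is
  // assumed to be proportional to the number of writes it examines.
  def optimize(pendingWrites: Int): Long = pendingWrites.toLong

  // Optimizer invoked after every write, each time over all writes so far:
  // 1 + 2 + ... + n = n * (n + 1) / 2, i.e. O(N^2).
  def perWrite(n: Int): Long = (1 to n).map(optimize).sum

  // Optimizer invoked once, over all n writes together: O(N).
  def global(n: Int): Long = optimize(n)

  def main(args: Array[String]): Unit = {
    val n = 100
    // prints "per-write: 5050, global: 100"
    println(s"per-write: ${perWrite(n)}, global: ${global(n)}")
  }
}
```

In this model, 100 writes cost 5050 units when optimized after each write versus 100 units when optimized once. The real gap also depends on the per-call cost of optimizing each set of TypedPipes.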

This pattern comes up in summingbird, so it is a practical issue for us when trying to use scalding 0.18 with summingbird at Stripe.

ianoc-stripe commented 6 years ago

looks like a pretty straightforward win

lgtm