twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars, 706 forks

Optimize joins when the right side is summed #1785

Closed johnynek closed 6 years ago

johnynek commented 6 years ago

This gives a potentially very large performance improvement when joining a left side that has many values per key against a right side that has 0 or 1 value per key, as is the case when the right side is summed. We can avoid recomputing the reduction on the right side for every left value, which in some cases is quite expensive once multiplied by the cardinality of the left.
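To make the cost difference concrete, here is a minimal sketch in plain Scala collections. It is not the Scalding implementation: `SummedJoinSketch`, `CountingSum`, `naiveJoin`, and `optimizedJoin` are hypothetical names used only for illustration, and the sketch models a single key's worth of data. It assumes the naive plan re-runs the right-side reduction each time a left value is paired with it.

```scala
object SummedJoinSketch {
  // Hypothetical helper: a sum that counts how often `plus` is invoked,
  // so the two strategies can be compared.
  final class CountingSum {
    var calls: Int = 0
    def plus(a: Long, b: Long): Long = { calls += 1; a + b }
  }

  // Naive strategy: the summed right side is re-evaluated for every left value
  // (`def` means the reduction runs again on each access).
  def naiveJoin[A](left: Seq[A], right: Seq[Long], s: CountingSum): Seq[(A, Long)] = {
    def summedRight: Option[Long] = right.reduceOption[Long]((x, y) => s.plus(x, y))
    left.flatMap(a => summedRight.map(b => (a, b)))
  }

  // Optimized strategy: reduce the right side once (it then holds 0 or 1 value),
  // and reuse that result for every left value.
  def optimizedJoin[A](left: Seq[A], right: Seq[Long], s: CountingSum): Seq[(A, Long)] = {
    val summedRight: Option[Long] = right.reduceOption[Long]((x, y) => s.plus(x, y))
    left.flatMap(a => summedRight.map(b => (a, b)))
  }
}
```

Both functions return the same pairs. In this model, the naive version performs roughly (left size) × (right size − 1) additions, while the optimized version performs only (right size − 1).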

This shows the value of our case class functions: they let us take apart what we have done and make changes at plan time. This reduces, but does not eliminate, the need to do #1743, since with some extra work we should now be able to see where sums are occurring (as long as we are diligent in always making case class functions).
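As a rough illustration of why case class functions help at plan time, the sketch below contrasts a named case class with an opaque lambda. The names `CaseClassFnSketch`, `SumAll`, and `emitsAtMostOne` are hypothetical and are not Scalding's actual classes; the point is only that a case class can be recognized by pattern matching, whereas an anonymous function cannot.

```scala
object CaseClassFnSketch {
  // Hypothetical reducer written as a case class rather than an anonymous lambda.
  // It emits at most one value: the sum of its input, or nothing for empty input.
  final case class SumAll[V](plus: (V, V) => V) extends (Iterator[V] => Iterator[V]) {
    def apply(it: Iterator[V]): Iterator[V] =
      if (it.hasNext) Iterator.single(it.reduce(plus)) else Iterator.empty
  }

  // Hypothetical plan-time check: we can only prove the "0 or 1 value" property
  // for functions whose structure we can inspect.
  def emitsAtMostOne[V](fn: Iterator[V] => Iterator[V]): Boolean =
    fn match {
      case _: SumAll[_] => true  // known shape: safe to apply the optimization
      case _            => false // opaque function: nothing can be assumed
    }
}
```

A planner along these lines could apply the join optimization only when the check returns `true`, which is why the approach depends on reducers consistently being written as case classes.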

ianoc commented 6 years ago

lgtm