twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars, 706 forks

Add back some optimizations we had in 0.17 #1774

Closed: johnynek closed this 6 years ago

johnynek commented 6 years ago

This is related to #1736 and #1669

This re-enables two optimizations we had before:

  1. run filters on Iterable pipes before sending them to Cascading.
  2. defer merges so we can combine more operations into single Cascading nodes.
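A minimal sketch of both rewrites on a toy, Int-only pipe plan. Every name here (ToyPlanner, IterablePipe, Source, Filtered, FlatMapped, Merged, optimize, run) is hypothetical: this is not Scalding's TypedPipe or its real optimization rules. It only illustrates the two ideas, assuming a filter over an in-memory Iterable can be evaluated eagerly and that filter/flatMap distribute over a merge, so the merge can be pushed downstream.

```scala
// Toy plan model; hypothetical types, NOT Scalding's TypedPipe or its optimizer.
object ToyPlanner {
  sealed trait Pipe
  final case class IterablePipe(items: List[Int]) extends Pipe // in-memory data
  final case class Source(name: String) extends Pipe            // external input
  final case class Filtered(in: Pipe, p: Int => Boolean) extends Pipe
  final case class FlatMapped(in: Pipe, f: Int => List[Int]) extends Pipe
  final case class Merged(left: Pipe, right: Pipe) extends Pipe

  // One rewrite at the root of a node whose children are already optimized.
  private def step(p: Pipe): Pipe = p match {
    // Rule 1: filter an in-memory Iterable eagerly, so no filter node is planned.
    case Filtered(IterablePipe(xs), pred) => IterablePipe(xs.filter(pred))
    // Rule 2 (defer merge): push the operation into both branches; the merge
    // moves downstream and each branch can fuse the operation with its own work.
    case Filtered(Merged(l, r), pred) =>
      Merged(step(Filtered(l, pred)), step(Filtered(r, pred)))
    case FlatMapped(Merged(l, r), f) =>
      Merged(step(FlatMapped(l, f)), step(FlatMapped(r, f)))
    case other => other
  }

  // Optimize bottom-up: children first, then the node itself.
  def optimize(p: Pipe): Pipe = {
    val withOptimizedChildren = p match {
      case Filtered(in, pred) => Filtered(optimize(in), pred)
      case FlatMapped(in, f)  => FlatMapped(optimize(in), f)
      case Merged(l, r)       => Merged(optimize(l), optimize(r))
      case leaf               => leaf
    }
    step(withOptimizedChildren)
  }

  // Reference evaluator, used to check that a rewrite preserves results.
  def run(p: Pipe, sources: Map[String, List[Int]]): List[Int] = p match {
    case IterablePipe(xs)   => xs
    case Source(name)       => sources(name)
    case Filtered(in, pred) => run(in, sources).filter(pred)
    case FlatMapped(in, f)  => run(in, sources).flatMap(f)
    case Merged(l, r)       => run(l, sources) ++ run(r, sources)
  }
}
```

For example, optimizing Filtered(Merged(IterablePipe(List(1, 2, 3, 4)), Source("s")), _ % 2 == 0) yields Merged(IterablePipe(List(2, 4)), Filtered(Source("s"), ...)): the Iterable side is filtered up front, and the remaining filter sits on the source branch, where a planner could combine it with other per-branch operations.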

We are still lacking the a ++ a == a.flatMap { v => List(v, v) } rewrite, which in some cases avoids allocating Cascading Merge nodes at all. It is not clear how useful that optimization actually is in practice, however.
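A minimal sketch of the missing self-merge rewrite on a hypothetical toy plan (SelfMerge, Source, FlatMapped, Merged, rewrite, run, mergeCount are all illustrative names, not Scalding APIs). It detects "the same pipe on both sides" with case-class structural equality, which is an assumption of the toy; a real planner would need to recognize the same node in its graph. The result matches a ++ a only as a multiset (element order differs), so the sketch assumes, as the identity implicitly does, that merge output order is not meaningful.

```scala
// Toy plan model; hypothetical types, NOT Scalding's TypedPipe or its optimizer.
object SelfMerge {
  sealed trait Pipe
  final case class Source(name: String) extends Pipe
  final case class FlatMapped(in: Pipe, f: Int => List[Int]) extends Pipe
  final case class Merged(left: Pipe, right: Pipe) extends Pipe

  // Emits each element twice: the flatMap that a ++ a is equivalent to.
  val duplicate: Int => List[Int] = v => List(v, v)

  // Bottom-up: replace a ++ a with a.flatMap(v => List(v, v)), removing the Merge.
  def rewrite(p: Pipe): Pipe = p match {
    case Merged(l, r) =>
      val (rl, rr) = (rewrite(l), rewrite(r))
      if (rl == rr) FlatMapped(rl, duplicate) else Merged(rl, rr)
    case FlatMapped(in, f) => FlatMapped(rewrite(in), f)
    case s: Source         => s
  }

  // Reference evaluator for checking results.
  def run(p: Pipe, sources: Map[String, List[Int]]): List[Int] = p match {
    case Source(name)      => sources(name)
    case FlatMapped(in, f) => run(in, sources).flatMap(f)
    case Merged(l, r)      => run(l, sources) ++ run(r, sources)
  }

  // Number of Merge nodes in the plan.
  def mergeCount(p: Pipe): Int = p match {
    case Source(_)         => 0
    case FlatMapped(in, _) => mergeCount(in)
    case Merged(l, r)      => 1 + mergeCount(l) + mergeCount(r)
  }
}
```

Applied to Merged(a, a), the rewrite returns a single FlatMapped node with no Merge; whether saving that node matters in practice is exactly the open question raised above.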

cc @fwbrasil: maybe this fixes some of the issues you saw with the different behavior in 0.18.

ianoc commented 6 years ago

This looks good to me once CI is happy. Two small questions, just about commented-out/test changes that look like they might have been for faster turnaround locally?