twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks

Make an ADT for CoGrouped #1698

Closed johnynek closed 7 years ago

johnynek commented 7 years ago

CoGrouped was the last scalding typed trait that was still a black box we could not look inside at planning time.

This change continues to expose the old methods, so you can still use the direct composition of CoGrouped. Alternatively, a backend can take CoGrouped values apart and re-assemble them in whatever way is more appropriate given global information or a particular compute substrate.

After this is merged, we can pretty easily add join support to #1697 without having to directly support N-way joins on the memory backend.

This will also be useful for Spark, which does not support N-way joins in the RDD API.
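
To illustrate the idea of taking a join apart, here is a minimal sketch of a join ADT and an in-memory interpreter. Everything in it (`CoGroupedSketch`, `Joined`, `Source`, `Pair`, `inner`, `run`) is a hypothetical, simplified name invented for this example and is not the actual scalding API or the code in this PR. It only shows that once a join is plain data, a backend that understands two-way joins can evaluate an N-way join expressed as nested pairs.

```scala
// Hypothetical sketch; none of these names come from scalding itself.
object CoGroupedSketch {
  // A join expressed as data, so a planner or backend can inspect it.
  sealed trait Joined[K, V]

  // Leaf node: keyed input rows.
  final case class Source[K, V](rows: List[(K, V)]) extends Joined[K, V]

  // Two-way join node: fn combines the values found under one key on each side.
  final case class Pair[K, A, B, C](
    left: Joined[K, A],
    right: Joined[K, B],
    fn: (List[A], List[B]) => List[C]
  ) extends Joined[K, C]

  // Inner-join combiner: every pairing of a left value with a right value.
  def inner[A, B]: (List[A], List[B]) => List[(A, B)] =
    (as, bs) => for { a <- as; b <- bs } yield (a, b)

  // In-memory interpreter that only knows how to evaluate two-way joins.
  def run[K, V](j: Joined[K, V]): Map[K, List[V]] = j match {
    case s: Source[K, V] =>
      s.rows.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    case p: Pair[K, a, b, V] =>
      val l = run(p.left)
      val r = run(p.right)
      (l.keySet ++ r.keySet).iterator
        .map(k => k -> p.fn(l.getOrElse(k, Nil), r.getOrElse(k, Nil)))
        .filter(_._2.nonEmpty) // drop keys whose join produced no output
        .toMap
  }

  // A three-way inner join written as two nested two-way joins.
  val names  = Source(List(1 -> "ann", 2 -> "bob"))
  val ages   = Source(List(1 -> 30, 2 -> 41))
  val cities = Source(List(1 -> "oslo"))
  val threeWay =
    Pair(Pair(names, ages, inner[String, Int]), cities, inner[(String, Int), String])
}
```

In this sketch, `CoGroupedSketch.run(CoGroupedSketch.threeWay)` keeps only key `1`, because key `2` has no city. A different backend could walk the same tree and flatten or reorder the joins instead of evaluating them pairwise.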

johnynek commented 7 years ago

cc @fwbrasil

johnynek commented 7 years ago

A normal user of scalding would never use this directly, at least not for anything that comes to mind now.

Someone writing a specific optimizer for scalding TypedPipes would use it, as would someone writing a new scalding backend.

Pretty much everyone else should never notice anything.