twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars, 706 forks

improve spark merge handling #1904

Closed johnynek closed 5 years ago

johnynek commented 5 years ago

This should be merged after #1903, on which it is based.

The important code here is unrollMerges, which results in a single Union call instead of a series of Union (and coalesce) calls.

We noticed the chained unions while looking at some generated graphs.
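
To illustrate the idea, here is a minimal sketch of unrolling nested merges into a flat list of inputs so that one n-ary union can be issued. The `Plan`, `Source`, and `Merge` types, the `UnrollSketch` object, and `runUnion` are hypothetical stand-ins invented for this example; they are not Scalding's actual types or the code in this PR, and only the name `unrollMerges` is taken from the description above.

```scala
// Hypothetical, simplified plan type; NOT Scalding's real TypedPipe.
sealed trait Plan[A]
final case class Source[A](name: String, data: List[A]) extends Plan[A]
final case class Merge[A](left: Plan[A], right: Plan[A]) extends Plan[A]

object UnrollSketch {
  // Flatten a tree of nested merges into its non-merge inputs,
  // preserving left-to-right order. Tail-recursive, so deeply
  // nested merge chains do not overflow the stack.
  def unrollMerges[A](plan: Plan[A]): List[Plan[A]] = {
    @annotation.tailrec
    def loop(todo: List[Plan[A]], acc: List[Plan[A]]): List[Plan[A]] =
      todo match {
        case Nil                 => acc.reverse
        case Merge(l, r) :: rest => loop(l :: r :: rest, acc)
        case other :: rest       => loop(rest, other :: acc)
      }
    loop(plan :: Nil, Nil)
  }

  // Stand-in for a single n-ary union over all inputs. In a Spark
  // backend this step would presumably be one union over a Seq of
  // RDDs rather than pairwise unions (an assumption, not shown here).
  def runUnion[A](inputs: List[Plan[A]]): List[A] =
    inputs.flatMap {
      case Source(_, data) => data
      case m @ Merge(_, _) => runUnion(unrollMerges(m))
    }
}
```

With this shape, a plan such as `Merge(Merge(a, b), Merge(c, d))` is unrolled to the four inputs `a, b, c, d`, so the backend sees one union node with four parents instead of three chained binary unions.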

CLAassistant commented 5 years ago

All committers have signed the CLA.

stephanie-stripe commented 5 years ago

cool, this lgtm