Add configurable partition count for spark

twitter / scalding

A Scala API for Cascading

http://twitter.com/scalding

Apache License 2.0

3.48k stars, 703 forks

Add configurable partition count for spark #1903

Closed: stephbian closed this 5 years ago

stephbian commented 5 years ago

@johnynek and I paired to add configuration that limits the number of partitions and reducers used when running a job with scalding-spark. We hope this will reduce the total size of the results sent back to the driver, which is currently causing us some pain.
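
For readers skimming the thread, here is a rough sketch of the idea of capping a partition or reducer count from configuration. It is not the code in this PR: the object name `PartitionCap`, the key `example.spark.max.partitions`, and both helper methods are hypothetical, and the config is modelled as a plain `Map[String, String]` so the snippet runs without Spark on the classpath.

```scala
// Hypothetical sketch: names and the config key are illustrative only,
// not taken from the scalding-spark source.
object PartitionCap {
  // Assumed key name, chosen for this example.
  val MaxPartitionsKey = "example.spark.max.partitions"

  // Read the optional cap; missing, non-numeric, or non-positive values
  // are treated as "no cap configured".
  def maxPartitions(conf: Map[String, String]): Option[Int] =
    conf.get(MaxPartitionsKey)
      .flatMap(s => scala.util.Try(s.trim.toInt).toOption)
      .filter(_ > 0)

  // Apply the cap to the partition (or reducer) count a step asked for.
  def capped(requested: Int, conf: Map[String, String]): Int =
    maxPartitions(conf) match {
      case Some(max) => math.min(requested, max)
      case None      => requested
    }
}
```

With a cap of 500 configured, `PartitionCap.capped(2000, conf)` yields 500, while a request for 100 partitions is left alone. On the Spark side, a count like this could then be passed to `RDD.coalesce` or used as the `numPartitions` argument of `reduceByKey`; how the PR actually wires it in is not shown in this thread.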

CLAassistant commented 5 years ago

All committers have signed the CLA.

oscar-stripe commented 5 years ago

This looks great. Let's see if it solves our problems on the cluster, then maybe add the tests I suggest.

stephbian commented 5 years ago

I think I've addressed all comments here. PTAL @johnynek @ianoc

johnynek commented 5 years ago

build failure, yay tests!

https://travis-ci.org/twitter/scalding/jobs/511118778#L3307

johnynek commented 5 years ago

Thank you!