rheem-ecosystem / rheem

Rheem - a cross-platform data processing system
https://rheem-ecosystem.github.io
5 stars 0 forks source link

fixing sparkLocalCallback sink latency. #81

Closed atroudi closed 4 years ago

atroudi commented 6 years ago

SparkLocalCallbackSink is collecting output through inputRdd.toLocalIterator() which could be optimal where execution will continue in the same executor while the local callback sink will eventually send all collected data to driver node, so inputRdd.collect() is more appropriate and also with running real workloads is proving that collect is more performant.