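`SparkLocalCallbackSink` currently collects its output through `inputRdd.toLocalIterator()`. That can be the better choice when execution continues in the same executor, but the local callback sink eventually sends all of the collected data to the driver anyway, so `inputRdd.collect()` is more appropriate here. `toLocalIterator()` also launches a separate Spark job for each partition and fetches the partitions one at a time, whereas `collect()` retrieves all partitions in a single job. Running real workloads confirms that `collect()` performs better.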
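For illustration, here is a minimal Java sketch of the two access patterns, assuming the sink hands each element to a driver-side callback. The class `LocalCallbackSinkSketch`, its methods, and the `Consumer`-based callback are hypothetical and do not reflect the actual `SparkLocalCallbackSink` code. Only `JavaRDD.toLocalIterator()` and `JavaRDD.collect()` are Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

/**
 * Hypothetical sketch: names and the Consumer-based callback are assumptions,
 * not the real SparkLocalCallbackSink implementation.
 */
public class LocalCallbackSinkSketch {

    // Previous approach: toLocalIterator() pulls partitions to the driver one at a
    // time, running a separate Spark job for each partition as the iterator advances.
    public static <T> void drainWithLocalIterator(JavaRDD<T> inputRdd, Consumer<T> callback) {
        Iterator<T> it = inputRdd.toLocalIterator();
        while (it.hasNext()) {
            callback.accept(it.next());
        }
    }

    // Proposed approach: collect() runs a single job over all partitions and returns
    // the complete result on the driver, where the sink sends the data anyway.
    public static <T> void drainWithCollect(JavaRDD<T> inputRdd, Consumer<T> callback) {
        for (T element : inputRdd.collect()) {
            callback.accept(element);
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("callback-sink-sketch");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> rdd = jsc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 3);

            List<Integer> viaCollect = new ArrayList<>();
            drainWithCollect(rdd, viaCollect::add);

            List<Integer> viaIterator = new ArrayList<>();
            drainWithLocalIterator(rdd, viaIterator::add);

            // Both approaches deliver the same elements, in partition order.
            System.out.println(viaCollect);
            System.out.println(viaCollect.equals(viaIterator));
        }
    }
}
```

One trade-off to keep in mind: `collect()` requires the driver to hold the full result at once, while `toLocalIterator()` needs only about as much driver memory as the largest partition. The switch therefore assumes the collected output fits in driver memory.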