Closed by facundominguez 6 years ago
Work in progress: https://github.com/tweag/inline-java/compare/fd/batching
The next steps are:
I'm going to invest some time fixing distributed-closure to address one of the TODOs in https://github.com/tweag/inline-java/pull/109.
Values of composite types like tuples and arrays require multiple calls to reify or reflect to marshal each value between Haskell and Java. When we have RDDs or Datasets containing these types, it is often cheaper to marshal multiple values together.
To be concrete, suppose we have `rdd :: RDD (Double, Double)`, and we want to evaluate `RDD.map (closure (static (\(x, y) -> (y, x)))) rdd`. Marshaling a pair requires one reify and one reflect call per component, so if we have `n` elements in the `rdd`, we do `4*n` calls in total.
If we instead marshal a batch of pairs, we can do one reify and one reflect call to marshal an array with the first components of all the pairs in the batch, and similarly for the second components. Thus we need only 2 reify calls and 2 reflect calls per batch. If `b` is the number of pairs in a batch, we do `4*n/b` calls.
We also have this implemented in a private fork, waiting to be ported. It is implemented as a composable scheme, meaning that if one has a way to batch the components, then one can batch the composite type.
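To illustrate what "composable" could mean here, below is a minimal pure-Haskell sketch. It is not the code from the private fork and does not use the sparkle or inline-java APIs: the `Batchable` class, its methods, `swapBatched`, and `marshalCalls` are all hypothetical names, and plain lists stand in for Java arrays. The pair instance is built only from the instances of its components, and the call count follows the model described above (one reify and one reflect call per primitive array).

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
module Main where

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- Hypothetical class (not the sparkle or inline-java API): describes how to
-- lay out many values of type a as one batch.
class Batchable a where
  type Batch a :: Type
  toBatch :: [a] -> Batch a
  fromBatch :: Batch a -> [a]
  -- Number of primitive arrays in one batch, i.e. how many reify (or
  -- reflect) calls one batch would need in this model.
  arrays :: proxy a -> Int

-- A batch of doubles is a single array (a list stands in for double[]).
instance Batchable Double where
  type Batch Double = [Double]
  toBatch = id
  fromBatch = id
  arrays _ = 1

-- Composability: a batch of pairs is a pair of batches, one per component.
instance (Batchable a, Batchable b) => Batchable (a, b) where
  type Batch (a, b) = (Batch a, Batch b)
  toBatch ps = (toBatch (map fst ps), toBatch (map snd ps))
  fromBatch (xs, ys) = zip (fromBatch xs) (fromBatch ys)
  arrays _ = arrays (Proxy :: Proxy a) + arrays (Proxy :: Proxy b)

-- Split a list into chunks of at most b elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf b xs
  | b <= 0 = error "chunksOf: batch size must be positive"
  | null xs = []
  | otherwise = let (h, t) = splitAt b xs in h : chunksOf b t

-- Swapping the components of every pair only swaps the two arrays.
swapBatch :: Batch (Double, Double) -> Batch (Double, Double)
swapBatch (xs, ys) = (ys, xs)

-- Apply the swap batch by batch, with batches of at most b pairs.
swapBatched :: Int -> [(Double, Double)] -> [(Double, Double)]
swapBatched b = concatMap (fromBatch . swapBatch . toBatch) . chunksOf b

-- Calls needed to reify and then reflect the given number of marshaling
-- units, where a unit is either a single element or a whole batch.
marshalCalls :: Batchable a => Proxy a -> Int -> Int
marshalCalls p units = 2 * arrays p * units

main :: IO ()
main = do
  let pairs = [(fromIntegral i, fromIntegral (10 * i)) | i <- [1 .. 10 :: Int]]
      b = 4
      p = Proxy :: Proxy (Double, Double)
  -- Batching does not change the result of the map.
  print (swapBatched b pairs == map (\(x, y) -> (y, x)) pairs) -- prints True
  -- Element-wise marshaling: 4*n calls for n = 10.
  print (marshalCalls p (length pairs)) -- prints 40
  -- Batched marshaling: 4 calls for each of the 3 batches.
  print (marshalCalls p (length (chunksOf b pairs))) -- prints 12
```

With `n = 10` and `b = 4` the last batch is only partially filled, so the batched count is `4 * ceiling (n / b) = 12` rather than exactly `4*n/b`. Nested types such as `((Double, Double), Double)` get a `Batchable` instance for free from the pair instance, which is the property the composable scheme is after.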