Batched marshaling for streams and vectors

facundominguez commented 6 years ago

Addresses the inline-java part of https://github.com/tweag/sparkle/issues/124.

facundominguez commented 6 years ago

Packages using jvm-streaming now need to point to the jvm-batching.jar when building and running.

facundominguez commented 6 years ago

Open question: how do we make the batch size of streams or vectors a parameter?

facundominguez commented 6 years ago

We may want to make the batches explicit in sparkle. For instance:

mapIterator
  :: (ReifyBatcher a, ReflectBatcher b)
  => Int
  -> (Stream (Of (Vector a)) IO () -> Stream (Of (Vector b)) IO ())
  -> Dataset a
  -> Dataset b

where the Int parameter gives the size of the batches in the input, and the size of the batches in the output is controlled by the user-supplied function.

This has the advantage that we can tell the user not to leave a batch half consumed before producing an output batch if the type a contains local references. Otherwise, control may return to Java, invalidating those values.
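To make the shape of this proposal concrete, here is a minimal pure sketch that uses only base. Plain lists stand in for both Stream (Of (Vector a)) IO () and Dataset a, and the names chunksOf, mapBatches and sumEachBatch are hypothetical; none of this is sparkle's or jvm-streaming's actual API. It only illustrates that the Int fixes the input batch size while the user-supplied function decides the output batches.

```haskell
module Main where

-- Pure-list model of the proposed interface (an assumption for
-- illustration only): [[a]] plays the role of a stream of batches,
-- and [a] plays the role of a Dataset.

-- | Split a list into batches of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = error "chunksOf: batch size must be positive"
  | null xs   = []
  | otherwise = let (batch, rest) = splitAt n xs
                in batch : chunksOf n rest

-- | Hypothetical list analogue of the proposed mapIterator.
-- The Int controls the size of the input batches; the size of the
-- output batches is whatever the user function produces.
mapBatches :: Int -> ([[a]] -> [[b]]) -> [a] -> [b]
mapBatches n f = concat . f . chunksOf n

-- | Example user function: fully consumes each input batch and
-- emits one single-element output batch per input batch.
sumEachBatch :: [[Int]] -> [[Int]]
sumEachBatch = map (\batch -> [sum batch])

main :: IO ()
main = print (mapBatches 3 sumEachBatch [1 .. 10])
-- prints [6,15,24,10]
```

In this model nothing can be invalidated, so the "consume the whole batch first" rule is not enforced; with real local references the user function would have to finish with an input batch before yielding its output batch.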

facundominguez commented 6 years ago

Addressed feedback provided in a private discussion.

facundominguez commented 6 years ago

The last commit allows building packages even if they depend on jars produced by their Haskell dependencies. The user is responsible for pulling in all the necessary dependencies in the Setup.hs script.

With this change, we can keep jvm-streaming on Hackage.

tweag / inline-java

Batched marshaling for streams and vectors #109