tweag / sparkle

Haskell on Apache Spark.
BSD 3-Clause "New" or "Revised" License

Support shipping anonymous inline-java objects. #106

Closed mboes closed 7 years ago

mboes commented 7 years ago

Just like in straight Java, it's perfectly legal in an inline-java quasiquote to create an object of an anonymous class. The problem is that such an object can't be deserialized in any process that hasn't yet loaded the wrappers for all quasiquotes, since it is the wrappers that "define" the anonymous class.
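To illustrate why the class definition matters, here is a minimal plain-Java sketch (not inline-java itself). `AnonymousClassDemo` and `Task` are hypothetical names invented for this example. It shows that an anonymous object belongs to a compiler-generated class tied to its enclosing class, so a process that deserializes the object needs that generated class to be loaded.

```java
import java.io.Serializable;
import java.util.function.Supplier;

public class AnonymousClassDemo {
    // Hypothetical stand-in for a function shipped from the driver to an executor.
    interface Task extends Supplier<String>, Serializable {}

    static Task makeTask() {
        // The compiler emits this anonymous class as its own class,
        // named after the enclosing class rather than by the programmer.
        return new Task() {
            @Override
            public String get() {
                return "hello";
            }
        };
    }

    public static void main(String[] args) {
        Task task = makeTask();
        // Reports whether the runtime class of the object is anonymous.
        System.out.println(task.getClass().isAnonymousClass());
        // Prints the compiler-generated binary name of that class.
        System.out.println(task.getClass().getName());
    }
}
```

In inline-java the analogous generated classes live in the quasiquote wrappers, which is why the wrappers have to be loaded before deserialization can succeed.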

Spark executors can be given a task by the Spark driver that includes such anonymous objects. Without the InlineJavaRegistrator provided here, it is not possible to guarantee that the inline-java wrappers have been loaded prior to the task being deserialized.

The solution here is to use the Kryo serializer, which is in any case much faster than the JavaSerializer that Spark uses by default. KryoSerializer provides a crucial facility that JavaSerializer does not: class registration. Spark furthermore defines "registrator" classes that, when invoked, perform class registration, or indeed any arbitrary action. We provide an InlineJavaRegistrator to inline-java users, which abuses class registration to first load all wrappers. This happens on all executors before any work is performed.
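As a rough sketch of what opting in might look like, the snippet below assembles the two Spark settings involved. `spark.serializer` and `spark.kryo.registrator` are Spark's standard configuration keys. The fully qualified registrator name is an assumption (this description only gives the simple name `InlineJavaRegistrator`), and `KryoConfSketch` and `kryoSettings` are hypothetical names for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KryoConfSketch {
    // Spark's standard configuration keys for the serializer and its registrator.
    static final String SERIALIZER_KEY = "spark.serializer";
    static final String REGISTRATOR_KEY = "spark.kryo.registrator";

    // Assumption: the package of InlineJavaRegistrator is not stated in this
    // description; check the sparkle sources for the real qualified name.
    static final String ASSUMED_REGISTRATOR =
        "io.tweag.sparkle.kryo.InlineJavaRegistrator";

    // Builds the settings that select Kryo and name the registrator class.
    static Map<String, String> kryoSettings(String registratorClass) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(SERIALIZER_KEY, "org.apache.spark.serializer.KryoSerializer");
        settings.put(REGISTRATOR_KEY, registratorClass);
        return settings;
    }

    public static void main(String[] args) {
        // Print the settings in the form accepted by spark-submit's --conf flag.
        kryoSettings(ASSUMED_REGISTRATOR)
            .forEach((key, value) -> System.out.println("--conf " + key + "=" + value));
    }
}
```

The same key/value pairs could instead be set programmatically on a `SparkConf` before the context is created. Either way, leaving the registrator unset keeps the feature opt-in.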

Fixes #104.

mboes commented 7 years ago

That would work too. The current design makes inline-java opt-in, but that's not an important requirement.