Closed: facundominguez closed this issue 7 years ago.
Note that this problem isn't specific to inline-java: it's a problem for any use of JNI's `defineClass`, which Spark currently provides no way of performing preemptively at initialization time. It sounds to me like this is an upstream issue, which we could work around in various ways in inline-java for that particular special case.
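For context, defining a class via `defineClass` means handing the JVM raw bytecode at runtime instead of letting it be found on a classpath. Here is a minimal, self-contained sketch using the Java-side `ClassLoader.defineClass` (JNI's `DefineClass` has the same effect); all class names are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DefineClassDemo {
    // Stand-in for a class whose bytecode would be embedded in an executable.
    public static class Payload {
        public int answer() { return 42; }
    }

    // A loader whose parent (the platform loader) cannot see application
    // classes, so Payload is only available if we define it from bytes.
    static class ByteLoader extends ClassLoader {
        ByteLoader() { super(ClassLoader.getPlatformClassLoader()); }
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Reads a class's bytecode from the classpath, standing in for
    // bytecode embedded somewhere else.
    static byte[] bytesOf(Class<?> c) throws IOException {
        try (InputStream in = c.getResourceAsStream(
                 "/" + c.getName().replace('.', '/') + ".class");
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteLoader loader = new ByteLoader();
        // Before defining it, the loader cannot resolve the class at all.
        try {
            loader.loadClass(Payload.class.getName());
        } catch (ClassNotFoundException expected) {
            System.out.println("not found before define");
        }
        // Defining it from raw bytes makes it usable in that loader.
        Class<?> defined =
            loader.define(Payload.class.getName(), bytesOf(Payload.class));
        Object p = defined.getDeclaredConstructor().newInstance();
        System.out.println(defined.getMethod("answer").invoke(p)); // prints 42
    }
}
```

The point being that code which needs this, like inline-java's generated stubs, only works if something actually performs the `defineClass` call before the class is first needed.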
> Which we could work around in various ways in inline-java for that particular special case.

Indeed, any preference?
Since, as noted above, this is an upstream issue with Spark itself, I have a preference for keeping any workaround in sparkle. We could remove the workaround once the ticket you mention above is resolved. Inline code is just stubs, and these stubs are best kept in the executable itself. No one other than the executable should see these stubs, nor should anyone be able to call them. And that way we don't need to parameterize the `java` QQ with a gazillion (aka 1-3) options whose combinations are hard to test exhaustively.
The JIRA ticket you mention includes comments from several folks who successfully hooked into `JavaSerializer`. We could call `loadJavaWrappers` once (or all the time) from the serializer. Did the "epic struggle" you mention in inline-java#62 include that already?
I didn't try it. Apparently we need to

```java
sparkConf.set("spark.serializer", "our.serializer.class.name")
```

I don't see how this can be parameterized by the serializer that Spark currently uses. We might have to define a different wrapper for each serializer we ever want to use.
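To picture the serializer hook being discussed, here is a sketch only: Spark's actual `Serializer` API (in `org.apache.spark.serializer`) is different and has more methods, and every name below is illustrative, including the `Runnable` standing in for `loadJavaWrappers`. The point is just the run-initialization-once-before-first-use pattern:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for Spark's serializer interface.
interface Serializer {
    byte[] serialize(Object o);
}

// Hypothetical wrapper: runs loadJavaWrappers once per JVM before the
// delegate serializer is first used.
class InitOnceSerializer implements Serializer {
    // Static: loadJavaWrappers only needs to happen once per JVM.
    private static final AtomicBoolean done = new AtomicBoolean(false);
    private final Serializer delegate;
    private final Runnable loadJavaWrappers; // would call into the Haskell side

    InitOnceSerializer(Serializer delegate, Runnable loadJavaWrappers) {
        this.delegate = delegate;
        this.loadJavaWrappers = loadJavaWrappers;
    }

    @Override
    public byte[] serialize(Object o) {
        if (done.compareAndSet(false, true)) {
            loadJavaWrappers.run();
        }
        return delegate.serialize(o);
    }
}

class InitOnceDemo {
    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Serializer s = new InitOnceSerializer(o -> new byte[0],
                                              calls::incrementAndGet);
        s.serialize("x");
        s.serialize("y");
        System.out.println("init ran " + calls.get() + " time(s)"); // 1
    }
}
```

Since Spark instantiates the serializer from a class name set in the configuration, the delegate can't easily be whatever serializer the user had configured, which is why a different wrapper per concrete serializer might be needed, as noted above.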
This is not quite as usable as sparkle users would need. When the classes that `loadJavaWrappers` loads depend on classes in sparkle.jar, the class loader can't find them when `loadJavaWrappers` is invoked in `InlineJavaRegistrator.java`.
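This kind of failure can be reproduced outside Spark: a class defined from bytes resolves its dependencies through its defining loader, so if that loader can't see the jar holding the dependencies, resolution fails at first use. A self-contained sketch, with made-up names and the platform loader standing in for a loader that cannot see the application classes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;

public class MissingDepDemo {
    public static class Dep {}          // stands in for a class in sparkle.jar
    public static class Wrapper {       // stands in for a loaded stub
        public String go() { return new Dep().toString(); }
    }

    // Defines classes from raw bytes; its parent (the platform loader)
    // cannot see application classes such as Dep.
    static class ByteLoader extends ClassLoader {
        ByteLoader() { super(ClassLoader.getPlatformClassLoader()); }
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static byte[] bytesOf(Class<?> c) throws IOException {
        try (InputStream in = c.getResourceAsStream(
                 "/" + c.getName().replace('.', '/') + ".class");
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            in.transferTo(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteLoader loader = new ByteLoader();
        // Wrapper itself defines fine: its only dependency so far is Object.
        Class<?> wrapper =
            loader.define(Wrapper.class.getName(), bytesOf(Wrapper.class));
        Object instance = wrapper.getDeclaredConstructor().newInstance();
        try {
            wrapper.getMethod("go").invoke(instance);
        } catch (InvocationTargetException e) {
            // Dep is resolved through ByteLoader, which cannot find it.
            System.out.println("failed with: " + e.getCause());
        }
    }
}
```

This is why defining the stub classes alone isn't enough: whichever loader defines them must also be able to see their dependencies in sparkle.jar.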
Could we please not unearth old issues from the dead and instead create a new one?
doesn't work on multi-node setups.
The problem is that an executor receives the serialized object

```java
new Function<Object, Object>() { Object call(Object x) { return x; } }
```

and, in order to deserialize it, it needs to load the anonymous class to which it belongs. The executor then notices that no jar and no class on the classpath contains the class definition, and therefore it fails.

Where is the class, then? Currently inline-java embeds the bytecode in the Haskell executable. The embedded bytecode is sent to the JVM at runtime by `Language.Java.Inline.loadJavaWrappers`. But this function is never called on executors.

The ideal fix would be for Spark to provide some startup hooks, so `Language.Java.Inline.loadJavaWrappers` can be called when the executor starts. But this feature is not implemented.

Calling `loadJavaWrappers`
when sparkle loads is no good, because upon receiving the serialized object, the executor has no clue that it needs to load sparkle in order to have the class defined.

The only workaround I've found so far is to dump the `.class` files that inline-java produces into a folder and add them to the sparkle application jar: https://github.com/tweag/inline-java/issues/62

Any preferences on how to better deal with this?
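The mechanics of that workaround, bundling dumped `.class` files into the application jar, can be sketched with plain JDK APIs. All paths are illustrative, and a real workaround would add the files to the existing application jar (e.g. with `jar uf`) rather than write a fresh one:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.stream.Stream;

public class BundleClasses {
    // Writes every .class file under classesDir into a fresh jar at jarPath.
    static void bundle(Path classesDir, Path jarPath) throws IOException {
        try (JarOutputStream jar =
                 new JarOutputStream(Files.newOutputStream(jarPath));
             Stream<Path> walk = Files.walk(classesDir)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (!p.toString().endsWith(".class")) continue;
                // Entry names use '/' and are relative to the class root.
                String name = classesDir.relativize(p).toString()
                                        .replace(File.separatorChar, '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(p, jar);
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path classes = Files.createTempDirectory("dumped-classes");
        // Pretend inline-java dumped a stub here (contents are fake bytes).
        Files.write(classes.resolve("Stub.class"),
                    new byte[] {(byte) 0xCA, (byte) 0xFE});
        Path jarPath = Files.createTempFile("app", ".jar");
        bundle(classes, jarPath);
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            System.out.println(jar.getEntry("Stub.class") != null); // prints true
        }
    }
}
```

Once the stubs are inside the application jar, executors find them on the classpath like any other class, which is exactly what the thread above identifies as the missing piece.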