jruby-gradle / jruby-gradle-storm-plugin

JRuby Gradle plugin to manage creating Storm topology jars
MIT License

New artifact layout has issues with classloading from embedded jars on supervisors #25

Closed rtyler closed 9 years ago

rtyler commented 9 years ago

In an older Storm cluster, where storm-kafka isn't already baked in, I'm having trouble getting a topology which includes the storm-kafka jar to start properly on supervisor nodes.

The classpath is correct enough for the topology to submit and load successfully (on submit, storm-kafka is resolved and used), but the workers fail on startup with:

java.lang.RuntimeException: java.lang.ClassNotFoundException: storm.kafka.KafkaSpout
        at backtype.storm.utils.Utils.deserialize(Utils.java:95) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.utils.Utils.getSetComponentObject(Utils.java:235) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.task$get_task_object.invoke(task.clj:73) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.task$mk_task_data$fn__3061.invoke(task.clj:180) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.util$assoc_apply_self.invoke(util.clj:816) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.task$mk_task_data.invoke(task.clj:173) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.task$mk_task.invoke(task.clj:184) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.executor$mk_executor$fn__5510.invoke(executor.clj:321) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.core$map$fn__4207.invoke(core.clj:2485) ~[clojure-1.5.1.jar:na]
        at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
        at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RT.seq(RT.java:484) ~[clojure-1.5.1.jar:na]
        at clojure.core$seq.invoke(core.clj:133) ~[clojure-1.5.1.jar:na]
        at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30) ~[clojure-1.5.1.jar:na]
        at clojure.core.protocols$fn__6026.invoke(protocols.clj:54) ~[clojure-1.5.1.jar:na]
        at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13) ~[clojure-1.5.1.jar:na]
        at clojure.core$reduce.invoke(core.clj:6177) ~[clojure-1.5.1.jar:na]
        at clojure.core$into.invoke(core.clj:6229) ~[clojure-1.5.1.jar:na]
        at backtype.storm.daemon.executor$mk_executor.invoke(executor.clj:321) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at backtype.storm.daemon.worker$fn__5940$exec_fn__1396__auto____5941$iter__5946__5950$fn__5951.invoke(worker.clj:375) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.5.1.jar:na]
        at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.5.1.jar:na]
        at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.5.1.jar:na]
        at clojure.lang.RT.next(RT.java:598) ~[clojure-1.5.1.jar:na]
        at clojure.core$next.invoke(core.clj:64) ~[clojure-1.5.1.jar:na]
        at clojure.core$dorun.invoke(core.clj:2781) ~[clojure-1.5.1.jar:na]
        at clojure.core$doall.invoke(core.clj:2796) ~[clojure-1.5.1.jar:na]
        at backtype.storm.daemon.worker$fn__5940$exec_fn__1396__auto____5941.invoke(worker.clj:375) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
        at backtype.storm.daemon.worker$fn__5940$mk_worker__5996.doInvoke(worker.clj:347) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.worker$_main.invoke(worker.clj:454) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
        at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.2-incubating.jar:0.9.2-incubating]
Caused by: java.lang.ClassNotFoundException: storm.kafka.KafkaSpout
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[na:1.7.0_75]
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[na:1.7.0_75]
        at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_75]
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[na:1.7.0_75]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[na:1.7.0_75]
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[na:1.7.0_75]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[na:1.7.0_75]
        at java.lang.Class.forName0(Native Method) ~[na:1.7.0_75]
        at java.lang.Class.forName(Class.java:274) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:625) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) ~[na:1.7.0_75]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) ~[na:1.7.0_75]
        at backtype.storm.utils.Utils.deserialize(Utils.java:89) ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
        ... 36 common frames omitted
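
The trace bottoms out in sun.misc.Launcher$AppClassLoader, which is a URLClassLoader. A URLClassLoader serves the entries of each jar on its classpath, but it does not open jars stored inside those jars. The following is a minimal sketch of that behaviour, not anything taken from Storm or the plugin; the class name NestedJarVisibility, the entry names, and the temporary file are all made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class NestedJarVisibility {
    /** Builds an in-memory jar holding a single entry. */
    static byte[] jarBytes(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content);
            jar.closeEntry();
        }
        return buffer.toByteArray();
    }

    /** Returns { nested jar visible as a resource, nested jar's content visible }. */
    static boolean[] probe() throws IOException {
        // Hypothetical layout: outer "topology" jar with lib/dependency.jar inside it.
        byte[] inner = jarBytes("marker.txt", "hello".getBytes(StandardCharsets.UTF_8));
        Path outer = Files.createTempFile("topology", ".jar");
        try {
            Files.write(outer, jarBytes("lib/dependency.jar", inner));
            // Parent is null so only the bootstrap loader and the outer jar are consulted.
            try (URLClassLoader loader =
                     new URLClassLoader(new URL[] { outer.toUri().toURL() }, null)) {
                boolean jarVisible = loader.getResource("lib/dependency.jar") != null;
                boolean contentVisible = loader.getResource("marker.txt") != null;
                return new boolean[] { jarVisible, contentVisible };
            }
        } finally {
            Files.deleteIfExists(outer);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] result = probe();
        System.out.println("nested jar visible as a resource: " + result[0]);
        System.out.println("nested jar content visible:       " + result[1]);
    }
}
```

The nested jar itself is found as a plain resource, while the entry inside it is not, which matches a class packed only inside an embedded jar being reported as missing during deserialization.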
rtyler commented 9 years ago

After further investigation, I've documented some of this issue in a mailing list post.

Duplicated here for posterity:

I'm working on bringing the Redstorm library
(https://github.com/jruby-gradle/redstorm) up to speed with some of the recent
updates made possible in JRuby core, chief among them loading
resources/artifacts nested within a self-contained .jar file. What I'd like,
ideally, is some sort of pre-initialization hook on a supervisor that I can tie
into to properly set up my classpath.

JRuby supports adding jars to the classpath with a `require` statement in such
a manner that a number of Ruby gems will embed a jar file in the gem and then
use `require 'my-library'` just like they might require any other Ruby
dependency.

This presents problems when executing in the Storm runtime environment. I
cannot simply go the shaded/fat-jar route since there is Ruby code which
expects "my-library.jar" to be a file in the Ruby `$LOAD_PATH` instead of
purely relying on the presence of files in the classpath.

This has been discussed previously in threads like this:
    <http://grokbase.com/t/gg/storm-user/12c7m3zzve/what-are-the-chances-of-supporting-a-jar-of-jars-in-the-future>

This isn't a problem on topology deployment, where I can easily customize the
classpath and loading semantics, but when a topology is sent out to supervisor
nodes and started in workers, the "entrypoint" is something I cannot modify or
hack to update the running load path for the JRuby runtime.

Is there a viable workaround or some means of overriding/extending the worker
initialization routines?
rtyler commented 9 years ago

The workaround that I've validated is to include two separate configurations. One, jrubyStorm, packs everything into the artifact verbatim (nested jars-in-jars), whereas the other, jrubyStormClasspath, unzips all the dependencies and places the raw class files inside the resulting archive.

This will provide a short-term resolution for users.
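
For illustration only, the "unzip the dependencies and place the raw class files in the archive" idea can be sketched in plain Java. This is not the plugin's implementation: JarFlattener and its methods are hypothetical names, the entry names are placeholders, and the sketch makes simplifying assumptions (nested jars are detected by a ".jar" suffix, the first occurrence of a duplicate entry name wins, and manifests that lead a jar are dropped).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class JarFlattener {
    /** Returns a new jar in which the contents of nested .jar entries sit at the top level. */
    static byte[] flatten(byte[] jarBytes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Set<String> written = new LinkedHashSet<>();
        try (JarOutputStream out = new JarOutputStream(buffer)) {
            copyEntries(jarBytes, out, written);
        }
        return buffer.toByteArray();
    }

    /** Copies plain entries to out and recurses into entries whose name ends in ".jar". */
    private static void copyEntries(byte[] jarBytes, JarOutputStream out, Set<String> written)
            throws IOException {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                byte[] content = readAll(in);
                if (entry.getName().endsWith(".jar")) {
                    copyEntries(content, out, written);   // unpack the embedded jar
                } else if (written.add(entry.getName())) { // skip duplicate names
                    out.putNextEntry(new JarEntry(entry.getName()));
                    out.write(content);
                    out.closeEntry();
                }
            }
        }
    }

    /** Reads the remainder of the current jar entry. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Lists entry names in order of appearance. */
    static Set<String> entryNames(byte[] jarBytes) throws IOException {
        Set<String> names = new LinkedHashSet<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds an in-memory jar from a name-to-bytes map (helper for the demo). */
    static byte[] jarOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new JarEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> dependency = new LinkedHashMap<>();
        dependency.put("storm/kafka/KafkaSpout.class", new byte[] { 1, 2, 3 }); // placeholder bytes
        Map<String, byte[]> topology = new LinkedHashMap<>();
        topology.put("topology.rb", "# placeholder".getBytes(StandardCharsets.UTF_8));
        topology.put("lib/storm-kafka.jar", jarOf(dependency));
        System.out.println(entryNames(flatten(jarOf(topology))));
    }
}
```

A flattened archive like this only helps the JVM classloader. Ruby code that expects a jar file on the `$LOAD_PATH` still needs the verbatim jars-in-jars layout, which is why the workaround keeps both configurations.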

Related jruby-gradle/redstorm#12