mesos / storm

Storm on Mesos!
Apache License 2.0

Support Storm 1.0.0+ #177

Closed erikdw closed 7 years ago

erikdw commented 7 years ago

See individual commits in this PR for more info. Haven't tested yet, will update the PR when I have.

erikdw commented 7 years ago

I've tested with some extra changes in the vagrant setup of this project. I need to spend a bit more time refining the vagrant-related settings, but the basis of the storm-1.0+ support is here. I had to make some other minor tweaks that I've rebased appropriately.

I considered collapsing the remaining shims, but opted against it for now. I suspect they might prove useful once storm-2.0 lands given that it is a huge rewrite (storm-core being changed from Clojure to Java).

Another thing I left out is support for nimbus.seeds -- so we're still reliant on the deprecated nimbus.host config option. When we understand how to use marathon we will come back to that.

@DarinJ can you please review when you have a chance?

DarinJ commented 7 years ago

@erikdw my team was just discussing working on storm-1.0.2 a few days ago, based off the WIP branch. I pulled this PR, will test it out and make comments.

DarinJ commented 7 years ago

Did a quick review of the code ... mostly just package name changes - nothing to worry about. LGTM. Going to run some test topos on a cluster.

clharris commented 7 years ago

Hi @erikdw, you got it to run successfully on a cluster, I'm guessing? I just tried running a test topo and got some version conflicts:

2016-11-01 19:06:47.161 o.a.s.d.worker [ERROR] Error on initialization of server mk-worker
java.lang.RuntimeException: Fail to construct messaging plugin from plugin backtype.storm.messaging.netty.Context
    at org.apache.storm.messaging.TransportFactory.makeContext(TransportFactory.java:53) ~[storm-core-1.0.2.jar:1.0.2]
    at org.apache.storm.daemon.worker$worker_data.invoke(worker.clj:266) ~[storm-core-1.0.2.jar:1.0.2]
    at org.apache.storm.daemon.worker$fn__8555$exec_fn__2466__auto__$reify__8557.run(worker.clj:611) ~[storm-core-1.0.2.jar:1.0.2]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_91]
    at javax.security.auth.Subject.doAs(Subject.java:415) ~[?:1.7.0_91]
    at org.apache.storm.daemon.worker$fn__8555$exec_fn__2466__auto____8556.invoke(worker.clj:609) ~[storm-core-1.0.2.jar:1.0.2]
    at clojure.lang.AFn.applyToHelper(AFn.java:178) ~[clojure-1.7.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.7.0.jar:?]
    at clojure.core$apply.invoke(core.clj:630) ~[clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$fn__8555$mk_worker__8650.doInvoke(worker.clj:583) [storm-core-1.0.2.jar:1.0.2]
    at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker$_main.invoke(worker.clj:771) [storm-core-1.0.2.jar:1.0.2]
    at clojure.lang.AFn.applyToHelper(AFn.java:165) [clojure-1.7.0.jar:?]
    at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.7.0.jar:?]
    at org.apache.storm.daemon.worker.main(Unknown Source) [storm-core-1.0.2.jar:1.0.2]
Caused by: java.lang.ClassNotFoundException: backtype.storm.messaging.netty.Context
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[?:1.7.0_91]
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355) ~[?:1.7.0_91]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.7.0_91]
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354) ~[?:1.7.0_91]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425) ~[?:1.7.0_91]
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) ~[?:1.7.0_91]
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ~[?:1.7.0_91]
    at java.lang.Class.forName0(Native Method) ~[?:1.7.0_91]
    at java.lang.Class.forName(Class.java:195) ~[?:1.7.0_91]
    at org.apache.storm.messaging.TransportFactory.makeContext(TransportFactory.java:38) ~[storm-core-1.0.2.jar:1.0.2]
    ... 14 more

Any thoughts on this?

erikdw commented 7 years ago

@clharris : don't recognize those errors. Yes, I tried this on a "cluster", but really it's the vagrant setup in this project -- but it uses mesos-master, mesos-slave/agent, MesosNimbus, MesosSupervisor, storm-ui, etc. Here's the storm-ui showing the worker running:

My guess would be that this is a storm conflict between previously deployed stuff and the newer version. I'm pretty sure it's not backwards compatible. i.e., if that was the case then you would need to:

erikdw commented 7 years ago

@clharris : upon looking closer at the error, I think this is a problem with your configuration somehow:

Error on initialization of server mk-worker java.lang.RuntimeException: Fail to construct messaging plugin from plugin backtype.storm.messaging.netty.Context

^ Note: backtype.storm.messaging.netty.Context

That should be: org.apache.storm.messaging.netty.Context

See this line in the storm.yaml of this PR.
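For anyone hitting the same error, here is a rough sketch of a check for a stale storm.yaml. It scans the text for settings whose value still names a backtype.storm.* class. The helper name find_legacy_classes and the sample nimbus.host value are made up for illustration, and the parsing is a naive line-by-line split rather than a real YAML parser, so nested or multi-line values are not handled.

```python
LEGACY_PREFIX = "backtype.storm."  # pre-1.0 package prefix seen in the error above


def find_legacy_classes(yaml_text):
    """Return (line_number, key, value) for each flat `key: value` setting
    whose value starts with the legacy backtype.storm. prefix.

    Naive parsing: assumes one setting per line and no '#' inside values.
    """
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        value = value.strip().strip("\"'")  # remove surrounding quotes
        if value.startswith(LEGACY_PREFIX):
            hits.append((lineno, key.strip(), value))
    return hits


# Hypothetical sample config; only the transport value comes from this thread.
sample = """\
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
nimbus.host: "master"
"""

for lineno, key, value in find_legacy_classes(sample):
    print(f"line {lineno}: {key} still points at {value}")
```

Any hit is a candidate for the kind of mismatch in the stack trace above, since the storm-core 1.0.2 jar cannot load a class under the old package.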

If not your configuration, then maybe a problem with how you're building?

Here's how I built:

STORM_RELEASE=1.0.2 MESOS_RELEASE=1.0.1 bin/build-release.sh

clharris commented 7 years ago

@erikdw, oops, you're right, I was using an old storm.yaml which had the wrong storm.messaging.transport. Sorry I didn't notice that one sooner. It still seems that I have something that's not configured quite right, but I'll have to look at it more tomorrow. Thanks for your help

erikdw commented 7 years ago

@clharris : phew, glad to help!

clharris commented 7 years ago

@erikdw: just tinkered with it again this morning and everything is running great now! LGTM

DarinJ commented 7 years ago

Tested +1 LGTM

erikdw commented 7 years ago

@DarinJ & @clharris : thanks for the help with testing and reviewing! Gonna merge and release the 1st storm-1.0+ image.

maverick2202 commented 7 years ago

I would like to test upgrading our storm-mesos framework to use storm 1.0. Is there documentation for the upgrade steps? Can I just upgrade nimbus to storm 1.0 and have new workers, and existing workers on restart, pull in storm 1.0?

This assumes backward compatibility, with a given topology running both new 1.0 workers and old 0.9.6 workers.

erikdw commented 7 years ago

@maverick2202 : sorry for delayed response. This is not a storm-mesos question really, it's a storm question. But I can tell you that storm 1.0 is not directly backwards compatible with storm 0.9.x -- they changed the package paths for all classes from backtype.* to org.apache.*.
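To make the rename concrete, here is a small sketch that maps a 0.9.x class name to its 1.0 location. It assumes only the backtype.storm. to org.apache.storm. prefix swap seen earlier in this thread; the function name to_storm_1x_name is hypothetical, other relocated packages are not covered, and nothing here checks that the class actually exists in 1.0.

```python
OLD_PREFIX = "backtype.storm."    # storm 0.9.x package root
NEW_PREFIX = "org.apache.storm."  # storm 1.0+ package root


def to_storm_1x_name(class_name):
    """Swap the legacy package prefix for the 1.0 one; leave other names untouched."""
    if class_name.startswith(OLD_PREFIX):
        return NEW_PREFIX + class_name[len(OLD_PREFIX):]
    return class_name


print(to_storm_1x_name("backtype.storm.messaging.netty.Context"))
```

Topology code compiled against 0.9.x has these old names baked into its bytecode, which is why it cannot simply be dropped onto a 1.0 cluster.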

There is a parameter-based workaround you can try for allowing storm 0.9.x topologies to run on storm 1.0+ without needing to be recompiled, but I haven't ever tried it. It's been discussed on the storm user mailing lists. See these links:

From the emails in that thread, it sounds like you need to wipe a lot of state to upgrade to storm-1.0.