
Storm 1.0.x Issues: Nimbus Restart and Supervisor Explosion #227

Closed: michaelmoss closed this issue 6 years ago

michaelmoss commented 6 years ago

[screenshot: mesosui-storm]

I've noticed on several occasions far too many supervisors being spawned on our Mesos cluster: supervisors consuming Mesos resources on machines where the topology wasn't even running.

While investigating, I noticed that some topologies were associated with inactive Mesos frameworks (the framework instances that Nimbus registers), so I ran the following test:

  1. Launch Nimbus (1.0.4) on Mesos
  2. Launch Topology. In the Mesos UI, under frameworks, you will see 1 Active Framework ("Storm104" - Nimbus) and 1 Active Task (The topology).
  3. Kill -9 Nimbus
  4. You will notice the original Nimbus is now under "Inactive Frameworks" and a new Nimbus under "Active Frameworks"; however, the active task remains associated with the original Nimbus, not the new one.

In Storm 0.9.x, restarting nimbus in this way causes the framework to re-register (instead of spawning a new one), in which case all the topologies remain associated with it.

erikdw commented 6 years ago

@michaelmoss : we do not run Nimbus inside the Mesos cluster ourselves. I think you are reporting two separate issues here:

  1. nimbus stopping & starting leads to new framework instance
  2. lingering supervisors that should be killing themselves

The 1st issue should be pretty easy to explain: the MesosNimbus code records the framework ID in the local filesystem of the Nimbus host. When you restart the Nimbus, it reads the framework ID from the local filesystem and uses it to re-register with Mesos. This works perfectly fine for us. I'm assuming that for you the Nimbus is not coming back up with the same filesystem underneath it, since you are (somehow) launching the Nimbus into Mesos. If you can "pin" the Nimbus onto the same host with the same local filesystem path for storm.local.dir, then you should be able to avoid this problem. FYI, there is nothing different between storm-0.9.x and storm-1.x in this behavior.
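
To make that mechanism concrete, here is a minimal sketch of framework-ID persistence and reuse, assuming a plain file under storm.local.dir and the Mesos Java API. The class name, file name, and failover timeout are illustrative assumptions, not the actual MesosNimbus code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.mesos.Protos.FrameworkID;
import org.apache.mesos.Protos.FrameworkInfo;

// Hypothetical sketch: persist the framework ID locally so a restarted
// scheduler re-registers as the same framework instead of creating a new one.
public class FrameworkIdStore {
    private final Path idFile;

    public FrameworkIdStore(String stormLocalDir) {
        // Assumed location; the file the real MesosNimbus uses may differ.
        this.idFile = Paths.get(stormLocalDir, "mesos-framework-id");
    }

    /** Build FrameworkInfo, reusing a persisted ID when one exists. */
    public FrameworkInfo buildFrameworkInfo(String name) throws IOException {
        FrameworkInfo.Builder builder = FrameworkInfo.newBuilder()
                .setName(name)
                .setUser("")                // empty string: let Mesos fill in the user
                .setFailoverTimeout(3600);  // keep tasks alive across scheduler restarts
        if (Files.exists(idFile)) {
            String id = new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            builder.setId(FrameworkID.newBuilder().setValue(id).build());
        }
        return builder.build();
    }

    /** Call from the scheduler's registered() callback to persist the assigned ID. */
    public void save(FrameworkID id) throws IOException {
        Files.write(idFile, id.getValue().getBytes(StandardCharsets.UTF_8));
    }
}
```

If the Nimbus host (and thus this file) changes across restarts, the ID is lost and Mesos treats the new registration as a brand-new framework, which matches the behavior described above.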

I may have observed the 2nd issue today in our test cluster, I'm trying to find time to look into it.

erikdw commented 6 years ago

@michaelmoss : I'm analyzing the logs (nimbus, supervisor, mesos-slave/agent, mesos-master) and it seems the situation leading to zombie-supervisors is something like this:

  1. MesosNimbus schedules a worker (a Mesos task) onto a host.
  2. MesosSupervisor receives Executor.launchTask() and sends a TASK_RUNNING status back (sketched below).
  3. Nimbus-core has in the meantime scheduled the worker away from that host for some reason.
  4. Supervisor-core thus never tries to actually launch the worker process.
  5. The Supervisor lives forever, with a Mesos task falsely reported as "running" even though no corresponding worker process exists.
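
To make the failure mode concrete, here is a sketch of the racy pattern using the Mesos Java Executor API; this is an illustration, not the actual MesosSupervisor code:

```java
import org.apache.mesos.Executor;
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.Protos;

public class EagerStatusExecutor implements Executor {
    @Override
    public void launchTask(ExecutorDriver driver, Protos.TaskInfo task) {
        // Racy pattern: TASK_RUNNING is reported as soon as the task arrives.
        // If nimbus-core has already rescheduled the worker elsewhere,
        // supervisor-core never launches a worker process, yet Mesos keeps
        // believing this task is running -- a zombie supervisor.
        driver.sendStatusUpdate(Protos.TaskStatus.newBuilder()
                .setTaskId(task.getTaskId())
                .setState(Protos.TaskState.TASK_RUNNING)
                .build());
    }

    // Remaining Executor callbacks are no-ops in this sketch.
    @Override public void registered(ExecutorDriver d, Protos.ExecutorInfo e,
                                     Protos.FrameworkInfo f, Protos.SlaveInfo s) {}
    @Override public void reregistered(ExecutorDriver d, Protos.SlaveInfo s) {}
    @Override public void disconnected(ExecutorDriver d) {}
    @Override public void killTask(ExecutorDriver d, Protos.TaskID t) {}
    @Override public void frameworkMessage(ExecutorDriver d, byte[] data) {}
    @Override public void shutdown(ExecutorDriver d) {}
    @Override public void error(ExecutorDriver d, String message) {}
}
```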

So it seems this is a race condition with the hack we put in place to work around Storm inadvertently removing the invocation of ISupervisor.assigned(). I believe that this is what leads to the Supervisor living forever; perhaps without that call there is never any interaction between the Supervisor-core code and the MesosSupervisor in this corner-case situation.

I'm open to suggestions on how to handle it. I'm probably going to hack storm in my own public repo to call the ISupervisor.assigned() method.
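
For context, ISupervisor is the pluggable interface that storm-core's supervisor drives. A hedged sketch of what the assigned() callback enables, with illustrative (not actual MesosSupervisor) shutdown behavior:

```java
import java.util.Collection;
import java.util.Map;

import org.apache.storm.scheduler.ISupervisor;

// Sketch only: a supervisor plugin that shuts the process down when it learns
// it has no assigned ports. The real MesosSupervisor logic is more involved.
public class SelfTerminatingSupervisor implements ISupervisor {
    @Override public void prepare(Map stormConf, String schedulerLocalDir) {}
    @Override public String getSupervisorId() { return "sketch-supervisor-id"; }
    @Override public String getAssignmentId() { return "sketch-supervisor-id"; }
    @Override public Object getMetadata() { return null; }
    @Override public boolean confirmAssigned(int port) { return true; }
    @Override public void killedWorker(int port) {}

    @Override
    public void assigned(Collection<Integer> ports) {
        // This callback is the supervisor plugin's only signal that its
        // workers were scheduled away. If storm-core never invokes it,
        // a Mesos-backed supervisor lingers forever: the zombie scenario.
        if (ports == null || ports.isEmpty()) {
            // Illustrative: a real implementation would clean up its Mesos
            // task and exit gracefully rather than halting outright.
            Runtime.getRuntime().halt(0);
        }
    }
}
```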

michaelmoss commented 6 years ago

@erikdw, thanks for breaking this down and providing these leads.

For issue 1, "nimbus stopping & starting leads to new framework instance": I had been experimenting with the HDFS blobstore, configured as:

    supervisor.blobstore.class: "org.apache.storm.hdfs.blobstore.HdfsClientBlobStore"
    nimbus.blobstore.class: "org.apache.storm.hdfs.blobstore.HdfsBlobStore"

I failed to realize that while the blobstore stores the topology jars wonderfully, it apparently doesn't store any of the other Nimbus state. I reverted to our old NFS dir syncing and confirmed this is no longer an issue (on reboot, the framework re-registers).

I will dig into issue 2 and post back.

Thanks again.

erikdw commented 6 years ago

@michaelmoss : oh, interesting, I didn't know that the blobstore had that functionality.

To be fair, the only state that MesosNimbus stores in the local filesystem is the Framework ID. We've already talked about shoving that into ZK, and after @changreytang's changes to add logviewer launching with task reconciliation, the groundwork has already been laid to easily shove the Framework ID into ZK. That would let you continue using the fancy blobstore-in-HDFS functionality.
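
As a rough sketch of the ZK idea, assuming Apache Curator (which Storm already uses); the znode path and class here are assumptions, not an existing implementation:

```java
import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Hypothetical sketch: keep the framework ID in ZooKeeper so Nimbus can
// re-register as the same framework regardless of which host it lands on.
public class ZkFrameworkIdStore {
    private static final String PATH = "/storm-mesos/framework-id"; // assumed znode

    private final CuratorFramework client;

    public ZkFrameworkIdStore(String zkConnectString) {
        client = CuratorFrameworkFactory.newClient(
                zkConnectString, new ExponentialBackoffRetry(1000, 3));
        client.start();
    }

    /** Persist the ID Mesos assigned at first registration. */
    public void save(String frameworkId) throws Exception {
        byte[] data = frameworkId.getBytes(StandardCharsets.UTF_8);
        if (client.checkExists().forPath(PATH) == null) {
            client.create().creatingParentsIfNeeded().forPath(PATH, data);
        } else {
            client.setData().forPath(PATH, data);
        }
    }

    /** Returns null on first launch, letting Mesos assign a fresh ID. */
    public String load() throws Exception {
        if (client.checkExists().forPath(PATH) == null) {
            return null;
        }
        return new String(client.getData().forPath(PATH), StandardCharsets.UTF_8);
    }
}
```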

Alternatively, it might be possible that there is an API we can use from Storm itself to store the Framework ID into the blobstore wherever it exists.

Also, this relates to eventually trying to support HA Nimbus in a Storm-on-Mesos setup. We haven't spent any time investigating that.

michaelmoss commented 6 years ago

Thanks. I do wonder, though, what the utility is of running 2+ Nimbus servers in HA mode when Marathon itself can keep things running (assuming one ensures that the blobs/topologies plus the Nimbus config are backed up, as we are doing).

erikdw commented 6 years ago

@michaelmoss : ah, good point. I have still never used Marathon, I'm only familiar with it in theory, but this could avoid us needing to spend any time trying to support HA Nimbus directly.

erikdw commented 6 years ago

@michaelmoss: I have tested a simple candidate fix for STORM-2690, which adds a call to ISupervisor.assigned() in the storm-core supervisor code. It also requires reverting some of the changes in this project that we worked on in PR https://github.com/mesos/storm/pull/208.
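
The shape of such a fix might look roughly like this sketch: during its periodic sync, supervisor-core hands the currently assigned ports to the pluggable ISupervisor. The names here are illustrative; this is not the actual STORM-2690 patch:

```java
import java.util.Collection;

import org.apache.storm.scheduler.ISupervisor;

// Sketch of the missing hand-off between supervisor-core and the plugin.
public class SupervisorSyncNotifier {
    private final ISupervisor iSupervisor;

    public SupervisorSyncNotifier(ISupervisor iSupervisor) {
        this.iSupervisor = iSupervisor;
    }

    /** Invoked once per sync cycle with the ports assigned to this node. */
    public void onSync(Collection<Integer> assignedPorts) {
        // The restored invocation: without it, an ISupervisor implementation
        // like MesosSupervisor never learns its workers were scheduled away.
        iSupervisor.assigned(assignedPorts);
    }
}
```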

With those 2 changes, the supervisors are cleaning themselves up once again, and are also not shutting themselves down unnecessarily. It'll take me a bit of time to promote this properly into this repo and release a new version of storm-mesos supporting 1.0.3/1.0.4. We somehow broke the build for 1.0.4, as documented in issue #222, so that's extra work I have to do first.

michaelmoss commented 6 years ago

@erikdw what a relief. I was a bit worried I would have to jump into the abyss of that storm code to figure this out, but it looks like you know it well.

I'm watching and have voted on STORM-2690; I suppose we are now waiting on the Storm project to merge this patch before we can move forward?

Do let me know how I might help.

erikdw commented 6 years ago

@michaelmoss : I'll need to propose the patch and get it merged in, yup. I don't foresee that being too much of an issue. I may have to negotiate / beg a bit, since they probably don't want 1.0.x to see much development, but it's the last version where we can have storm-mesos support without extreme hijinks; see issue #214. I'm pretty sure it will take them time to cut a new 1.0.x release, though.

So my plan for the shorter term is to treat this like we do 0.9.x: I can create a build based on a version of storm that I build myself from the changes in https://github.com/erikdw/storm.

srikanth-viswanathan commented 6 years ago

Hi @erikdw, thanks for the support on this ticket thus far. I see that your PR to the storm (core) project was accepted and merged. I think this means the only items left are:

Let me know if I can help with anything. Thanks again.

erikdw commented 6 years ago

@srikanth-viswanathan : I've gone ahead and merged the "reverting" change [1] and built this project with our own custom version of storm-1.0.5. The new artifact is v0.2.5.

Please check out the other changes we put into the custom storm version; they're needed for us, but I'm not sure about your situation. Those changes won't make it into v1.0.6, so we're going to continue to need a custom v1.0.6 build, but it will be able to ditch the reflection-based invocation of launchDaemon().

Also please test this latest incarnation before putting it in production -- I haven't tested it in our storm-1.0.x dev environment yet.

[1] I had to reinstate the use of reflection to invoke Supervisor.launchDaemon(), because the build compiles MesosSupervisor.java against the public storm-core-1.0.5 jar, which doesn't have the aforementioned custom changes to storm-core.
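
For reference, a reflection-based invocation of that sort looks roughly like this (a hedged sketch; the real code's target class, arguments, and error handling may differ):

```java
import java.lang.reflect.Method;

// Sketch: invoke a method that may not exist in the compile-time storm-core
// jar, deferring the lookup to runtime so the build still compiles.
public class LaunchDaemonInvoker {
    public static void launchDaemon(Object supervisor) {
        try {
            Method launch = supervisor.getClass().getMethod("launchDaemon");
            launch.invoke(supervisor);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(
                    "storm-core on the classpath lacks Supervisor.launchDaemon()", e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("failed to invoke launchDaemon()", e);
        }
    }
}
```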

srikanth-viswanathan commented 6 years ago

Thanks @erikdw. I took a look at the changes in the custom version, and it doesn't seem like I will need anything other than the reversion. We maintain a private fork anyway, so I'll cherry-pick these changes.

Also, it doesn't appear that this will need to depend on storm 1.0.6 if we continue to use reflection to invoke Supervisor.launchDaemon(). I was a little worried about being blocked on that release or having to maintain a private version of Storm. I'll just go ahead and depend on storm 1.0.5 from open source.

Thanks again for your help. I'll report back on how this looks in beta and production, but our testing in development looks good thus far.