yahoo / storm-yarn

Storm-yarn enables Storm clusters to be deployed into machines managed by Hadoop YARN.

Not running nimbus in YARN #66

Open ekohlwey opened 10 years ago

ekohlwey commented 10 years ago

It seems like there may be a case to be made for not running nimbus in YARN.

Nimbus is a relatively lightweight process, and the cluster only needs one of them running at any given time. Running nimbus in YARN actually reduces the availability of SOY (Storm on YARN), which is not really the Right Thing(TM).

Relying on the nimbus container never failing substantially complicates cluster operations, and job submission in particular: if the container dies, there is no nimbus to submit to.

Running nimbus separately is not unlike running an independent map/reduce history server, so it makes a lot of sense in my opinion, and there appear to be similar design patterns in other YARN-based frameworks.

At least until the time when SOY is able to properly handle topology submission along with starting the framework, this seems like a good modification to make to simplify running SOY.

I would propose adding an external nimbus configuration option in order to prevent SOY from starting one, if a user would prefer to operate that way. Thoughts?
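To make the proposal concrete, here is a minimal sketch of how such an option might gate the nimbus launch. It is an illustration only, not storm-yarn code: the option name `master.external.nimbus` and both function names are hypothetical, and the only existing setting it assumes is Storm's `nimbus.host`.

```python
from typing import Any, Mapping

# Hypothetical option name, used only for illustration; storm-yarn does not
# define this key.
EXTERNAL_NIMBUS_KEY = "master.external.nimbus"


def should_launch_nimbus(conf: Mapping[str, Any]) -> bool:
    """Launch nimbus inside YARN unless the user asked for an external one."""
    return not bool(conf.get(EXTERNAL_NIMBUS_KEY, False))


def resolve_nimbus_host(conf: Mapping[str, Any], container_host: str) -> str:
    """Pick the nimbus host that supervisors and clients should connect to."""
    if should_launch_nimbus(conf):
        # Default behaviour: nimbus runs in the same container as the master.
        return container_host
    # External mode: the operator must say where nimbus already runs.
    host = conf.get("nimbus.host")
    if not host:
        raise ValueError("external nimbus requested but nimbus.host is not set")
    return host
```

With the flag unset the behaviour stays as it is today; with it set, the master would skip starting nimbus and simply point supervisors at the configured host.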

revans2 commented 10 years ago

If you want to add an option to not launch nimbus, I think that would be fine. You probably also want to have the AM run as an unmanaged AM in that case, similar to how Impala tries to integrate with YARN. When Hadoop runs with security, the individual processes run as the user that launched the job. If we have one nimbus with multiple different AMs, then there is the possibility that part of a topology will be running as one user and part of it as a different user.

The original plan for nimbus was to wait for the Nimbus HA pull requests to be merged in.

https://github.com/nathanmarz/storm/pull/422

Then we could write a simple plug-in to place the important state files in HDFS instead of the local file system, and we would get recovery, especially if we combine this with the work that has been going on in YARN to not shoot worker processes when the AM dies. With Storm's transition to Apache, the HA pull request appears to have stagnated.

Long term, I also like the idea of trying to have nimbus communicate with the AM directly, which would be simpler if they are collocated on the same box. If we want the AM to be able to request resources that reduce network traffic, it almost needs nimbus and the nimbus scheduler to decide where the resources should be placed, and then tell the AM that information. It would also allow the cluster to automatically respond to demand, instead of relying on an external call to add new nodes to the cluster.