habitat-sh / core-plans

Core Habitat Plan definitions

[zookeeper] multi-node setup issue #497

Open IxDay opened 7 years ago

IxDay commented 7 years ago

I am currently trying to add multi-node support to the zookeeper plan. Every node has to be listed in the config in the form

server.{{id}}={{host}}:{{port}} // to simplify

I wanted to use @index from the svc context, but the id value must be between 1 and 255, and the index starts at 0. Since Handlebars (http://handlebarsjs.com/) does not support expression evaluation, I am stuck unless I write some kind of helper, which I think would be complicated to add because it makes the template language more complex. Does anyone have an idea how I can work around this restriction?
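To make the off-by-one concrete, here is a minimal sketch in Python (not Handlebars, and not part of the plan) of the rendering the template would need to produce. The function name `render_server_lines` and the sample host names are hypothetical; the line format is the simplified one shown above.

```python
def render_server_lines(hosts, port):
    """Render one `server.<id>=<host>:<port>` line per host, with ids starting at 1."""
    if len(hosts) > 255:
        # The id must stay within 1..255, so at most 255 servers can be listed.
        raise ValueError("too many hosts: ids must be between 1 and 255")
    # enumerate(..., start=1) shifts the 0-based position to a 1-based id,
    # which is the step plain Handlebars cannot express with @index.
    return [f"server.{i}={host}:{port}" for i, host in enumerate(hosts, start=1)]


if __name__ == "__main__":
    for line in render_server_lines(["zk-a", "zk-b", "zk-c"], 2888):
        print(line)
```

A template helper would only need to do the `index + 1` part; everything else is already available from the svc context.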

bdangit commented 7 years ago

Would it be possible to take advantage of dynamic configuration in zookeeper instead? https://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html#ch_reconfig_dyn

You would basically not template out the config but design your hooks to make API calls to add and remove nodes.
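As a rough sketch of what such a hook would have to compute before making any calls, the Python below diffs the current ensemble against the desired one. The name `plan_reconfig` and the id-to-address dictionaries are assumptions for illustration only; the actual ZooKeeper reconfig calls are deliberately left out.

```python
def plan_reconfig(current, desired):
    """Compare two {id: address} mappings and return (to_add, to_remove).

    to_add is a dict of servers that are desired but not yet in the ensemble;
    to_remove is a sorted list of ids that are in the ensemble but no longer desired.
    """
    to_add = {sid: addr for sid, addr in desired.items() if sid not in current}
    to_remove = sorted(sid for sid in current if sid not in desired)
    return to_add, to_remove


if __name__ == "__main__":
    current = {1: "zk-a:2888", 2: "zk-b:2888"}
    desired = {1: "zk-a:2888", 3: "zk-c:2888"}
    print(plan_reconfig(current, desired))
```

The hook would then issue one add or remove request per entry instead of re-rendering the whole config file.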

moretea commented 7 years ago

I believe there are still some issues here that are hard to solve, such as the need for a stable mapping from each habitat supervisor to a zookeeper id.
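To illustrate what "stable" means here, a minimal Python sketch: members that already have an id keep it, and only new members receive a free one. The function `assign_ids` and the member names are hypothetical, and the sketch assumes the previous assignment is stored somewhere all supervisors agree on, which is exactly the hard part.

```python
def assign_ids(existing, members, max_id=255):
    """Return a {member: id} mapping for `members`.

    Members already present in `existing` keep their id; new members get the
    smallest id in 1..max_id that is not in use by a current member.
    """
    assigned = {m: existing[m] for m in members if m in existing}
    used = set(assigned.values())
    # Lazily walk the ids that no surviving member holds.
    free = (i for i in range(1, max_id + 1) if i not in used)
    for m in members:
        if m not in assigned:
            try:
                assigned[m] = next(free)
            except StopIteration:
                raise ValueError("no free id left in 1..%d" % max_id) from None
    return assigned


if __name__ == "__main__":
    first = assign_ids({}, ["sup-a", "sup-b", "sup-c"])
    # sup-b leaves, sup-d joins: sup-a and sup-c keep their ids.
    print(assign_ids(first, ["sup-c", "sup-a", "sup-d"]))
```

Note that this sketch hands the id of a departed member to the next newcomer; whether that reuse is safe for zookeeper is a separate question.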

moretea commented 7 years ago

@bdangit see the two issues I opened up on habitat-sh/habitat

Furthermore, according to https://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html, several misconfigurations are possible, and I'm not sure we can prevent them. We'd need some shared distributed memory with proper atomic operations to perform those reconfigurations.

bdangit commented 7 years ago

Hrmm. I see. Wouldn't we run into an issue where all the nodes would have to be reconfigured because the template is changing? If done via dynamic reconfigure, a node could gather cluster health from Zookeeper and let Zookeeper routines manage addition/removal of nodes.

moretea commented 7 years ago

To make sure that we're on the same page: we should definitely use the reconfigure hook to keep zookeeper up at all times. Restarting zookeeper is not an option IMHO.

We'll check if we can implement this once https://github.com/habitat-sh/habitat/issues/2224 is implemented.

bdangit commented 7 years ago

@moretea yep, agreed on reconfigure, and then we should be able to do a rolling restart. At least that's what we needed to do to get the configs to take effect, right?

eeyun commented 6 years ago

We hit this in triage today! This is still a feature we want and the two referenced issues on the habitat core repo are probably the best place to track any early work on this. Without those two features we can't sanely implement this feature in the plan.

moretea commented 6 years ago

If you add special-cased helper functions, you'll eventually hit another use case that you had not envisioned and will be unable to implement.

There are two options to solve this problem IMHO.

Proper programming language

For example, Dhall (https://github.com/dhall-lang/) gives you strong termination guarantees and does not allow you to write arbitrary programs doing arbitrary IO (unlike how Ruby is used for Chef).

Enable external dynamic behavior

This idea is close to how Kubernetes' Controllers work, except that you'd elect a single controller in one of the supervisors in a ring. This Controller would read a stream of events on stdin, and write commands to the Supervisor on stdout. The events it gets notified about would be that a peer joined, left, got restarted, etc.
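A minimal sketch of such a Controller loop in Python, assuming a wire format that does not exist today: one JSON object per line, with events shaped like `{"type": "joined", "peer": "..."}` and commands shaped like `{"command": "add", "peer": "..."}`. All of these names are hypothetical.

```python
import json


def run(lines):
    """Consume event lines and yield command lines for the Supervisor.

    Only membership changes produce commands; other events (for example
    "restarted") are read and ignored.
    """
    members = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind, peer = event["type"], event["peer"]
        if kind == "joined" and peer not in members:
            members.add(peer)
            yield json.dumps({"command": "add", "peer": peer})
        elif kind == "left" and peer in members:
            members.discard(peer)
            yield json.dumps({"command": "remove", "peer": peer})


def main():
    """Wire the loop to stdin/stdout, as the elected controller would."""
    import sys
    for command in run(sys.stdin):
        print(command, flush=True)
```

Because the controller is an ordinary process speaking over stdin and stdout, it could be written in any language, and the template language would not need to grow helpers for every new use case.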