saltstack / salt


Feature request: "Leader election" #14074

UtahDave closed this issue 7 years ago

UtahDave commented 10 years ago

This feature request is to open a dialog about a requested feature: "leader election".

The basic idea is that in the master config you can specify a target of some type, such as G@os:Ubuntu, and Salt would automatically select a minion from that targeted group at random and assign it to be the leader. Perhaps that means a grain is set on that specific minion.

Then you could target a command such as salt -G 'os:Ubuntu' --leader db.update, and that command would be run only on the leader of that targeted group.

If the leader doesn't respond to the master, then the master would automatically choose another minion from that targeted group and promote it to leader.
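
Very roughly, the failover behavior I have in mind, as a Python sketch (every name here is made up; none of this is an existing Salt API):

```python
import random

# Hypothetical master-side bookkeeping; nothing here is real Salt code.
leaders = {}  # target expression -> current leader minion id

def get_leader(target, matched_minions, is_alive):
    """Return a sticky leader for `target`, promoting a replacement on failure."""
    current = leaders.get(target)
    if current in matched_minions and is_alive(current):
        return current  # leader still responds: keep it sticky
    # Current leader is gone (or was never chosen): promote a random live minion.
    candidates = [m for m in matched_minions if is_alive(m)]
    if not candidates:
        return None  # nobody to promote; the caller has to surface an error
    leaders[target] = random.choice(candidates)
    return leaders[target]
```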

What would be the best way to implement this? Is there a better way to set up the interface to use this?

basepi commented 10 years ago

I think this makes sense. A leader might be a normal member of a cluster, but have additional software installed in order to allow it to function as the leader. So you have a set of states which make a server a leader, and commands that should execute only on that leader; you want to choose one server at random from the cluster, but you want that choice to be sticky.

I envision this being a separate matcher. Basically, we keep a map of all clusters with their current leader, and a list of previous leaders. When targeted, we attempt to reach the current leader. If it's up, we're good, we run the command. If it's down for some reason, we add that leader to the pool of previous leaders, and then iterate over that pool looking for a live leader. If we exhaust that list, we pick a new random leader from the cluster and make it our current leader.
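
To pin down that bookkeeping, a rough Python sketch (hypothetical helpers, not real Salt code):

```python
import random

# cluster id -> {'current': minion id or None, 'previous': [minion ids]}
clusters = {}

def resolve_leader(cluster_id, members, is_alive):
    state = clusters.setdefault(cluster_id, {'current': None, 'previous': []})
    current = state['current']
    if current and is_alive(current):
        return current                      # leader is up: run the command here
    if current:
        state['previous'].append(current)   # demote the unreachable leader
    # Iterate over the pool of previous leaders, most recent first.
    while state['previous']:
        candidate = state['previous'].pop()
        if candidate in members and is_alive(candidate):
            state['current'] = candidate
            return candidate
    # Pool exhausted: pick a fresh random leader from the cluster.
    live = [m for m in members if is_alive(m)]
    state['current'] = random.choice(live) if live else None
    return state['current']
```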

Sorry for rambling, just wanted to get my thoughts on paper from my discussion with @UtahDave about this.

malinoff commented 10 years ago

Hi, can I add my 2 cents here?

I have used salt to perform an initial deployment of a couchbase cluster in AWS. If you're not familiar with couchbase, its setup requires that you create the cluster on one dedicated node and then connect the other instances to that node. A leader feature may be useful in that case; however, I don't think automatic election is what users like me expect. In my case it would be better if I could choose the leader myself, for example using something like salt-key -A someminion --leader.
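
For comparison, I can already approximate the manual version today with a plain grain. grains.setval and compound matching are existing Salt features (tgt_type was called expr_form on older releases); the minion and state names below are just examples from my setup:

```python
import salt.client

local = salt.client.LocalClient()

# Mark the node I picked by hand; grains.setval persists the grain on the minion.
local.cmd('someminion', 'grains.setval', ['leader', 'true'])

# Leader-only work then targets that grain.
local.cmd('G@os:Ubuntu and G@leader:true', 'state.sls',
          ['couchbase.cluster'], tgt_type='compound')
```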

If this feature is intended for automatic elections, and there is a bunch of states that must always be applied to the leader, I think it will become a slow, huge mess very soon. If the current leader goes offline and a new leader is elected, that bunch of states must be automatically applied to it (in other words, if you don't have the leader state applied, you can't be the leader). Applying those states may take a lot of time: in my case the initial creation of the couchbase cluster took around 10 minutes to configure the repository, download and install the package, configure and create the cluster and buckets, and so on, and that is one of the simplest leader setups.

Moreover, all the other couchbase instances had the leader's IP address in their configs, so if the leader goes offline and a new leader is elected, I need to re-configure all the other instances, which is VERY tricky.

Thanks!

cachedout commented 10 years ago

I think @malinoff has some good points here. There are a few things to consider:

1) Randomly selecting a new leader has the potential to do very bad things to an infrastructure very quickly. Rarely are infrastructure pieces truly orthogonal. Most often, a change in cluster leadership must be very carefully orchestrated, usually by applications which are purpose-built for this scenario. (Think ZooKeeper, for example.) I think this concern was covered quite well by @malinoff.

2) Randomly selecting leaders doesn't really take into account the current state of a leader beyond it simply being part of a matched group. To do this sort of thing elegantly, the concept of 'availability' has to be extended further than simply 'reachability'. I.e., we should know not only whether a new potential leader of a quorum is reachable, but also what its current load is and whether it can truly handle being made a leader in whatever capacity. Otherwise, you're just begging for an outage.
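
To make point 2 concrete, a promotion gate might look something like this sketch, using test.ping and status.loadavg (both existing execution modules) as stand-ins for a real health model; the load threshold is arbitrary:

```python
import salt.client

local = salt.client.LocalClient()

def can_lead(minion, max_load=2.0):
    """Promotion gate: the candidate must be reachable AND not overloaded."""
    if not local.cmd(minion, 'test.ping').get(minion):
        return False  # unreachable: definitely not a candidate
    load = local.cmd(minion, 'status.loadavg').get(minion) or {}
    # status.loadavg returns {'1-min': ..., '5-min': ..., '15-min': ...}
    return float(load.get('1-min', float('inf'))) <= max_load
```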

I think I need to better understand the use case for @UtahDave's suggestion. I might be too focused on the potential problems and not enough on the use case to be helpful. :]

RobertFach commented 9 years ago

Hi, what is the status of this feature request? Any progress here?

I really like the description and points of @UtahDave. Right now I'm working on a replicated filesystem which I would like to be fully automatically managed by Salt. As @malinoff already mentioned, this requires that one of the nodes is considered the leader. I don't want to have to specify a grain or a pillar that tells Salt which of the nodes is the "leader", which in this case only means that some commands have to be run on exactly one node. So it would be really beneficial if Salt provided this as an inherent feature: a fully integrated leader election protocol, capable of dealing with different failure models (crash failures, Byzantine?) and masking up to N node failures by design.
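
Even a deterministic rule, for example lowest minion id among reachable nodes, would avoid a hand-maintained grain, though a sketch like this only masks crash failures, not Byzantine ones:

```python
def elect_leader(reachable_minions):
    """Deterministic election: every observer with the same membership view
    picks the same leader, so nothing has to be hand-set in grains or pillar.
    Byzantine faults would need a real consensus protocol (Raft, ZooKeeper)
    behind it; this is just the crash-failure case."""
    return min(reachable_minions) if reachable_minions else None

# e.g. elect_leader(['node2', 'node1', 'node3']) -> 'node1'
```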

Maybe, one could also live with the following idea:

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.