jrasell / sherpa

Sherpa is a highly available, fast, and flexible horizontal job scaler for HashiCorp Nomad. It is capable of running in a number of different modes to suit different requirements, and can scale based on Nomad resource metrics or external sources.
Mozilla Public License 2.0

Add Sherpa cluster leadership allowing HA server deployments. #45

Closed jrasell closed 4 years ago

jrasell commented 4 years ago

In its current state, Sherpa does not support HA deployments and relies on operators running a single instance. If two Sherpa servers are running against the same logical Nomad cluster, each server runs independently, making and acting on its own scaling decisions. This can cause problems for the Nomad workloads being scaled and can also add unnecessary load on the Nomad API.

This change adds leadership locking to the Sherpa server. A Sherpa server that holds leadership runs the policy GC and autoscaler, and responds to all API requests. A Sherpa server that does not hold leadership is classed as a standby and only responds to API requests on the system and UI paths. Requests to all other paths currently result in an HTTP redirect being returned to the client.
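For reviewers, the standby routing rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `leaderGate` type, the `standbyAllowed` helper, the path prefixes, and the choice of a 307 status are all assumptions made for the example.

```go
package main

import (
	"net/http"
	"strings"
	"sync"
)

// standbyPrefixes lists the path prefixes a standby may serve itself.
// These values are assumptions for illustration, not Sherpa's actual routes.
var standbyPrefixes = []string{"/v1/system", "/ui"}

// standbyAllowed reports whether a standby server may answer path directly.
func standbyAllowed(path string) bool {
	for _, p := range standbyPrefixes {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

// leaderGate is a hypothetical middleware. A leader passes every request
// through; a standby serves only the allowed prefixes and redirects the rest.
type leaderGate struct {
	mu         sync.RWMutex
	isLeader   bool
	leaderAddr string // base URL of the current leader, assumed to be known
	next       http.Handler
}

func (g *leaderGate) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	g.mu.RLock()
	leader, addr := g.isLeader, g.leaderAddr
	g.mu.RUnlock()

	if leader || standbyAllowed(r.URL.Path) {
		g.next.ServeHTTP(w, r)
		return
	}
	if addr == "" {
		// No leader is known yet, so there is nowhere to redirect to.
		http.Error(w, "no known leader", http.StatusServiceUnavailable)
		return
	}
	// 307 is an assumption here; it keeps the request method and body intact.
	http.Redirect(w, r, addr+r.URL.RequestURI(), http.StatusTemporaryRedirect)
}
```

Keeping the path check in a small pure function makes the standby rule easy to unit test without starting a server.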

In the event of a leadership change, the new active server starts the required sub-processes and gains the ability to respond to all API requests. The previous leader gracefully stops all processes a standby should not run and continues its leader election loop unless the server is shutting down.
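The leadership transitions described above could look something like the loop below. It is a minimal sketch under stated assumptions: the `Locker` and `Subsystems` interfaces, the `decide` helper, and the polling interval are hypothetical names introduced for illustration and do not reflect the actual implementation or lock backend.

```go
package main

import (
	"context"
	"time"
)

// Locker is a hypothetical abstraction over whatever backend holds the lock.
type Locker interface {
	// TryAcquire takes or renews the lock and reports whether it is held.
	TryAcquire(ctx context.Context) (bool, error)
	// Release gives up the lock; assumed to be a no-op when it is not held.
	Release(ctx context.Context) error
}

// Subsystems groups the leader-only processes (autoscaler, policy GC).
type Subsystems interface {
	Start()
	Stop()
}

type transition int

const (
	noChange transition = iota
	becomeLeader
	becomeStandby
)

// decide maps the previous role and current lock state to a transition.
func decide(wasLeader, holdsLock bool) transition {
	switch {
	case holdsLock && !wasLeader:
		return becomeLeader
	case !holdsLock && wasLeader:
		return becomeStandby
	default:
		return noChange
	}
}

// runElection loops until ctx is cancelled, starting and stopping the
// leader-only subsystems as leadership is gained and lost.
func runElection(ctx context.Context, lock Locker, subs Subsystems, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	leader := false
	defer func() {
		// On shutdown, stop leader-only work and give up the lock.
		if leader {
			subs.Stop()
		}
		_ = lock.Release(context.Background())
	}()

	for ctx.Err() == nil {
		held, err := lock.TryAcquire(ctx)
		if err != nil {
			held = false // treat backend errors as loss of leadership
		}
		switch decide(leader, held) {
		case becomeLeader:
			subs.Start()
			leader = true
		case becomeStandby:
			subs.Stop()
			leader = false
		}

		select {
		case <-ctx.Done():
		case <-ticker.C:
		}
	}
}
```

Treating a lock backend error as a loss of leadership is a deliberately conservative choice in this sketch, since it avoids two servers scaling the same jobs at once.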

Closes #42