jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0

Elasticsearch master election during upgrade #344

Open cehoffman opened 6 years ago

cehoffman commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: Upgrading an Elasticsearch cluster resulted in multiple master elections.

What you expected to happen: Only one master election takes place, at the end of the upgrade.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster at version X with multiple master nodes
  2. Cause the last master in the StatefulSet to become leader
  3. Update the cluster to version Y

Anything else we need to know?: The controller-manager should delete all master pods except the current leader when doing an upgrade. The current leader should be the last pod deleted and updated.
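As a rough illustration of the ordering requested above, here is a minimal sketch. The function name, the pod names, and the way the leader is supplied are all assumptions made for the example; in a real cluster the current master could be looked up through Elasticsearch's `_cat/master` API, and none of this is Navigator's actual code.

```python
def upgrade_order(master_pods, leader):
    """Return master pods in the order they should be deleted and updated.

    Non-leader pods come first (highest ordinal first, mirroring the
    direction of a StatefulSet rolling update); the current leader is last,
    so only one re-election is forced.
    """
    if leader not in master_pods:
        raise ValueError(f"leader {leader!r} is not one of the master pods")

    def ordinal(pod_name):
        # StatefulSet pods are named <statefulset-name>-<ordinal>.
        return int(pod_name.rsplit("-", 1)[1])

    followers = sorted(
        (pod for pod in master_pods if pod != leader),
        key=ordinal,
        reverse=True,
    )
    return followers + [leader]


# Hypothetical pod names, used only for illustration.
pods = ["es-master-0", "es-master-1", "es-master-2"]
order = upgrade_order(pods, leader="es-master-2")
print(order)
```

With the leader on the last pod, as in the reproduction steps, a plain highest-ordinal-first rolling update would restart the leader first; the ordering above defers it to the end.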

Environment:

munnerz commented 6 years ago

This is currently fairly difficult for us to do, as we rely upon StatefulSet for the upgrade functionality under the hood, and use the RollingUpdate strategy.

If we switch to OnDelete, we lose the 'partition' functionality, which we currently rely upon to ensure that nodes in a cluster aren't updated early if their pods are deleted. After such a switch, if a k8s node fails during an upgrade, any pods running on that node will be upgraded immediately the next time they start (potentially breaking delicate upgrade procedures).
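The difference between the two strategies can be modelled in a few lines. This is a simplified sketch of the documented StatefulSet behaviour, not the controller's real logic, and it assumes the pod template has already been changed to the new revision; the function name is made up for the example.

```python
def revision_after_restart(strategy, ordinal, partition=0):
    """Return the revision ("old" or "new") a deleted pod is recreated at.

    Simplified model: the StatefulSet template already points at the new
    revision, and the pod with the given ordinal has just been deleted
    (for example because its k8s node failed).
    """
    if strategy == "RollingUpdate":
        # Pods with an ordinal below the partition stay on the old
        # revision, even when they are deleted and recreated.
        return "new" if ordinal >= partition else "old"
    if strategy == "OnDelete":
        # Any deleted pod is recreated from the updated template.
        return "new"
    raise ValueError(f"unknown update strategy: {strategy!r}")


# Three replicas, upgrade in progress, only ordinal 2 released so far.
# The node hosting pod 0 fails and the pod is rescheduled.
print(revision_after_restart("RollingUpdate", ordinal=0, partition=2))
print(revision_after_restart("OnDelete", ordinal=0))
```

In the RollingUpdate case the rescheduled pod stays on the old version until the partition is lowered; in the OnDelete case it jumps ahead of the upgrade procedure.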

Therefore, the only way we can do this is to implement our own alternative to StatefulSet, one that chooses which replica to update based on some database-specific predicate.
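To make the idea concrete, here is a hedged sketch of what such a selection step might look like. Everything in it (the `Replica` type, `next_replica_to_update`, the `leader_last` rule) is hypothetical, and the database-specific rule is expressed as a sort key rather than a strict boolean predicate to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str
    version: str
    is_leader: bool = False


def next_replica_to_update(
    replicas: Sequence[Replica],
    target_version: str,
    prefer: Callable[[Replica], tuple],
) -> Optional[Replica]:
    """Pick one outdated replica to update next, or None when all are done."""
    outdated = [r for r in replicas if r.version != target_version]
    if not outdated:
        return None
    # The database-specific rule decides which outdated replica goes first.
    return min(outdated, key=prefer)


def leader_last(replica: Replica) -> tuple:
    # False sorts before True, so non-leaders are chosen before the leader.
    return (replica.is_leader, replica.name)


# Versions X and Y stand in for real version numbers, as in the report.
cluster = [
    Replica("es-master-0", "X"),
    Replica("es-master-1", "X"),
    Replica("es-master-2", "X", is_leader=True),
]
first = next_replica_to_update(cluster, "Y", leader_last)
print(first.name)
```

A controller built this way would call the selection step once per reconcile loop and update a single replica at a time, which is the part StatefulSet's ordinal-based ordering cannot express.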

There has already been discussion over on the Elastic GitHub and forums about triggering manual re-elections in order to make this process more graceful as a stop-gap: https://github.com/elastic/elasticsearch/issues/17493.

Their line seems to be "it shouldn't take that long to re-elect" - but as you say, it'd be nice if we can minimise interruptions. It might be possible to achieve this with a custom discovery plugin, but right now we use the in-built SRV record discovery mechanism, so this would be a new component entirely.