cehoffman opened this issue 6 years ago
This is currently fairly difficult for us to do, as we rely upon StatefulSet for the upgrade functionality under the hood, and use the RollingUpdate strategy.
If we switch to OnDelete, we lose the 'partition' functionality we currently rely upon to ensure that updates to nodes in a cluster aren't triggered early when their pods are deleted. With OnDelete, if a k8s node fails during an upgrade, any pods running on that node will be upgraded the next time they start, potentially breaking delicate upgrade procedures.
Therefore, the only way we can do this is to implement our own alternative to StatefulSet, which chooses which replica to update based on some database-specific predicate.
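For reference, a minimal sketch of the two StatefulSet update strategies being compared, written with the k8s.io/api/apps/v1 types; the function names and partition value are illustrative, not the controller's actual code:

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
)

// rollingUpdateWithPartition keeps the "partition" guard: only pods with an
// ordinal >= the partition are moved to the new revision, so pods below the
// partition keep their current revision even if they are deleted and recreated.
func rollingUpdateWithPartition(partition int32) appsv1.StatefulSetUpdateStrategy {
	return appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
		RollingUpdate: &appsv1.RollingUpdateStatefulSetStrategy{
			Partition: &partition,
		},
	}
}

// onDelete gives the controller full control over *when* each pod is replaced,
// but loses the partition guard: any pod that is deleted (e.g. because its k8s
// node failed mid-upgrade) comes back at the new revision immediately.
func onDelete() appsv1.StatefulSetUpdateStrategy {
	return appsv1.StatefulSetUpdateStrategy{
		Type: appsv1.OnDeleteStatefulSetStrategyType,
	}
}
```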
There has already been discussion over on the Elastic GitHub and forums about triggering manual re-elections in order to make this process more graceful as a stop-gap: https://github.com/elastic/elasticsearch/issues/17493.
Their line seems to be "it shouldn't take that long to re-elect", but as you say, it'd be nice if we could minimise interruptions. It might be possible to achieve this with a custom discovery plugin, but right now we use the built-in SRV record discovery mechanism, so this would be an entirely new component.
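For context, the SRV-based discovery mentioned above amounts to resolving the SRV records published for a headless service and feeding the results to Elasticsearch as its unicast host list. A minimal sketch using the Go standard-library resolver; the service and namespace names are hypothetical:

```go
package example

import (
	"fmt"
	"net"
)

// lookupMasterEligibleHosts resolves the SRV records of a (hypothetical)
// headless service and returns host:port pairs suitable for an Elasticsearch
// unicast host list.
func lookupMasterEligibleHosts() ([]string, error) {
	// Equivalent to querying _transport._tcp.es-masters.default.svc.cluster.local.
	_, addrs, err := net.LookupSRV("transport", "tcp", "es-masters.default.svc.cluster.local")
	if err != nil {
		return nil, err
	}
	hosts := make([]string, 0, len(addrs))
	for _, a := range addrs {
		hosts = append(hosts, fmt.Sprintf("%s:%d", a.Target, a.Port))
	}
	return hosts, nil
}
```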
Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature
What happened: Upgrading the Elasticsearch cluster resulted in multiple master elections.
What you expected to happen: Only one master election occurs, at the end of the upgrade.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: When doing an upgrade, the controller-manager should delete and update all master pods other than the current leader first; the current leader should be the last pod deleted and updated (see the sketch below).
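A rough sketch of what that ordering could look like, assuming the controller can reach the Elasticsearch HTTP API and that pod names match Elasticsearch node names (both assumptions; the helper names and pod-naming convention are hypothetical). The `_cat/master` endpoint is what identifies the elected master:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sort"
	"strings"
)

// currentMasterNodeName returns the node name of the elected master via the
// _cat/master API; h=node narrows the response to just the node name column.
func currentMasterNodeName(esURL string) (string, error) {
	resp, err := http.Get(esURL + "/_cat/master?h=node")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

// upgradeOrder returns pod names ordered so the pod hosting the elected master
// is last, causing only a single re-election at the end of the upgrade.
func upgradeOrder(pods []string, masterNode string) []string {
	ordered := append([]string(nil), pods...)
	sort.SliceStable(ordered, func(i, j int) bool {
		// Non-master pods sort before the master pod; assumes pod name == node name.
		return ordered[i] != masterNode && ordered[j] == masterNode
	})
	return ordered
}

func main() {
	master, err := currentMasterNodeName("http://localhost:9200")
	if err != nil {
		fmt.Println("master lookup failed:", err)
		return
	}
	pods := []string{"es-master-0", "es-master-1", "es-master-2"}
	fmt.Println(upgradeOrder(pods, master))
}
```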
Environment:
- Kubernetes version (use `kubectl version`):