improbable-eng / etcd-cluster-operator

A controller to deploy and manage etcd clusters inside of Kubernetes
MIT License

Scale down #93

Closed: wallrj closed this issue 4 years ago

wallrj commented 4 years ago

Part of #35

wallrj commented 4 years ago

The E2E test failure in https://app.circleci.com/jobs/github/improbable-eng/etcd-cluster-operator/492/parallel-runs/0/steps/0-104 demonstrates that the etcd service is sometimes quite badly disrupted by the scale-down operations. I think this is because each scale-down triggers one, and sometimes two, leader elections.

We could try to select non-leader members for removal, but this would often leave the member names out of sequence. Perhaps that is another good reason not to use ordinal names?
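To make the trade-off concrete, here is a minimal sketch of non-leader selection. It is not the operator's code: the `Member` type and `pickMemberToRemove` function are hypothetical, and the leader ID is assumed to have been obtained from the etcd API beforehand.

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a hypothetical, simplified view of an etcd cluster member.
type Member struct {
	Name string
	ID   uint64
}

// pickMemberToRemove returns the non-leader member whose name sorts last.
// It reports false if every member is the leader (for example, a
// single-member cluster), in which case there is no non-leader to remove.
// Names are compared lexically, which is a simplification: "cluster-10"
// sorts before "cluster-2".
func pickMemberToRemove(members []Member, leaderID uint64) (Member, bool) {
	candidates := make([]Member, 0, len(members))
	for _, m := range members {
		if m.ID != leaderID {
			candidates = append(candidates, m)
		}
	}
	if len(candidates) == 0 {
		return Member{}, false
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Name < candidates[j].Name
	})
	return candidates[len(candidates)-1], true
}

func main() {
	members := []Member{
		{Name: "cluster-0", ID: 1},
		{Name: "cluster-1", ID: 2},
		{Name: "cluster-2", ID: 3},
	}
	// Assume the highest-ordinal member, cluster-2, is currently the leader.
	if m, ok := pickMemberToRemove(members, 3); ok {
		fmt.Println("remove:", m.Name)
	}
}
```

With cluster-2 as leader, the sketch picks cluster-1, leaving cluster-0 and cluster-2 behind. That gap in the ordinal sequence is exactly the naming problem described above.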

But I also haven't had time to check whether there would be a leader election anyway. Perhaps leader elections happen as a result of any change in cluster membership, in which case I can remove (or adjust) the E2E cluster availability test.

wallrj commented 4 years ago

Follow-up issue for safe deletion of PVCs during scale-down: https://github.com/improbable-eng/etcd-cluster-operator/issues/101