kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

kube-apiserver does not gracefully terminate when a control plane machine is scaled down #10934

Open · kdw174 opened 1 month ago

kdw174 commented 1 month ago

What steps did you take and what happened?

When a control plane machine is scaled down, the etcd leader is moved off that machine if it currently holds leadership, and the machine's etcd member is removed from the cluster. This happens before the Machine object is deleted and handled by the infrastructure provider.
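For reference, the removal step boils down to something like the sketch below (simplified; it calls go.etcd.io/etcd/client/v3 directly rather than KCP's internal etcd wrapper, and the function name and member IDs are illustrative only):

package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// removeControlPlaneEtcdMember sketches the order of operations during a
// KCP scale down: leadership is forwarded off the doomed member, then the
// member is removed, all before the Machine (and its kube-apiserver) is
// actually deleted by the infrastructure provider.
func removeControlPlaneEtcdMember(ctx context.Context, cli *clientv3.Client, doomedID, transfereeID uint64) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	// 1. Move the leader off the member being removed (MoveLeader must be
	// issued against the current leader; that plumbing is omitted here).
	if _, err := cli.MoveLeader(ctx, transfereeID); err != nil {
		return err
	}

	// 2. Remove the member. Its etcd process stops serving at this point,
	// so the local kube-apiserver immediately loses its backend.
	_, err := cli.MemberRemove(ctx, doomedID)
	return err
}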

When the etcd member is removed, the kube-apiserver pod on that control plane node can no longer serve requests since it can't access etcd.

{
  "level": "warn",
  "ts": "2024-07-24T17:17:12.462Z",
  "logger": "etcd-client",
  "caller": "v3/retry_interceptor.go:62",
  "msg": "retrying of unary invoker failed",
  "target": "etcd-endpoints://0xc000c44700/127.0.0.1:2379",
  "attempt": 0,
  "error": "rpc error: code = Unavailable desc = error reading from server: read tcp 127.0.0.1:32866->127.0.0.1:2379: read: connection reset by peer"
}

etcd stops when it is removed from the cluster, and the pod ends up in CrashLoopBackOff.

Since the apiserver won't get a SIGTERM until after etcd has been removed from the cluster, the --shutdown-delay-duration apiserver flag is useless for existing connections: they will fail and have to be retried. Until the infrastructure provider removes the apiserver from the load balancer, or health checks fail, there is a brief window in which requests are routed to an apiserver that can't fulfill them.
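As a client-side workaround we currently wrap calls in a retry on connection errors, roughly like this sketch (uses k8s.io/client-go and k8s.io/apimachinery; the retriable-error heuristic is my own, not something CAPI or client-go prescribes, and it only papers over the problem):

package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	utilnet "k8s.io/apimachinery/pkg/util/net"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// listNodesWithRetry retries requests that die mid-flight because the
// kube-apiserver behind the load balancer has just lost its etcd member.
func listNodesWithRetry(ctx context.Context, cs kubernetes.Interface) error {
	retriable := func(err error) bool {
		// Connection reset/refused is what we observe while the doomed
		// apiserver is still in the load balancer rotation.
		return utilnet.IsConnectionReset(err) || utilnet.IsConnectionRefused(err)
	}
	return retry.OnError(retry.DefaultBackoff, retriable, func() error {
		_, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		return err
	})
}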

I've seen this mentioned in several past issues, but they have all been closed without resolution. https://github.com/kubernetes-sigs/cluster-api/issues/2652 best describes the issue that still exists today.

It seems this is simply accepted today and is handled with load balancer health checks and by removing the instance from the load balancer backends as fast as possible. Am I missing something in the configuration that would make this scale-down more graceful?
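For completeness, the mitigation we rely on today is an aggressive health check against /readyz so the backend is pulled quickly; conceptually it is no more than this sketch (endpoint, timeout, and TLS handling are placeholders; a real check should trust the cluster CA instead of skipping verification):

package main

import (
	"crypto/tls"
	"net/http"
	"time"
)

// apiserverReady is the kind of probe a load balancer health check runs:
// anything other than HTTP 200 from /readyz means pull the backend.
// /readyz starts failing once the local etcd member is gone, but only
// at the next probe interval, hence the window described above.
func apiserverReady(endpoint string) bool {
	client := &http.Client{
		Timeout: 2 * time.Second,
		Transport: &http.Transport{
			// Verification skipped only for illustration; verify the
			// serving cert against the cluster CA in a real check.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(endpoint + "/readyz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}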

What did you expect to happen?

I would expect there to be no window during a standard KCP upgrade in which existing connections to a kube-apiserver fail. It would also be nice not to have etcd in CrashLoopBackOff, for alerting purposes.

Cluster API version

1.4.3, though I would expect this to happen on the latest release as well

Kubernetes version

1.25

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
fabriziopandini commented 1 month ago

Unfortunately, load balancers and health checks are out of reach for KCP.

However, I agree it would also be interesting to look into KCP and see if there are ways to minimize the impact of the deletion workflow.