Open ugur99 opened 2 weeks ago
Could you fill in the template? Which playbook? There is not much information to go on here ^
It is not a bug report, it is a discussion topic; that's why I did not add the other info. Actually, it is not a new topic; I assumed you were already aware of this issue @VannTen, see here
I think the linked issue is about upgrading, not scaling up, though?
What playbook has that behavior? This seems a bit weird to me, because about 2 years ago I migrated whole clusters to new machines (for migrating from RHEL 7 to 8), including external etcd and control planes, without downtime (unless we missed it). The docs/operations/nodes.md file has the relevant docs, if I remember correctly.
I think the same etcd playbook is used for both upgrading and scaling up the cluster:
https://github.com/kubernetes-sigs/kubespray/blob/master/playbooks/cluster.yml#L19-L20
https://github.com/kubernetes-sigs/kubespray/blob/master/playbooks/upgrade_cluster.yml#L38-L39
Here is the output of the scale-up operation from 3 control plane nodes to 5:
RUNNING HANDLER [etcd : Reload etcd] *********************************************************************************************************************************************************************************
changed: [node5]
changed: [node4]
changed: [node3]
changed: [node1]
changed: [node2]
Friday 18 October 2024 09:38:10 +0200 (0:00:32.341) 0:08:31.081 ********
AFAICT from that log, this would be the problem, right? (This is not, in fact, a reload: https://github.com/kubernetes-sigs/kubespray/blob/5aea2abc40f9a7cbee0c0ad6bf32ec97f1ef3acf/roles/etcd/handlers/main.yml#L12-L17)
(I don't think this is a huge problem, because most of the time the window where etcd is unavailable would be very small.)
I think this can be fixed with throttle in that case (with throttle being something like {{ groups['etcd'] | length // 2 }}, so we keep quorum but still go as fast as possible); rough sketch below.
Host mode uses Type=notify in the systemd service, so that's enough for that. Not sure about docker mode.
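To make that concrete, here is a rough sketch of how throttle could be applied to a restart handler like that one. This is illustrative only, not the actual Kubespray handler, and whether `throttle` accepts a templated value depends on the Ansible version in use:

```yaml
# Sketch only: restart etcd on at most floor(n/2) nodes at a time so that a
# majority of members stays up at any moment (2 at a time for a 5-member
# cluster, 1 at a time for 3 members).
- name: Restart etcd
  ansible.builtin.service:
    name: etcd
    state: restarted
  # With Type=notify, systemd only reports the restart as done once etcd is
  # ready, so the next batch does not start until this one is serving again.
  # Note: for a single-member cluster this expression evaluates to 0, which
  # Ansible treats as "no throttling", so that edge case would need a guard.
  throttle: "{{ (groups['etcd'] | length) // 2 }}"
```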
Unfortunately, for relatively large clusters it can take ~2 minutes for each etcd instance to recover, and for prod clusters that is a serious problem :(
Throttling is an option, but maybe restarting the old replicas with the new certs before joining new members to the etcd cluster would be the cleanest way. I opened this discussion on the etcd side.
Unfortunately, for relatively large clusters it can take ~2 minutes for each etcd instance to recover, and for prod clusters that is a serious problem :(
Yeah, 2 minutes is bad. How large are your clusters? I don't have more than 200 nodes, so that might be why I've never seen this. (Although maybe cluster size is not the only factor; the quantity of objects might be more relevant for etcd.)
Throttling is an option, but maybe restarting the old replicas with the new certs before joining new members to the etcd cluster would be the cleanest way.
Aren't the etcd nodes trusting the CA? I'm not sure what you mean by this exactly; in the scale-up case, if we do:
-> generate certs signed by CA for new members -> join cluster for new members
there should not be any problems? (Rough sketch below.)
(I'm not completely sure what the etcd role does exactly; it's been a while since I've worked on that, it was not my primary concern, and that role is not very readable.)
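To spell out the arrow notation above, a minimal sketch of the "new members only" path might look like this. The group name `new_etcd_members`, the `etcd_address` variable, and the certificate paths are all hypothetical; this is not what the Kubespray etcd role actually does:

```yaml
# Illustrative only: bring in new members without restarting existing ones.
# In practice new members are usually added one at a time (or as learners)
# so each quorum change can still be met.
- name: Register the new member with the existing cluster
  ansible.builtin.command: >-
    etcdctl member add {{ inventory_hostname }}
    --peer-urls=https://{{ etcd_address }}:2380
  delegate_to: "{{ groups['etcd'][0] }}"
  environment:
    ETCDCTL_API: "3"
    ETCDCTL_ENDPOINTS: "https://{{ hostvars[groups['etcd'][0]]['etcd_address'] }}:2379"
    ETCDCTL_CACERT: /etc/ssl/etcd/ssl/ca.pem        # assumed paths
    ETCDCTL_CERT: /etc/ssl/etcd/ssl/admin.pem
    ETCDCTL_KEY: /etc/ssl/etcd/ssl/admin-key.pem
  when: inventory_hostname in groups['new_etcd_members']

- name: Start etcd on the new member with certs signed by the existing CA
  ansible.builtin.service:
    name: etcd
    state: started
    enabled: true
  when: inventory_hostname in groups['new_etcd_members']
```

Because the new members' certs are signed by the same CA the existing members already trust, nothing on the old members needs to change.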
Ah, you are right; we don't need to restart the old etcds 👍 sorry for my confusion.
What happened?
Currently, when scaling up etcd instances with Kubespray, all etcd nodes are restarted simultaneously after new certificates are generated and backups are taken for each instance. This results in downtime for the entire cluster. Is it possible to restart the etcd instances one by one, to avoid causing any downtime for the cluster? Or what is the motivation behind this approach?
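For illustration, a strictly one-by-one restart of existing members could be expressed roughly like this. This is a sketch under assumed names and certificate paths, not the current Kubespray playbook:

```yaml
# Hypothetical rolling-restart play: one etcd member at a time, so the
# cluster never loses quorum. Certificate paths are assumptions.
- name: Rolling restart of etcd
  hosts: etcd
  serial: 1                 # only one member is down at any given time
  tasks:
    - name: Restart etcd
      ansible.builtin.service:
        name: etcd
        state: restarted

    - name: Wait until the member reports healthy before moving on
      ansible.builtin.command: etcdctl endpoint health
      environment:
        ETCDCTL_API: "3"
        ETCDCTL_ENDPOINTS: "https://{{ ansible_default_ipv4.address }}:2379"
        ETCDCTL_CACERT: /etc/ssl/etcd/ssl/ca.pem    # assumed paths
        ETCDCTL_CERT: /etc/ssl/etcd/ssl/admin.pem
        ETCDCTL_KEY: /etc/ssl/etcd/ssl/admin-key.pem
      register: etcd_health
      retries: 20
      delay: 6
      until: etcd_health.rc == 0
      changed_when: false
```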
What did you expect to happen?
-
How can we reproduce it (as minimally and precisely as possible)?
-
OS
-
Version of Ansible
-
Version of Python
-
Version of Kubespray (commit)
-
Network plugin used
cilium
Full inventory with variables
-
Command used to invoke ansible
-
Output of ansible run
-
Anything else we need to know
No response