kubernetes / website

Kubernetes website and documentation repo:
https://kubernetes.io

Administration with kubeadm - replacing control plane nodes #39458

Open Daxcor69 opened 1 year ago

Daxcor69 commented 1 year ago

Hello,

I am having to change out all of my control plane nodes. There is zero information on the website about the process of doing this safely so as not to crash the cluster. I am using kubeadm to provision the cluster. I guess the issue has to do with etcd, split brain, etc. For such a dangerous task, I am surprised there is no information.

I would be happy to create this document once I know how to do it. ;)

My use case: I have a 3-node HA control plane and I need to replace all of the nodes with new hardware and configs, on a production cluster on bare metal. I have searched the web and ChatGPT with no luck so far.

I have the ability to add 3 additional nodes to the cluster, making it six control plane nodes. The question is how to remove the old nodes safely. This is the documentation I would love to have. I hope this makes sense.

Thank you for all that you do for the project.

Ritikaa96 commented 1 year ago

/sig cluster-lifecycle
/kind support

Ritikaa96 commented 1 year ago

Hi @Daxcor69, if you have attached the new nodes gracefully, then doing a `kubeadm reset` on the new master node and then joining the required nodes should do the work easily. Also, be aware that an etcd cluster with an even number of members gains no fault tolerance over the next smaller odd size (and a 2-member cluster tolerates no failures at all), which is why it's highly recommended to use an odd number of etcd replicas. Here is some context: https://serverfault.com/questions/1029654/deleting-a-control-node-from-the-cluster-kills-the-apiserver

Information on HA topology is here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

Daxcor69 commented 1 year ago

Here is the first draft of my write-up. I have not tested this yet. I am looking for confirmation that this is a good way to proceed, and to start the process of refining a how-to document.

The starting layout of the cluster is:
3 control plane nodes
10 worker nodes

The goal is to replace all of the control plane nodes with new ones. Don't ask why :(

First we want to understand the layout of etcd. The IPs in the example are the host IP addresses of the control plane nodes.
We can get this information by running the following commands on any of the control plane nodes:

```bash
sudo apt install etcd-client -y
cd /etc/kubernetes/pki/etcd/
ETCDCTL_API=3 etcdctl endpoint status --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.100:2379,10.1.1.101:2379,10.1.1.102:2379 --write-out=table
```

Output:

+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.100:2379 | cf6b4f78d74de8cf |   3.5.6 |  144 MB |     false |        17 |   38133364 |
| 10.1.1.101:2379 | dfe0ecd90aef9c99 |   3.5.4 |  149 MB |     false |        17 |   38133364 |
| 10.1.1.102:2379 | aad7afaa2e5cc464 |   3.5.4 |  151 MB |      true |        17 |   38133364 |
+-----------------+------------------+---------+---------+-----------+-----------+------------+

This tells us which host/node is currently the etcd leader. In this case it is the node at IP 10.1.1.102.
We will need to know this later. You will also need to know which Kubernetes node name is associated with which IP
in the list above.
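
One way to map node names to the IPs above (assuming the nodes registered with these addresses as their internal IPs) is:

`kubectl get nodes -o wide`

The INTERNAL-IP column should line up with the etcd endpoints above.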

etcd needs a quorum (a majority of members) to keep working, so we must never remove more members than the cluster
can tolerate losing. As we currently have 3 members, the cluster tolerates the loss of a single member, so we are safe
to remove a SINGLE member at a time. We will remove a member that is not the leader. In this case it is going to be cp1.
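
Before removing anything, it is worth confirming that every member reports healthy, using the same certificates and endpoints as above:

`ETCDCTL_API=3 etcdctl endpoint health --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.100:2379,10.1.1.101:2379,10.1.1.102:2379`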

First we need to remove the control plane node from the cluster. We will do this by running the following
commands on any host that has access to the cluster via kubectl:

```bash
kubectl drain cp1 --ignore-daemonsets
kubectl delete node cp1
```

At this point I would shut down the host to make sure it doesn't come back up in the cluster. You can
do this by running the following command on the host to be removed:

`shutdown -h now`
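
One thing I have not verified: on a stacked-etcd cluster, draining and deleting the node with kubectl may leave its etcd member registered. If the old member still shows up in `etcdctl member list` afterwards, it likely needs to be removed explicitly, either by running `sudo kubeadm reset` on the old host before shutting it down, or from a surviving control plane node with something like this (using cp1's member ID from the first table):

`ETCDCTL_API=3 etcdctl member remove cf6b4f78d74de8cf --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.101:2379,10.1.1.102:2379`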

Now we need to add our new control plane node to the cluster.  First we need to get the kubeadm join string
from one of the existing control plane nodes.  We can do this by running the following command:

`sudo kubeadm token create --print-join-command --certificate-key $(kubeadm certs certificate-key)`

You will get an output similar to the following:

kubeadm join cluster-endpoint.<redact>.local:6443 --token <redact> --discovery-token-ca-cert-hash sha256:<redact> --control-plane --certificate-key <redact>

Assuming we have a fresh host waiting in the wings ready to join the cluster, we can now paste the previous
command on the new host to join the cluster as a control plane node.
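
One caveat I am not sure about: `kubeadm certs certificate-key` only generates a new random key, and the control plane certificates also need to be uploaded encrypted with the key that the join command carries. If the join fails to download certificates, re-uploading them first and reusing the printed key should fix it:

```bash
# On an existing control plane node: re-upload the control plane certificates
# and note the certificate key that is printed
sudo kubeadm init phase upload-certs --upload-certs

# Then generate the join command with that key (placeholder value below)
sudo kubeadm token create --print-join-command --certificate-key <key-from-upload-certs>
```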

Once this is done we can run the following command on any host that has access to the cluster via kubectl to make sure it is ready:

`kubectl get nodes`

NAME              STATUS                     ROLES           AGE     VERSION
controlplane1_new Ready                      control-plane   2m      v1.26.0
controlplane2     Ready                      control-plane   90d     v1.26.0
controlplane3     Ready                      control-plane   90d     v1.26.0
wk1               Ready                      worker          2d1h    v1.26.0
wk10              Ready                      worker          29h     v1.26.0
wk11              Ready                      worker          29h     v1.26.0
wk2               Ready                      worker          32h     v1.26.0
wk3               Ready                      worker          32h     v1.26.0
wk4               Ready                      worker          32h     v1.26.0
wk5               Ready                      worker          32h     v1.26.0
wk6               Ready                      <none>          6h47m   v1.26.0
wk7               Ready                      worker          29h     v1.26.0
wk8               Ready                      worker          29h     v1.26.0
wk9               Ready                      worker          29h     v1.26.0

You should see all the nodes in the cluster and their state. If the new control plane node shows up and is in the
Ready state, you are good to go; if it is NotReady, wait until it becomes Ready.
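
Rather than re-running `kubectl get nodes`, something like this should also work (using the example node name from above):

`kubectl wait --for=condition=Ready node/controlplane1_new --timeout=10m`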

I would then check etcd again to make sure the new control plane node has joined as a member.

`ETCDCTL_API=3 etcdctl member list --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.110:2379,10.1.1.101:2379,10.1.1.102:2379 --write-out=table`

+------------------+---------+-------------------+-------------------------+-------------------------+
|        ID        | STATUS  |     NAME          |       PEER ADDRS        |      CLIENT ADDRS       |
+------------------+---------+-------------------+-------------------------+-------------------------+
| aad7afaa2e5cc464 | started | controlplane3     | https://10.1.1.102:2380 | https://10.1.1.102:2379 |
| cf6b4f78d74de8cf | started | controlplane1_new | https://10.1.1.110:2380 | https://10.1.1.110:2379 |
| dfe0ecd90aef9c99 | started | controlplane2     | https://10.1.1.101:2380 | https://10.1.1.101:2379 |
+------------------+---------+-------------------+-------------------------+-------------------------+

This command is slightly different in that it shows the member status. If the status is started then you are good to go.
You can see we have a new hostname and IP for the replaced member. This is because we have used a completely new host to replace the old one.

We can repeat the process for cp2 next. We choose this node because it is NOT the leader. Once you have completed the second node replacement,
I would move etcd leadership onto one of the new nodes. You can do this by running the following command from the same control plane node you have been issuing the etcdctl commands on; `move-leader` takes the member ID of the new leader, and the request must be sent to the current leader's endpoint:

`ETCDCTL_API=3 etcdctl move-leader cf6b4f78d74de8cf --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.102:2379`

You can check the status again to make sure the new leader is the node you expect.

`ETCDCTL_API=3 etcdctl endpoint status --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.110:2379,10.1.1.111:2379,10.1.1.102:2379 --write-out=table`

Output

+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.110:2379 | cf6b4f78d74de8cf |   3.5.6 |  144 MB |     true  |        17 |   38133364 |
| 10.1.1.111:2379 | dfe0ecd90aef9c99 |   3.5.4 |  149 MB |     false |        17 |   38133364 |
| 10.1.1.102:2379 | aad7afaa2e5cc464 |   3.5.4 |  151 MB |     false |        17 |   38133364 |
+-----------------+------------------+---------+---------+-----------+-----------+------------+

Notice the IP of the new leader and the IP of the second node we have added. Before removing the last old node, make sure
the leader is on one of the NEW nodes; it can be any of the new nodes, but it must be a new one.

Now we can go ahead and remove the last old node. Follow the same instructions as before. When you are done, check the member list again; what you want to see is this:

`ETCDCTL_API=3 etcdctl member list --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.110:2379,10.1.1.101:2379,10.1.1.112:2379 --write-out=table`

+------------------+---------+-------------------+-------------------------+-------------------------+
|        ID        | STATUS  |     NAME          |       PEER ADDRS        |      CLIENT ADDRS       |
+------------------+---------+-------------------+-------------------------+-------------------------+
| aad7afaa2e5cc464 | started | controlplane3_new | https://10.1.1.112:2380 | https://10.1.1.112:2379 |
| cf6b4f78d74de8cf | started | controlplane1_new | https://10.1.1.110:2380 | https://10.1.1.110:2379 |
| dfe0ecd90aef9c99 | started | controlplane2_new | https://10.1.1.111:2380 | https://10.1.1.111:2379 |
+------------------+---------+-------------------+-------------------------+-------------------------+

Confirm the leader's status:

`ETCDCTL_API=3 etcdctl endpoint status --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=10.1.1.110:2379,10.1.1.111:2379,10.1.1.112:2379 --write-out=table`

Output

+-----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------+------------------+---------+---------+-----------+-----------+------------+
| 10.1.1.110:2379 | cf6b4f78d74de8cf |   3.5.6 |  144 MB |     true  |        17 |   38133364 |
| 10.1.1.111:2379 | dfe0ecd90aef9c99 |   3.5.4 |  149 MB |     false |        17 |   38133364 |
| 10.1.1.112:2379 | aad7afaa2e5cc464 |   3.5.4 |  151 MB |     false |        17 |   38133364 |
+-----------------+------------------+---------+---------+-----------+-----------+------------+

If you have paid close attention to the details and worked carefully, you should have a cluster with 3 brand new control plane nodes,
with no downtime or corruption.
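
As a final sanity check (my own addition, not strictly required), confirm that the control plane static pods are all running on the new nodes:

`kubectl get pods -n kube-system -o wide`

Each new control plane node should be running its own kube-apiserver, kube-controller-manager, kube-scheduler, and etcd pod.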

tengqm commented 1 year ago

This looks like a good task page.

/sig cluster-lifecycle

tengqm commented 1 year ago

/remove-kind support
/kind feature

Daxcor69 commented 1 year ago

please understand I have NOT tested this... I am not an expert.. just doing the research and trying to reason it out so I don't crash my own production cluster.

tengqm commented 1 year ago

> please understand I have NOT tested this... I am not an expert.. just doing the research and trying to reason it out so I don't crash my own production cluster.

Totally understand. That is the reason I'm mentioning SIG Cluster Lifecycle, to jump in and help validate this approach.

Ritikaa96 commented 1 year ago

Hi @Daxcor69, that's a good starting point. For contribution purposes, see the contribution guide to make a change or add a new tutorial.

sftim commented 1 year ago

/language en

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

sftim commented 6 months ago

/remove-lifecycle rotten
/lifecycle stale
/triage accepted