kubernetes / kubeadm

Aggregator for issues filed against kubeadm

how to renew the certificate when apiserver cert expired? #581

Closed zalmanzhao closed 6 years ago

zalmanzhao commented 6 years ago

Is this a request for help?

If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.

If no, delete this section and continue on.

What keywords did you search in kubeadm issues before filing this one?

If you have found any duplicates, you should instead reply there and close this page.

If you have not found any duplicates, delete this section and continue on.

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT or FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): 1.7.5

Environment:

What happened?

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

errordeveloper commented 6 years ago

Duplicate of https://github.com/kubernetes/kubeadm/issues/206.

kachkaev commented 6 years ago

@zalmanzhao did you manage to solve this issue?

I created a kubeadm v1.9.3 cluster just over a year ago and it was working fine all this time. I went to update one deployment today and realised I was locked out of the API because the cert got expired. I can't even kubeadm alpha phase certs apiserver, because I get failure loading apiserver certificate: the certificate has expired (kubeadm version is currently 1.10.6 since I want to upgrade).

Adding insecure-skip-tls-verify: true to clusters[0].cluster in ~/.kube/config does not help either – I see You must be logged in to the server (Unauthorized) when trying to kubectl get pods (https://github.com/kubernetes/kubernetes/issues/39767).

The cluster is working, but it lives its own life until it self-destructs or until things get fixed 😅 Unfortunately, I could not find a solution for my situation in #206 and am wondering how to get out of it. The only relevant material I could dig out was a blog post called ‘How to change expired certificates in kubernetes cluster’, which looked promising at first glance. However, it did not fit in the end because my master machine did not have an /etc/kubernetes/ssl/ folder (only /etc/kubernetes/pki/) – either I have a different k8s version or I simply deleted that folder without noticing.
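For anyone in the same spot, a quick way to confirm which certificates have actually expired (assuming the default kubeadm layout under /etc/kubernetes/pki) is to ask openssl for their end dates:

# print the expiry date of each kubeadm-managed certificate
for crt in /etc/kubernetes/pki/*.crt; do
  echo -n "$crt: "
  openssl x509 -enddate -noout -in "$crt"
done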

@errordeveloper could you please recommend something? I'd love to fix things without kubeadm reset and payload recreation.

davidcomeyne commented 6 years ago

@kachkaev Did you have any luck renewing the certs without resetting kubeadm? If so, please share; I'm having the same issue here with k8s 1.7.4. And I can't seem to upgrade ($ kubeadm upgrade plan) because the error pops up again, telling me that the certificate has expired and that it cannot list the masters in my cluster:

[ERROR APIServerHealth]: the API Server is unhealthy; /healthz didn't return "ok"
[ERROR MasterNodesReady]: couldn't list masters in cluster: Get https://172.31.18.88:6443/api/v1/nodes?labelSelector=node-role.kubernetes.io%2Fmaster%3D: x509: certificate has expired or is not yet valid
kachkaev commented 6 years ago

Unfortunately, I gave up in the end. The solution was to create a new cluster, restore all the payload on it, switch DNS records and finally delete the original cluster 😭 At least there was no downtime because I was lucky enough to have healthy pods on the old k8s during the transition.

davidcomeyne commented 6 years ago

Thanks @kachkaev for responding. I will nonetheless give it another try. If I find something I will make sure to post it here...

danroliver commented 6 years ago

If you are using a version of kubeadm prior to 1.8, where I understand certificate rotation #206 was put into place (as a beta feature), or your certs have already expired, then you will need to manually update your certs (or recreate your cluster, which it appears some (not just @kachkaev) end up resorting to).

You will need to SSH into your master node. If you are using kubeadm >= 1.8, skip to step 2.

1. Update kubeadm, if needed. I was on 1.7 previously.
$ sudo curl -sSL https://dl.k8s.io/release/v1.8.15/bin/linux/amd64/kubeadm > ./kubeadm.1.8.15
$ chmod a+rx kubeadm.1.8.15
$ sudo mv /usr/bin/kubeadm /usr/bin/kubeadm.1.7
$ sudo mv kubeadm.1.8.15 /usr/bin/kubeadm
2. Back up the old apiserver, apiserver-kubelet-client, and front-proxy-client certs and keys.
$ sudo mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.old
$ sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.old
$ sudo mv /etc/kubernetes/pki/apiserver-kubelet-client.crt /etc/kubernetes/pki/apiserver-kubelet-client.crt.old
$ sudo mv /etc/kubernetes/pki/apiserver-kubelet-client.key /etc/kubernetes/pki/apiserver-kubelet-client.key.old
$ sudo mv /etc/kubernetes/pki/front-proxy-client.crt /etc/kubernetes/pki/front-proxy-client.crt.old
$ sudo mv /etc/kubernetes/pki/front-proxy-client.key /etc/kubernetes/pki/front-proxy-client.key.old
3. Generate new apiserver, apiserver-kubelet-client, and front-proxy-client certs and keys.
$ sudo kubeadm alpha phase certs apiserver --apiserver-advertise-address <IP address of your master server>
$ sudo kubeadm alpha phase certs apiserver-kubelet-client
$ sudo kubeadm alpha phase certs front-proxy-client
4. Back up the old configuration files.
$ sudo mv /etc/kubernetes/admin.conf /etc/kubernetes/admin.conf.old
$ sudo mv /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.old
$ sudo mv /etc/kubernetes/controller-manager.conf /etc/kubernetes/controller-manager.conf.old
$ sudo mv /etc/kubernetes/scheduler.conf /etc/kubernetes/scheduler.conf.old
5. Generate new configuration files.

There is an important note here. If you are on AWS, you will need to explicitly pass the --node-name parameter in this request. Otherwise you will get an error like Unable to register node "ip-10-0-8-141.ec2.internal" with API server: nodes "ip-10-0-8-141.ec2.internal" is forbidden: node ip-10-0-8-141 cannot modify node ip-10-0-8-141.ec2.internal in your logs (sudo journalctl -u kubelet --all | tail), and the Master Node will report that it is Not Ready when you run kubectl get nodes.

Please be certain to replace the values passed in --apiserver-advertise-address and --node-name with the correct values for your environment.

$ sudo kubeadm alpha phase kubeconfig all --apiserver-advertise-address 10.0.8.141 --node-name ip-10-0-8-141.ec2.internal
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
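If you are unsure what to pass for --node-name on AWS, the kubelet normally registers under the EC2 private DNS name, which the instance metadata service can report. This lookup is an assumption about a standard EC2 setup, not part of the original steps:

# query the EC2 instance metadata for the private DNS name used as the node name
curl -s http://169.254.169.254/latest/meta-data/local-hostname
# -> e.g. ip-10-0-8-141.ec2.internal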
6. Ensure that your kubectl is looking in the right place for your config files.
$ mv .kube/config .kube/config.old
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ sudo chmod 777 $HOME/.kube/config
$ export KUBECONFIG=.kube/config
7. Reboot your master node.
$ sudo /sbin/shutdown -r now
8. Reconnect to your master node, grab your token, and verify that your Master Node is "Ready". Copy the token to your clipboard. You will need it in the next step.
$ kubectl get nodes
$ kubeadm token list

If you do not have a valid token, you can create one with:

$ kubeadm token create

The token should look something like 6dihyb.d09sbgae8ph2atjw

9. SSH into each of the slave nodes and reconnect them to the master.
$ sudo curl -sSL https://dl.k8s.io/release/v1.8.15/bin/linux/amd64/kubeadm > ./kubeadm.1.8.15
$ chmod a+rx kubeadm.1.8.15
$ sudo mv /usr/bin/kubeadm /usr/bin/kubeadm.1.7
$ sudo mv kubeadm.1.8.15 /usr/bin/kubeadm
$ sudo kubeadm join --token=<token from step 8> <ip of master node>:<port, 6443 is the default> --node-name <same node name as in step 5>
10. Repeat step 9 for each connecting node. From the master node, you can verify that all slave nodes have connected and are ready with:
$ kubectl get nodes

Hopefully this gets you where you need to be @davidcomeyne.
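As an optional sanity check after step 3 (and before rebooting), it should be possible to confirm that the regenerated certs now carry fresh expiry dates; this is just a verification idea, assuming the default pki path:

$ sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt
$ sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver-kubelet-client.crt
$ sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/front-proxy-client.crt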

davidcomeyne commented 6 years ago

Thanks a bunch @danroliver ! I will definitely try that and post my findings here.

ivan4th commented 5 years ago

@danroliver Thanks! Just tried it on an old single-node cluster, so did steps up to 7. It worked.

dmellstrom commented 5 years ago

@danroliver Worked for me. Thank you.

davidcomeyne commented 5 years ago

Did not work for me, had to set up a new cluster. But glad it helped others!

fovecifer commented 5 years ago

thank you @danroliver . it works for me and my kubeadm version is 1.8.5

kvchitrapu commented 5 years ago

Thanks @danroliver for putting together the steps. I had to make small additions to them. My cluster is running v1.9.3 and it is in a private datacenter, off the Internet.

On the Master

1. Prepare a kubeadm config.yml:

   apiVersion: kubeadm.k8s.io/v1alpha1
   kind: MasterConfiguration
   api:
     advertiseAddress: <master-ip>
   kubernetesVersion: 1.9.3

2. Back up certs and conf files:

   mkdir ~/conf-archive/
   for f in `ls *.conf`; do mv $f ~/conf-archive/$f.old; done

   mkdir ~/pki-archive
   for f in `ls apiserver* front-*client*`; do mv $f ~/pki-archive/$f.old; done

3. The kubeadm commands on the master had --config config.yml, like this:

   kubeadm alpha phase certs apiserver --config ./config.yml
   kubeadm alpha phase certs apiserver-kubelet-client --config ./config.yml
   kubeadm alpha phase certs front-proxy-client --config ./config.yml
   kubeadm alpha phase kubeconfig all --config ./config.yml --node-name <node-name>
   reboot

4. Create a token.

On the minions

I had to move the old CA and kubelet config, then rejoin:

   mv /etc/kubernetes/pki/ca.crt ~/archive/
   mv /etc/kubernetes/kubelet.conf ~/archive/
   systemctl stop kubelet
   kubeadm join --token=eeefff.55550009999b3333 --discovery-token-unsafe-skip-ca-verification <master-ip>:6443

borispf commented 5 years ago

Thanks @danroliver! On my single-node cluster it was enough to follow steps 1-6 (no reboot) and then send a SIGHUP to kube-apiserver. I just found the container id with docker ps and sent the signal with docker kill -s HUP <container id>.
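Spelled out, that amounts to something like the following; the grep pattern is just one way to find the container:

# find the kube-apiserver container id and send it SIGHUP, as described above
docker ps | grep kube-apiserver
docker kill -s HUP <container id>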

BastienL commented 5 years ago

Thanks a lot @danroliver! On our single-master/multi-workers cluster, doing steps 1 to 7 was enough; we did not have to reconnect every worker node to the master (which was the most painful part).

kcronin commented 5 years ago

Thanks for this great step-by-step, @danroliver! I'm wondering how this process might be applied to a multi-master cluster (bare metal, currently running 1.11.1), and preferably without downtime. My certs are not yet expired, but I am trying to learn how to regenerate/renew them before that happens.

neolit123 commented 5 years ago

@kcronin please take a look at this new document: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/ hope that helps.

radanliska commented 5 years ago

@danroliver: Thank you very much, it's working.

It's not necessary to reboot the servers. It's enough to recreate the kube-system pods (apiserver, scheduler, ...) with these two commands:

systemctl restart kubelet
for i in $(docker ps | egrep 'admin|controller|scheduler|api|fron|proxy' | rev | awk '{print $1}' | rev); do docker stop $i; done

pmcgrath commented 5 years ago

I had to deal with this also on a 1.13 cluster. In my case the certificates were about to expire, so the situation was slightly different. I was also dealing with a single master/control-plane instance on premise, so I did not have to worry about an HA setup or AWS specifics. I have not included the backup steps, as the other guys have covered them above.

Since the certs had not expired, the cluster already had workloads which I wanted to keep working. I did not have to deal with etcd certs at this time either, so I have omitted them.

So at a high level I had to:

* On the master
  * Generate new certificates on the master
  * Generate new kubeconfigs with embedded certificates
  * Generate new kubelet certificates - client and server
  * Generate a new token for the worker node kubelets
* For each worker
  * Drain the worker first on the master
  * ssh to the worker, stop the kubelet, remove files and restart the kubelet
  * Uncordon the worker on the master
* On master at the end
  * Delete token

# On master - See https://kubernetes.io/docs/setup/certificates/#all-certificates

# Generate the new certificates - you may have to deal with AWS - see above re extra certificate SANs
sudo kubeadm alpha certs renew apiserver
sudo kubeadm alpha certs renew apiserver-etcd-client
sudo kubeadm alpha certs renew apiserver-kubelet-client
sudo kubeadm alpha certs renew front-proxy-client

# Generate new kube-configs with embedded certificates - Again you may need extra AWS specific content - see above
sudo kubeadm alpha kubeconfig user --org system:masters --client-name kubernetes-admin  > admin.conf
sudo kubeadm alpha kubeconfig user --client-name system:kube-controller-manager > controller-manager.conf
sudo kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf
sudo kubeadm alpha kubeconfig user --client-name system:kube-scheduler > scheduler.conf

# chown and chmod so they match existing files
sudo chown root:root {admin,controller-manager,kubelet,scheduler}.conf
sudo chmod 600 {admin,controller-manager,kubelet,scheduler}.conf

# Move to replace existing kubeconfigs
sudo mv admin.conf /etc/kubernetes/
sudo mv controller-manager.conf /etc/kubernetes/
sudo mv kubelet.conf /etc/kubernetes/
sudo mv scheduler.conf /etc/kubernetes/

# Restart the master components
sudo kill -s SIGHUP $(pidof kube-apiserver)
sudo kill -s SIGHUP $(pidof kube-controller-manager)
sudo kill -s SIGHUP $(pidof kube-scheduler)

# Verify master component certificates - should all be 1 year in the future
# Cert from api-server
echo -n | openssl s_client -connect localhost:6443 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text -noout | grep Not
# Cert from controller manager
echo -n | openssl s_client -connect localhost:10257 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text -noout | grep Not
# Cert from scheduler
echo -n | openssl s_client -connect localhost:10259 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text -noout | grep Not

# Generate kubelet.conf
sudo kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf
sudo chown root:root kubelet.conf
sudo chmod 600 kubelet.conf

# Drain
kubectl drain --ignore-daemonsets $(hostname)
# Stop kubelet
sudo systemctl stop kubelet
# Delete files
sudo rm /var/lib/kubelet/pki/*
# Copy file
sudo mv kubelet.conf /etc/kubernetes/
# Restart
sudo systemctl start kubelet
# Uncordon
kubectl uncordon $(hostname)

# Check kubelet
echo -n | openssl s_client -connect localhost:10250 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text -noout | grep Not

Let's create a new token for nodes re-joining the cluster (after kubelet restart):

# On master
sudo kubeadm token create

Now for each worker - one at a time

kubectl drain --ignore-daemonsets --delete-local-data WORKER-NODE-NAME

ssh to worker node

# Stop kubelet
sudo systemctl stop kubelet

# Delete files
sudo rm /etc/kubernetes/kubelet.conf
sudo rm /var/lib/kubelet/pki/*

# Alter the bootstrap token
new_token=TOKEN-FROM-CREATION-ON-MASTER
sudo sed -i "s/token: .*/token: $new_token/" /etc/kubernetes/bootstrap-kubelet.conf

# Start kubelet
sudo systemctl start kubelet

# Check kubelet certificate
echo -n | openssl s_client -connect localhost:10250 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text -noout | grep Not
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text -noout | grep Not
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -text -noout | grep Not

Back to master and uncordon the worker

kubectl uncordon WORKER-NODE-NAME

After all workers have been updated, remove the token - it will expire in 24h, but let's get rid of it.

On master
sudo kubeadm token delete TOKEN-FROM-CREATION-ON-MASTER
rodrigc commented 5 years ago

@pmcgrath Thanks for posting those steps. I managed to follow them and renew my certificates, and get a working cluster.

desdic commented 5 years ago

(quoting @danroliver's steps above)

This is what I need, only for 1.14.2 .. any hints on how to do the same there?

(quoting @pmcgrath's steps above)

I know this issue is closed, but I have the same problem on 1.14.2: the guide gives no errors, but I cannot connect to the cluster or reissue the token (I get auth failed).

terrywang commented 5 years ago

A k8s cluster created using kubeadm v1.9.x experienced the same issue (apiserver-kubelet-client.crt expired on 2 July) at the age of v1.14.1 lol

I had to refer to 4 different sources to renew the certificates, regenerate the configuration files and bring the simple 3 node cluster back.

@danroliver gave very good and structured instructions, very close to the guide below from IBM: Renewing Kubernetes cluster certificates (https://www.ibm.com/support/knowledgecenter/en/SSCKRH_1.1.0/platform/t_certificate_renewal.html). WoW!

NOTE: IBM Financial Crimes Insight with Watson private is powered by k8s, never knew that.

Problem with step 3 and step 5

Step 3 should NOT have the phase in the command

$ sudo kubeadm alpha certs renew apiserver
$ sudo kubeadm alpha certs renew apiserver-kubelet-client
$ sudo kubeadm alpha certs renew front-proxy-client

Step 5 should use the command below; kubeadm alpha does not have kubeconfig all, that is a kubeadm init phase instead:

# kubeadm init phase kubeconfig all
I0705 12:42:24.056152   32618 version.go:240] remote version is much newer: v1.15.0; falling back to: stable-1.14
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
neolit123 commented 5 years ago

in 1.15 we have added better documentation for certificate renewal: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/

also, after 1.15, kubeadm upgrade will automatically renew the certificates for you!

A k8s cluster created using kubeadm v1.9.x experienced the same issue (apiserver-kubelet-client.crt expired on 2 July) at the age of v1.14.1 lol

versions older than 1.13 are already unsupported. we strongly encourage the users to keep up with this fast moving project.

currently there are discussions going on by the LongTermSupport Working Group, to have versions of kubernetes being supported for longer periods of time, but establishing the process might take a while.

ykfq commented 5 years ago

Thanks @pmorie . Works for kube version 1.13.6

williamstein commented 5 years ago

Just a comment and feature request: This cert expiration hit us in production on our Kubernetes 1.11.x cluster this morning. We tried everything above (and at the links), but hit numerous errors and gave up after a few hours, completely stuck with a large hosed cluster. Fortunately, we were about 2 weeks away from upgrading to Kubernetes 1.15 (and building a new cluster), so we ended up just creating a new 1.15 cluster from scratch and copying over all our user data.

I very much wish there had been some warning before this happened. We just went from "incredibly stable cluster" to "completely broken hellish nightmare" without any warning, and had probably our worst downtime ever. Fortunately, it was a west coast Friday afternoon, so relatively minimally impactful.

Of everything discussed above and in all the linked tickets, the one thing that would have made a massive difference for us isn't mentioned: start displaying a warning when certs are going to expire soon. (E.g., if you use kubectl, and the cert is going to expire within a few weeks, please tell me!).

neolit123 commented 5 years ago

Sorry for your troubles. Normally it is the responsibility of the operator to monitor the certs on disk for expiration. But i do agree that the lack of easy monitoring can cause trouble. That is one of the reasons we added a command to check cert expiration in kubeadm. See https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
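For reference, a minimal sketch of that check on a recent kubeadm (the subcommand lives under alpha in 1.15.x and, as far as I know, moved out of alpha in later releases):

# list all kubeadm-managed certificates and their expiration dates
kubeadm alpha certs check-expiration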

Also please note that after 1.15 kubeadm will auto renew certificates on upgrade, which encourages the users to upgrade more often too.

williamstein commented 5 years ago

@neolit123 Thanks; we will add something to our own monitoring infrastructure to periodically check for upcoming cert issues, as explained in your comment.
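In case it helps others, a low-tech check along those lines (run from cron or any monitoring agent) could be as simple as the following sketch; the 30-day window and the pki path are assumptions, not anything kubeadm ships:

# warn if any control-plane certificate expires within the next 30 days
for crt in /etc/kubernetes/pki/*.crt; do
  openssl x509 -checkend $((30*24*3600)) -noout -in "$crt" \
    || echo "WARNING: $crt expires within 30 days"
done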

svitlychnyi commented 5 years ago

@danroliver Thx a lot for your reply. It saved me lots of time. One point worth mentioning is the "etcd" related certificates, which should be renewed in the same way. There is no need to reload configuration for them, since they are referenced by path in the manifest YAML files.
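For completeness, on a kubeadm-managed (stacked) etcd the corresponding renewals would look roughly like this; the command names assume a kubeadm release (v1.13+) that has the alpha certs renew subcommands:

# renew the etcd-related certificates alongside the API server ones
sudo kubeadm alpha certs renew etcd-server
sudo kubeadm alpha certs renew etcd-peer
sudo kubeadm alpha certs renew etcd-healthcheck-client
sudo kubeadm alpha certs renew apiserver-etcd-client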

ttarczynski commented 5 years ago

For Kubernetes v1.14 I find this procedure proposed by @desdic the most helpful:

desdic commented 5 years ago

For Kubernetes v1.14 I find this procedure the most helpful:

* https://stackoverflow.com/a/56334732/1147487

I created the fix once I had my own cluster fixed .. hoped that someone else could use it

danroliver commented 5 years ago

@danroliver gave very good and structured instructions, very close to the guide below from IBM: Renewing Kubernetes cluster certificates (https://www.ibm.com/support/knowledgecenter/en/SSCKRH_1.1.0/platform/t_certificate_renewal.html)

Nice! I wonder when this was published. I certainly would have found this helpful when I was going through this.

anapsix commented 4 years ago

Note about tokens in K8s 1.13.x (possibly other K8s versions): if you've ended up re-generating your CA certificate (/etc/kubernetes/pki/ca.crt), your tokens (see kubectl -n kube-system get secret | grep token) might contain the old CA and will have to be regenerated. Troubled tokens included kube-proxy-token and coredns-token in my case (among others), which caused cluster-critical services to be unable to authenticate with the K8s API. To regenerate tokens, delete the old ones and they will be recreated. The same goes for any services talking to the K8s API, such as PV provisioners, ingress controllers, cert-manager, etc.
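A rough sketch of that regeneration; the secret name suffix and the kube-proxy label selector are placeholders/assumptions for illustration only:

# find token secrets that still embed the old CA
kubectl -n kube-system get secret | grep token
# delete one; the token controller recreates it against the new CA
kubectl -n kube-system delete secret kube-proxy-token-xxxxx
# then recreate the pods that had the old token mounted, e.g.
kubectl -n kube-system delete pod -l k8s-app=kube-proxy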

shortsteps commented 4 years ago

Thanks for this great step-by-step, @danroliver! I'm wondering how this process might be applied to a multi-master cluster (bare metal, currently running 1.11.1), and preferably without downtime. My certs are not yet expired, but I am trying to learn how to regenerate/renew them before that happens.

Hi @kcronin, how did you solve this with a multi-master config? I don't know how to proceed with --apiserver-advertise-address as I have 3 IPs and not only one.

Thanks

SuleimanWA commented 4 years ago

@pmcgrath In case I have 3 masters, should I repeat the steps on each master, or what is the procedure in that case?

anapsix commented 4 years ago

@SuleimanWA, you can copy admin.conf over, as well as the CA file if the CA was regenerated. For everything else, you should repeat the steps to regenerate the certs (for etcd, kubelet, scheduler, etc.) on every master.
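A minimal sketch of that copy step; the host name and the regenerated-CA case are placeholders, not part of the original advice:

# from the master where the files were regenerated, push them to another control-plane node
scp /etc/kubernetes/admin.conf other-master:/etc/kubernetes/admin.conf
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key other-master:/etc/kubernetes/pki/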

realshuting commented 4 years ago

@anapsix I'm running a 1.13.x cluster, and apiserver is reporting Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid] after I renewed the certs by running kubeadm alpha certs renew all.

To regenerate tokens, delete old ones, and they will be recreated.

Which token are you referring to in this case? Is it the one generated by kubeadm, and how can I delete the token?

-----UPDATE----- I figured out it's the secret itself. In my case the kube-controller was not up, so the secret was not auto-generated.

yanpengfei commented 4 years ago

On newer versions, use:

kubeadm alpha certs renew all

leh327 commented 3 years ago

When the first master node's kubelet is down (systemctl stop kubelet), the other master nodes can't contact the CA on the first master node. This results in the following message until the kubelet on the original master node is brought back online:

kubectl get nodes
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

Is there a way to have the CA role transfer to the other master nodes while the kubelet on the original CA node is down?

sumitKash commented 3 years ago

(quoting @realshuting above)

Hi, I have done this task but not on version 1.13. May I ask a few things if you have done this already? So basically I will be doing: kubeadm alpha certs renew all (which updates the control plane certs under the pki/ folder on masters), kubeadm init phase kubeconfig to update the kubeconfig files (on master and worker), and a kubelet restart on all nodes.

Do I still need to create a token and run join on the worker nodes? If possible, can you share the steps you performed?

lisenet commented 3 years ago

@pmcgrath thanks a bunch for your comment, I used the instructions to update certificates on my Kubernetes 1.13 cluster.

usamacheema786 commented 3 years ago

Simplest way to update your k8s certs:

kubeadm alpha certs check-expiration

kubeadm alpha certs renew all
sudo kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf
systemctl daemon-reload && systemctl restart kubelet

neolit123 commented 3 years ago

sudo kubeadm alpha kubeconfig user --org system:nodes --client-name system:node:$(hostname) > kubelet.conf

you might also want to symlink the cert / key to these files if kubelet client cert rotation is enabled (it is by default):

client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/

gemfield commented 3 years ago

For k8s 1.15 ~ 1.18, this may be helpful: https://zhuanlan.zhihu.com/p/382605009

crrazyman commented 2 years ago

For Kubernetes v1.14 I find this procedure proposed by @desdic the most helpful:

$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/
$ kubeadm init phase certs all --apiserver-advertise-address <IP>
  • backup and re-generate all kubeconfig files:
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/
$ kubeadm init phase kubeconfig all
$ reboot
  • copy new admin.conf:
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

Hello,

After following this ^ everything is OK (kubectl get nodes shows both nodes Ready) BUT! A lot of pods (in kube-system and in all other namespaces) are stuck in the ContainerCreating state. Also, in kube-system:

root@kube-master:~# kubectl -n kube-system get pods
NAME                                                               READY   STATUS              RESTARTS   AGE
calico-kube-controllers-6fc9b4f7d9-kmt9q                           0/1     Running             1          26m
calico-node-hb6rm                                                  1/1     Running             5          380d
calico-node-vtt6l                                                  1/1     Running             11         384d
coredns-74c9d4d795-l6x9h                                           0/1     ContainerCreating   0          5m9s
coredns-74c9d4d795-m6lgf                                           0/1     ContainerCreating   0          5m9s
dns-autoscaler-576b576b74-stnrz                                    0/1     ContainerCreating   0          5m9s
kube-apiserver-kube-master.domain.com                               1/1     Running             1654       84m
kube-controller-manager-kube-master.domain.com                      1/1     Running             77         385d
kube-proxy-7bgjn                                                   1/1     Running             6          380d
kube-proxy-pq4wr                                                   1/1     Running             5          380d
kube-scheduler-kube-master.domain.com                               1/1     Running             72         385d
kubernetes-dashboard-7c547b4c64-qhk2k                              0/1     Error               0          380d
nginx-proxy-kube-node01.domain.com                                  1/1     Running             2704       380d
nodelocaldns-6c5v2                                                 1/1     Running             5          380d
nodelocaldns-nmtg6                                                 1/1     Running             11         384d

The thing is that now i think that nobody can talk to kube-apiserver:

root@kube-master:~# kubectl -n kube-system logs kube-apiserver-kube-master.domain.com
[....]
I1014 11:51:19.817069       1 log.go:172] http: TLS handshake error from 10.18.74.25:59948: remote error: tls: bad certificate
I1014 11:51:19.819972       1 log.go:172] http: TLS handshake error from 10.18.74.25:59952: remote error: tls: bad certificate
I1014 11:51:20.733394       1 log.go:172] http: TLS handshake error from 127.0.0.1:43966: remote error: tls: bad certificate
I1014 11:51:20.734734       1 log.go:172] http: TLS handshake error from 127.0.0.1:43968: remote error: tls: bad certificate
I1014 11:51:20.823670       1 log.go:172] http: TLS handshake error from 10.18.74.25:59978: remote error: tls: bad certificate
I1014 11:51:20.823905       1 log.go:172] http: TLS handshake error from 10.18.74.25:59974: remote error: tls: bad certificate

All the pods that are stuck in ContainerCreating show this in their description:

  Normal   SandboxChanged          10m (x13 over 16m)    kubelet, kube-node01.domain.com  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  92s (x18 over 11m)    kubelet, kube-node01.domain.com  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f0e95a2fec2492c63c6756e3d71be3ee911a4c872a83f6f20c98cfc261045e7f" network for pod "www-mongo-0": NetworkPlugin cni failed to set up pod "www-mongo-0_default" network: Get https://[10.233.0.1]:443/api/v1/namespaces/default: dial tcp 10.233.0.1:443: i/o timeout

I have a cluster of 2 nodes.

neolit123 commented 2 years ago

Normal SandboxChanged 10m (x13 over 16m) kubelet, kube-node01.domain.com Pod sandbox changed, it will be killed and re-created. Warning FailedCreatePodSandBox 92s (x18 over 11m) kubelet, kube-node01.domain.com (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f0e95a2fec2492c63c6756e3d71be3ee911a4c872a83f6f20c98cfc261045e7f" network for pod "www-mongo-0": NetworkPlugin cni failed to set up pod "www-mongo-0_default" network: Get https://[10.233.0.1]:443/api/v1/namespaces/default: dial tcp 10.233.0.1:443: i/o timeout

that seems like a CNI plugin problem. you could try removing Calico with kubectl delete -f .... and adding it again.

crrazyman commented 2 years ago

This cluster is made with kubespray, so I cannot delete Calico and add it again. Also, I don't think this is a problem with the CNI. Why does kube-apiserver log http: TLS handshake error from 10.18.74.25:59948: remote error: tls: bad certificate? Again, the cluster was working perfectly until I renewed the certificates.

kruserr commented 2 years ago

For anyone who stumbles upon this in the future and is running a newer version of Kubernetes (>1.17), this is probably the simplest way to renew your certs.

The following renews all certs, restarts kubelet, takes a backup of the old admin config and applies the new admin config:

kubeadm certs renew all
systemctl restart kubelet
cp /root/.kube/config /root/.kube/.old-$(date --iso)-config
cp /etc/kubernetes/admin.conf /root/.kube/config
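A quick follow-up check to confirm the renewal took effect, using the non-alpha syntax that these newer kubeadm versions provide:

# show the new expiration dates for all kubeadm-managed certs
kubeadm certs check-expiration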
amiyaranjansahoo commented 2 years ago

@danroliver, thanks for the detailed instructions. It worked fine for a single-master k8s cluster (1 master + 3 workers). However, I have a multi-master k8s cluster (3 masters + 5 workers), so do you think I should follow the same approach to renew the certificates, or would any additional steps be required? FYI I am on v1.12.10. Thanks in advance.

titaneric commented 2 years ago

For this case, you may still need to ssh into each of the 3 master nodes and update the certificates by running the provided commands, because each master node has its own API server.
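As a sketch only: the host names are placeholders, and the command assumes a kubeadm release that has the alpha certs renew subcommand (v1.13+); older versions need the per-cert phase commands shown earlier in this thread:

# repeat the renewal on every control-plane node
for m in master1 master2 master3; do
  ssh "$m" 'sudo kubeadm alpha certs renew all'
  # then restart the control-plane containers / kubelet on that node as described earlier in the thread
done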

amiyaranjansahoo commented 2 years ago

Thank you @titaneric, understood, I need to recreate/renew the certificates on each master node separately.

What about Step 4 and Step 5?

Step 4 - moving the old files below:

/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf

Step 5 - generating admin.conf, kubelet.conf, controller-manager.conf and scheduler.conf using the command below:

sudo kubeadm alpha phase kubeconfig all --apiserver-advertise-address A.B.C.D

Because I can see that only the cksum value of admin.conf is the same across all master nodes, but the cksums of the rest of the files (kubelet.conf, controller-manager.conf and scheduler.conf) are different across the master nodes.

Akash3221 commented 1 year ago

(quoting @kruserr above)

Hi @kruserr, after updating the certificates successfully using the above commands, when I delete a namespace it keeps getting stuck in the Terminating state. Does anybody have clarity on this?