rancher / rke2

https://docs.rke2.io/
Apache License 2.0

Emit events for certificates about to expire #5620

Closed by brandond 7 months ago

brandond commented 8 months ago

RKE2 tracking issue for

ShylajaDevadiga commented 7 months ago

Validated using latest commit id 95e13dc62fdbda33de2c709f1149b0c361d920b9 on master branch

Environment Details

Infrastructure: Cloud EC2 instance

Node(s) CPU architecture, OS, and Version:

cat /etc/os-release
NAME="SLES"
VERSION="15-SP5"
VERSION_ID="15.5"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"

Cluster Configuration: 1 server 1 agent node

Config.yaml:

cat /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
token: <TOKEN>

Steps to reproduce the issue and validate the fix

  1. Copy config.yaml
  2. Set env variable CATTLE_NEW_SIGNED_CERT_EXPIRATION_DAYS=30 in /etc/default/rke2-server file
  3. Install rke2
  4. Check the warning when the certs are within 90 days of expiring
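
As a rough sanity check on why these steps trigger the warning, the sketch below compares the 30-day lifetime from step 2 with the 90-day warning window. It assumes that the numeric suffix in the CA name `rke2-client-ca@1713223567` (seen in the validation output in this comment) is the Unix time at which the cluster CA was generated, and that the leaf certificates were issued at about the same moment. Both are assumptions made for illustration, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the "@1713223567" suffix of the CA common name is a Unix
# timestamp marking when the CA (and, roughly, the leaf certs) were created.
ca_created = datetime.fromtimestamp(1713223567, tz=timezone.utc)

lifetime = timedelta(days=30)        # CATTLE_NEW_SIGNED_CERT_EXPIRATION_DAYS=30
warning_window = timedelta(days=90)  # "will expire within 90 days"

# A certificate issued with a 30-day lifetime is inside the 90-day window
# from the moment it is created, so the warning should appear right away.
expected_expiry = ca_created + lifetime

print("expected expiry:", expected_expiry.strftime("%Y-%m-%dT%H:%M:%SZ"))
print("inside warning window at issuance:", lifetime < warning_window)
```

Under those assumptions the computed expiry lines up with the `2024-05-15T23:26:07Z` timestamps reported in the event and in `rke2 certificate check`.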

Validation results:

rke2 -v
rke2 version v1.29.3+dev.95e13dc6 (95e13dc62fdbda33de2c709f1149b0c361d920b9)
go version go1.21.8 X:boringcrypto

ec2-user@ip-172-31-1-234:~> kubectl get event|grep -i cert
3m59s Warning CertificateExpirationWarning node/ip-172-31-1-234 Node certificates require attention - restart rke2 on this node to trigger automatic rotation: rke2-controller/client-rke2-controller.crt: certificate CN=system:rke2-controller will expire within 90 days at 2024-05-15T23:26:07Z, rke2-controller/client-rke2-controller.crt: certificate CN=system:rke2-controller will expire within 90 days at 2024-05-15T23:26:07Z, auth-proxy/client-auth-proxy.crt: certificate CN=system:auth-proxy will expire within 90 days at 2024-05-15T23:26:07Z, cloud-controller/client-rke2-cloud-controller.crt: certificate CN=rke2-cloud-controller-manager will expire within 90 days at 2024-05-15T23:26:07Z, etcd/client.crt: certificate CN=etcd-client will expire within 90 days at 2024-05-15T23:26:07Z, etcd/server-client.crt: certificate CN=etcd-server will expire within 90 days at 2024-05-15T23:26:07Z, etcd/peer-server-client.crt: certificate CN=etcd-peer will expire within 90 days at 2024-05-15T23:26:07Z, scheduler/client-scheduler.crt: certificate CN=system:kube-scheduler will expire within 90 days at 2024-05-15T23:26:07Z, supervisor/client-supervisor.crt: certificate CN=system:rke2-supervisor,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z, kube-proxy/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-05-15T23:26:07Z, kube-proxy/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-05-15T23:26:07Z, kubelet/client-kubelet.crt: certificate CN=system:node:ip-172-31-1-234,O=system:nodes will expire within 90 days at 2024-05-15T23:26:09Z, kubelet/serving-kubelet.crt: certificate CN=ip-172-31-1-234 will expire within 90 days at 2024-05-15T23:26:09Z, api-server/client-kube-apiserver.crt: certificate CN=system:apiserver,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z, api-server/serving-kube-apiserver.crt: certificate CN=kube-apiserver will expire within 90 days at 2024-05-15T23:26:07Z, admin/client-admin.crt: certificate CN=system:admin,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z, controller-manager/client-controller.crt: certificate CN=system:kube-controller-manager will expire within 90 days at 2024-05-15T23:26:07Z

ec2-user@ip-172-31-1-234:~> sudo /usr/local/bin/rke2 certificate check
INFO[0000] Server detected, checking agent and server certificates
INFO[0000] Checking certificates for admin
WARN[0000] /var/lib/rancher/rke2/server/tls/client-admin.crt: certificate CN=system:admin,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-admin.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for kube-proxy
WARN[0000] /var/lib/rancher/rke2/server/tls/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-kube-proxy.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/agent/client-kube-proxy.crt: certificate CN=system:kube-proxy will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/agent/client-kube-proxy.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for supervisor
WARN[0000] /var/lib/rancher/rke2/server/tls/client-supervisor.crt: certificate CN=system:rke2-supervisor,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-supervisor.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for kubelet
WARN[0000] /var/lib/rancher/rke2/agent/client-kubelet.crt: certificate CN=system:node:ip-172-31-1-234,O=system:nodes will expire within 90 days at 2024-05-15T23:26:09Z
INFO[0000] /var/lib/rancher/rke2/agent/client-kubelet.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/agent/serving-kubelet.crt: certificate CN=ip-172-31-1-234 will expire within 90 days at 2024-05-15T23:26:09Z
INFO[0000] /var/lib/rancher/rke2/agent/serving-kubelet.crt: certificate CN=rke2-server-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for api-server
WARN[0000] /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt: certificate CN=system:apiserver,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-kube-apiserver.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt: certificate CN=kube-apiserver will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt: certificate CN=rke2-server-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for auth-proxy
WARN[0000] /var/lib/rancher/rke2/server/tls/client-auth-proxy.crt: certificate CN=system:auth-proxy will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-auth-proxy.crt: certificate CN=rke2-request-header-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for cloud-controller
WARN[0000] /var/lib/rancher/rke2/server/tls/client-rke2-cloud-controller.crt: certificate CN=rke2-cloud-controller-manager will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-rke2-cloud-controller.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for controller-manager
WARN[0000] /var/lib/rancher/rke2/server/tls/client-controller.crt: certificate CN=system:kube-controller-manager will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-controller.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for etcd
WARN[0000] /var/lib/rancher/rke2/server/tls/etcd/client.crt: certificate CN=etcd-client will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/etcd/client.crt: certificate CN=etcd-server-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/server/tls/etcd/server-client.crt: certificate CN=etcd-server will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/etcd/server-client.crt: certificate CN=etcd-server-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt: certificate CN=etcd-peer will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt: certificate CN=etcd-peer-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for scheduler
WARN[0000] /var/lib/rancher/rke2/server/tls/client-scheduler.crt: certificate CN=system:kube-scheduler will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-scheduler.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
INFO[0000] Checking certificates for rke2-controller
WARN[0000] /var/lib/rancher/rke2/server/tls/client-rke2-controller.crt: certificate CN=system:rke2-controller will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-rke2-controller.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/agent/client-rke2-controller.crt: certificate CN=system:rke2-controller will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/agent/client-rke2-controller.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
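
For anyone who wants to pick out only the expiring certificates from the `rke2 certificate check` output above, here is a minimal Python sketch. The regular expression is inferred from this sample output alone, not from a documented log format, and the `expiring_certs` helper and `SAMPLE` text are hypothetical names used for illustration.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the WARN entries in the output above (an assumption,
# not a documented format). Subjects such as "CN=...,O=..." contain no spaces.
WARN_RE = re.compile(
    r"WARN\[\d+\]\s+(?P<path>\S+): certificate (?P<subject>\S+) "
    r"will expire within (?P<days>\d+) days at (?P<expires>\S+)"
)

def expiring_certs(log_text):
    """Return (path, subject, expiry) tuples for every WARN entry found."""
    found = []
    for m in WARN_RE.finditer(log_text):
        expires = datetime.strptime(m["expires"], "%Y-%m-%dT%H:%M:%SZ")
        found.append((m["path"], m["subject"], expires.replace(tzinfo=timezone.utc)))
    return found

# Three entries copied from the output above: two warnings and one CA line.
SAMPLE = """\
WARN[0000] /var/lib/rancher/rke2/server/tls/client-admin.crt: certificate CN=system:admin,O=system:masters will expire within 90 days at 2024-05-15T23:26:07Z
INFO[0000] /var/lib/rancher/rke2/server/tls/client-admin.crt: certificate CN=rke2-client-ca@1713223567 is ok, expires at 2034-04-13T23:26:07Z
WARN[0000] /var/lib/rancher/rke2/agent/serving-kubelet.crt: certificate CN=ip-172-31-1-234 will expire within 90 days at 2024-05-15T23:26:09Z
"""

for path, subject, expires in expiring_certs(SAMPLE):
    print(f"{expires:%Y-%m-%d}  {subject}  {path}")
```

Because the pattern scans with `finditer` rather than line by line, it should also cope with output that has been pasted as a single run of text.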

ec2-user@ip-172-31-1-234:~> kubectl get --raw /api/v1/nodes/ip-172-31-1-234/proxy/metrics|grep expiration
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="3600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="7200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="345600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="604800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="2.592e+06"} 1
apiserver_client_certificate_expiration_seconds_bucket{le="7.776e+06"} 1
apiserver_client_certificate_expiration_seconds_bucket{le="1.5552e+07"} 1
apiserver_client_certificate_expiration_seconds_bucket{le="3.1104e+07"} 1
apiserver_client_certificate_expiration_seconds_bucket{le="+Inf"} 1
apiserver_client_certificate_expiration_seconds_sum 2.591697539218925e+06
apiserver_client_certificate_expiration_seconds_count 1
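
To read the histogram above in days rather than seconds, the short sketch below converts the reported sum and the first non-empty bucket bound. The values are copied from the metrics output; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

# Values copied from the metrics output above.
observed_sum = 2.591697539218925e+06  # ..._expiration_seconds_sum
observed_count = 1                    # ..._expiration_seconds_count
first_nonzero_bucket = 2.592e+06      # smallest "le" bound with a non-zero count

# With a single observation, sum / count is that observation's remaining lifetime.
remaining_days = observed_sum / observed_count / SECONDS_PER_DAY
bucket_bound_days = first_nonzero_bucket / SECONDS_PER_DAY

print(f"remaining lifetime of the observed client cert: {remaining_days:.3f} days")
print(f"upper bound of its bucket: {bucket_bound_days:.0f} days")
```

The single observation sits just under the 30-day bucket bound, which is consistent with a client certificate that was issued with a 30-day lifetime a few minutes earlier.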

brandond commented 7 months ago

Certificate lifetime metrics are not currently available in RKE2 because the supervisor lacks a metrics endpoint. This is tracked in https://github.com/rancher/rke2/issues/5786