Closed jlebon closed 5 years ago
@aaronlevy Can you tell me where that short-lived certificate is located on the master or bootstrap node? I created a cluster using libvirt, which was up and running. I then shut it down, and 2 days later it does not come up. etcd shows the logs below.
I still have that setup handy if you need any more info. I just want to understand when it is safe to shut down my libvirt VM and start it again later without running into any issues.
[root@test1-master-0 core]# crictl ps
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT
ab1c998534064 94bc3af972c98ce73f99d70bd72144caa8b63e541ccc9d844960b7f0ca77d7c4 4 minutes ago Running etcd-member 1
[root@test1-master-0 core]# crictl logs ab1c998534064
2018-12-05 09:41:31.214799 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd
2018-12-05 09:41:31.215415 I | pkg/flags: recognized and used environment variable ETCD_NAME=etcd-member-test1-master-0
2018-12-05 09:41:31.215476 I | etcdmain: etcd Version: 3.2.14
2018-12-05 09:41:31.215489 I | etcdmain: Git SHA: fb5cd6f1c
2018-12-05 09:41:31.215494 I | etcdmain: Go Version: go1.8.5
2018-12-05 09:41:31.215499 I | etcdmain: Go OS/Arch: linux/amd64
2018-12-05 09:41:31.215505 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2018-12-05 09:41:31.215686 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-12-05 09:41:31.215720 I | embed: peerTLS: cert = /etc/ssl/etcd/system:etcd-peer:test1-etcd-0.tt.testing.crt, key = /etc/ssl/etcd/system:etcd-peer:test1-etcd-0.tt.testing.key, ca = , trusted-ca = /etc/ssl/etcd/ca.crt, client-cert-auth = true
2018-12-05 09:41:31.219274 I | embed: listening for peers on https://0.0.0.0:2380
2018-12-05 09:41:31.219572 I | embed: listening for client requests on 0.0.0.0:2379
2018-12-05 09:41:31.310536 I | etcdserver: name = etcd-member-test1-master-0
2018-12-05 09:41:31.311205 I | etcdserver: data dir = /var/lib/etcd
2018-12-05 09:41:31.311265 I | etcdserver: member dir = /var/lib/etcd/member
2018-12-05 09:41:31.311302 I | etcdserver: heartbeat = 100ms
2018-12-05 09:41:31.311393 I | etcdserver: election = 1000ms
2018-12-05 09:41:31.311473 I | etcdserver: snapshot count = 100000
2018-12-05 09:41:31.311657 I | etcdserver: advertise client URLs = https://192.168.126.11:2379
2018-12-05 09:41:31.554976 I | etcdserver: restarting member 7d3fdaaceb134d3d in cluster d98ef57fc5131193 at commit index 15764
2018-12-05 09:41:31.556475 I | raft: 7d3fdaaceb134d3d became follower at term 2
2018-12-05 09:41:31.556576 I | raft: newRaft 7d3fdaaceb134d3d [peers: [], term: 2, commit: 15764, applied: 0, lastindex: 15764, lastterm: 2]
2018-12-05 09:41:31.710712 W | auth: simple token is not cryptographically signed
2018-12-05 09:41:31.739007 I | etcdserver: starting server... [version: 3.2.14, cluster version: to_be_decided]
2018-12-05 09:41:31.744323 I | embed: ClientTLS: cert = /etc/ssl/etcd/system:etcd-server:test1-etcd-0.tt.testing.crt, key = /etc/ssl/etcd/system:etcd-server:test1-etcd-0.tt.testing.key, ca = , trusted-ca = /etc/ssl/etcd/ca.crt, client-cert-auth = true
2018-12-05 09:41:31.749681 I | etcdserver/membership: added member 7d3fdaaceb134d3d [https://test1-etcd-0.tt.testing:2380] to cluster d98ef57fc5131193
2018-12-05 09:41:31.750073 N | etcdserver/membership: set the initial cluster version to 3.2
2018-12-05 09:41:31.750222 I | etcdserver/api: enabled capabilities for version 3.2
2018-12-05 09:41:32.458097 I | raft: 7d3fdaaceb134d3d is starting a new election at term 2
2018-12-05 09:41:32.458417 I | raft: 7d3fdaaceb134d3d became candidate at term 3
2018-12-05 09:41:32.458500 I | raft: 7d3fdaaceb134d3d received MsgVoteResp from 7d3fdaaceb134d3d at term 3
2018-12-05 09:41:32.458606 I | raft: 7d3fdaaceb134d3d became leader at term 3
2018-12-05 09:41:32.458666 I | raft: raft.node: 7d3fdaaceb134d3d elected leader 7d3fdaaceb134d3d at term 3
2018-12-05 09:41:32.466818 I | embed: ready to serve client requests
2018-12-05 09:41:32.467766 I | etcdserver: published {Name:etcd-member-test1-master-0 ClientURLs:[https://192.168.126.11:2379]} to cluster d98ef57fc5131193
2018-12-05 09:41:32.468564 I | embed: serving client requests on [::]:2379
WARNING: 2018/12/05 09:41:32 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
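One way to narrow down a "tls: bad certificate" warning like the one above is to check whether the certificates etcd loads still validate against the CA it trusts, and which names they cover. The following is a minimal sketch, not a confirmed diagnosis of this cluster: `check_chain` and `show_sans` are hypothetical helper names, and the paths in the usage comment are taken from the `peerTLS` / `ClientTLS` log lines. Note that `openssl verify` checks the chain and validity dates but not whether the dialed address matches a subject alternative name, so the SANs are printed separately.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# check_chain CA CERT...: report whether each CERT validates against CA.
# openssl verify also fails when a certificate is outside its validity period.
check_chain() {
    ca=$1
    shift
    for cert in "$@"; do
        if openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
            echo "chain ok:   $cert"
        else
            echo "chain FAIL: $cert"
        fi
    done
}

# show_sans CERT: print the subject alternative names of CERT, if any.
show_sans() {
    openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}

# Possible usage on the master (paths as they appear in the etcd log):
#   check_chain /etc/ssl/etcd/ca.crt \
#       "/etc/ssl/etcd/system:etcd-server:test1-etcd-0.tt.testing.crt" \
#       "/etc/ssl/etcd/system:etcd-peer:test1-etcd-0.tt.testing.crt"
#   show_sans "/etc/ssl/etcd/system:etcd-server:test1-etcd-0.tt.testing.crt"
```

If the chain verifies, comparing the printed SANs with the address being dialed (0.0.0.0 in the warning) may be the next thing to look at.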
I believe the locations are:
opt/openshift/auth/kubeconfig-kubelet
/etc/kubernetes/kubeconfig
(this should be defined by the --bootstrap-kubeconfig flag).
The kubelet will pick a random(?) time before expiration to request a new cert. So anywhere in the 30min window after starting, the cert might be rotated.
Ideally we would rotate immediately after it had posted CSR / got a full client cert. This is something that @abhinavdahiya was going to look into this sprint (see https://jira.coreos.com/browse/CORS-810). But there may be some kubelet behaviors that block this.
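To make the "random(?) time before expiration" concrete, here is a rough sketch of the arithmetic. It assumes the kubelet picks its rotation deadline somewhere between 70% and 90% of the certificate's lifetime; that range is my recollection of the client certificate manager's behavior and is not confirmed anywhere in this thread, so both percentages are parameters. `rotation_window` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the 70%/90% defaults are an assumption, not taken from this thread.

# rotation_window NOT_BEFORE NOT_AFTER [LOW_PCT] [HIGH_PCT]
# Takes validity bounds as epoch seconds and prints the earliest and latest
# epoch second at which rotation would be attempted under that assumption.
rotation_window() {
    not_before=$1
    not_after=$2
    low_pct=${3:-70}
    high_pct=${4:-90}
    lifetime=$((not_after - not_before))
    earliest=$((not_before + lifetime * low_pct / 100))
    latest=$((not_before + lifetime * high_pct / 100))
    echo "$earliest $latest"
}

# Possible usage with GNU date, converting openssl's notBefore/notAfter:
#   nb=$(date -u -d 'Dec 5 07:23:00 2018 GMT' +%s)
#   na=$(date -u -d 'Jan 4 07:23:00 2019 GMT' +%s)
#   rotation_window "$nb" "$na"
```

Under that assumption, a 30-minute certificate (`rotation_window 0 1800`) would give `1260 1620`, i.e. a rotation attempt somewhere between 21 and 27 minutes after issuance.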
@aaronlevy Below are the cert details from the master node where I am getting that error, but I cannot see anything there that has expired.
[root@test1-master-0 kubernetes]# pwd
/etc/kubernetes
[root@test1-master-0 kubernetes]# openssl x509 -in ca.crt -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 0 (0x0)
Signature Algorithm: sha256WithRSAEncryption
Issuer: OU=openshift, CN=root-ca
Validity
Not Before: Dec 5 07:20:17 2018 GMT
Not After : Dec 2 07:20:17 2028 GMT
Subject: OU=openshift, CN=root-ca
[root@test1-master-0 kubernetes]# ls -al /etc/ssl/certs/
total 12
drwxr-xr-x. 2 root root 117 Dec 5 05:55 .
drwxr-xr-x. 5 root root 81 Dec 5 05:55 ..
lrwxrwxrwx. 1 root root 49 Dec 5 05:55 ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
lrwxrwxrwx. 1 root root 55 Dec 5 05:55 ca-bundle.trust.crt -> /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt
-rwxr-xr-x. 1 root root 610 Dec 5 05:55 make-dummy-cert
-rw-r--r--. 1 root root 2516 Dec 5 05:55 Makefile
-rwxr-xr-x. 1 root root 829 Dec 5 05:55 renew-dummy-cert
[root@test1-master-0 kubernetes]# openssl x509 -in /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 6828503384748696800 (0x5ec3b7a6437fa4e0)
Signature Algorithm: sha1WithRSAEncryption
Issuer: CN=ACCVRAIZ1, OU=PKIACCV, O=ACCV, C=ES
Validity
Not Before: May 5 09:37:37 2011 GMT
Not After : Dec 31 09:37:37 2030 GMT
Subject: CN=ACCVRAIZ1, OU=PKIACCV, O=ACCV, C=ES
[root@test1-master-0 kubernetes]# openssl x509 -in /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 6828503384748696800 (0x5ec3b7a6437fa4e0)
Signature Algorithm: sha1WithRSAEncryption
Issuer: CN=ACCVRAIZ1, OU=PKIACCV, O=ACCV, C=ES
Validity
Not Before: May 5 09:37:37 2011 GMT
Not After : Dec 31 09:37:37 2030 GMT
[root@test1-master-0 kubernetes]# ls -al /var/lib/kubelet/pki/
total 8
drwxr-xr-x. 2 root root 166 Dec 5 07:35 .
drwxr-xr-x. 7 root root 153 Dec 5 07:28 ..
-rw-------. 1 root root 1187 Dec 5 07:28 kubelet-client-2018-12-05-07-28-01.pem
lrwxrwxrwx. 1 root root 59 Dec 5 07:28 kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2018-12-05-07-28-01.pem
-rw-------. 1 root root 1240 Dec 5 07:35 kubelet-server-2018-12-05-07-35-14.pem
lrwxrwxrwx. 1 root root 59 Dec 5 07:35 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2018-12-05-07-35-14.pem
[root@test1-master-0 kubernetes]# openssl x509 -in /var/lib/kubelet/pki/kubelet-client-2018-12-05-07-28-01.pem -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
09:3f:c5:f3:f8:6d:24:e6:7d:18:3e:de:a8:66:5c:bc:90:4e:a8:04
Signature Algorithm: sha256WithRSAEncryption
Issuer: OU=bootkube, CN=kube-ca
Validity
Not Before: Dec 5 07:23:00 2018 GMT
Not After : Jan 4 07:23:00 2019 GMT
Subject: O=system:nodes, CN=system:node:test1-master-0
Subject Public Key Info:
[root@test1-master-0 kubernetes]# openssl x509 -in /var/lib/kubelet/pki/kubelet-server-2018-12-05-07-35-14.pem -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
3e:3d:e3:cc:c8:02:ca:22:d6:1f:1f:e3:70:b0:35:45:8d:04:3c:3c
Signature Algorithm: sha256WithRSAEncryption
Issuer: OU=bootkube, CN=kube-ca
Validity
Not Before: Dec 5 07:30:00 2018 GMT
Not After : Jan 4 07:30:00 2019 GMT
Subject: O=system:nodes, CN=system:node:test1-master-0
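Reading the full `-text` dump for every file is easy to get wrong, so a shorter check may help. The sketch below prints only the expiry date and whether each certificate is still valid, using `openssl x509 -enddate` and `-checkend`. `cert_status` is a hypothetical helper name. One caveat worth keeping in mind: `openssl x509 -in` reads only the first certificate in a file, which is why the two CA bundle dumps above both show the same ACCVRAIZ1 entry rather than the whole bundle.

```shell
#!/bin/sh
# Sketch only: cert_status is a made-up helper name.

# cert_status CERT...: print one line per certificate with its expiry state.
cert_status() {
    for cert in "$@"; do
        # -enddate prints "notAfter=<date>"; keep only the date part.
        end=$(openssl x509 -in "$cert" -noout -enddate | cut -d= -f2)
        # -checkend 0 exits non-zero if the certificate has already expired.
        if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
            echo "OK      $cert (expires $end)"
        else
            echo "EXPIRED $cert (notAfter $end)"
        fi
    done
}

# Possible usage on the master, with the paths listed above:
#   cert_status /var/lib/kubelet/pki/kubelet-client-current.pem \
#               /var/lib/kubelet/pki/kubelet-server-current.pem \
#               /etc/kubernetes/ca.crt
```

Passing a larger number of seconds to `-checkend` instead of 0 would flag certificates that are about to expire, which is closer to the question of how long the VM can stay shut down.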
From what you posted in https://github.com/openshift/installer/issues/167#issuecomment-444426953
WARNING: 2018/12/05 09:41:32 Failed to dial 0.0.0.0:2379:
etcd is what is listening on :2379, so I don't believe this is the same issue as the original. Might be better to open a new issue to discuss the separate problem you're having. On a side note - I'm unsure why it would be dialing 0.0.0.0 -- fine to listen on all interfaces, but that seems wrong / maybe etcd DNS is configured improperly?
etcd is what is listening on :2379, so I don't believe this is the same issue as the original. Might be better to open a new issue to discuss the separate problem you're having.
Already moved to coreos/kubecsr#22 ;).
Cert rotation and lifetimes are not something the installer will be addressing. Please work with the master team (preferably in BZ) for further discussion if you are having problems.
In the local dev case, one may only have provisioned a single master. If one restarts the master, the kubelet will fail like so on restart if the certificate has expired:
@aaronlevy says: