platform9 / cctl

Apache License 2.0

After etcd recovery, kube-apiserver reports TLS errors until it is restarted #98

Closed dlipovetsky closed 6 years ago

dlipovetsky commented 6 years ago

kube-apiserver reports healthy as long as it can reach the etcd cluster, even if the etcd cluster itself is unhealthy (https://github.com/kubernetes/kubernetes/pull/49412). When etcd quorum is lost, some peers may continue to listen, so the kube-apiservers that can still reach a listening peer keep passing their health checks and are not restarted.

When we recover the etcd cluster, the etcd peers are unavailable only very briefly. If kubelet happens to check kube-apiserver's health during that window, it will restart kube-apiserver. Most of the time, it does not.

When kube-apiserver is not restarted after the etcd cluster is recovered, it reports TLS errors:

# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                                                 ERROR
etcd-0               Unhealthy   Get https://127.0.0.1:2379/health: remote error: tls: bad certificate
controller-manager   Healthy     ok
scheduler            Healthy     ok

Also, these appear in the kube-apiserver log:

E0814 00:12:06.436224       1 storage_factory.go:305] failed to load key pair while getting backends: open /etc/etcd/pki/apiserver-etcd-client.crt: no such file or directory
E0814 00:12:06.436251       1 storage_factory.go:312] failed to read ca file while getting backends: open /etc/etcd/pki/ca.crt: no such file or directory