kube-apiserver reports healthy as long as it can reach the etcd cluster, even when the etcd cluster itself is unhealthy (https://github.com/kubernetes/kubernetes/pull/49412). When etcd loses quorum, some peers may keep listening, so kube-apiserver's health check still passes and kubelet does not restart those kube-apiservers.
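The discrepancy is easy to see by comparing kube-apiserver's own health endpoint with etcd's view of itself. The endpoints, ports, and certificate paths below are illustrative (kubeadm-style defaults) and will differ per cluster; /healthz may also require credentials if anonymous access is disabled:

# kube-apiserver keeps answering "ok" because it can still open a connection to an etcd peer
curl -k https://127.0.0.1:6443/healthz

# etcd itself reports the loss of quorum, even though the peer is still listening
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  endpoint health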
When the etcd cluster is recovered, its peers are unavailable only very briefly. kubelet restarts kube-apiserver only if it happens to probe kube-apiserver's health during that brief window, which most of the time it does not.
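One way to confirm whether kubelet actually restarted kube-apiserver during recovery is to check the container restart count on its mirror pods. This sketch assumes kubeadm-style static pods carrying the component=kube-apiserver label; adjust the selector for other setups:

kubectl -n kube-system get pods -l component=kube-apiserver \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}'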
When kube-apiserver is not restarted after the etcd cluster is recovered, it reports TLS errors:
# kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                                                  ERROR
etcd-0               Unhealthy   Get https://127.0.0.1:2379/health: remote error: tls: bad certificate
controller-manager   Healthy     ok
scheduler            Healthy     ok
Also, these appear in the kube-apiserver log:
E0814 00:12:06.436224 1 storage_factory.go:305] failed to load key pair while getting backends: open /etc/etcd/pki/apiserver-etcd-client.crt: no such file or directory
E0814 00:12:06.436251 1 storage_factory.go:312] failed to read ca file while getting backends: open /etc/etcd/pki/ca.crt: no such file or directory
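Since the errors persist until kube-apiserver is restarted, forcing a restart is the straightforward fix. A minimal sketch of one way to do that, assuming kube-apiserver runs as a kubelet-managed static pod whose manifest lives under /etc/kubernetes/manifests (the kubeadm default):

# Move the static pod manifest out of the directory kubelet watches;
# kubelet tears the pod down, and moving the file back recreates it.
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/kube-apiserver.yaml
sleep 20   # give kubelet time to notice the removal and stop the pod
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/kube-apiserver.yaml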