Closed by adityadani 4 years ago.
"ETCD has fixed this problem in the recent releases, but that involves changing grpc dependencies. Hence this is a stop gap solution." What's the plan to update to that fix?
We need to update the vendored packages for etcd/clientv3 and grpc.
The etcd client has a bug in its failover/connection code: it fails to connect to a secure etcd cluster if the first endpoint is down.
The change shuffles the list of endpoints and tries connecting again. This happens regardless of which error the first connection attempt returns.
This is not done for non-secure etcd, because the etcd client already handles failover in that case.
ETCD has fixed this problem in the recent releases, but that involves changing grpc dependencies. Hence this is a stop gap solution.
More details here: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#known-issue-etcd-client-balancer-with-secure-endpoints
Known issue: all watches exit if the first endpoint goes down.