kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Are kops clusters subject to Kubernetes known issue: etcd client balancer with secure endpoints #7816

Closed · Nuru closed this issue 4 years ago

Nuru commented 5 years ago

Kubernetes documentation lists a "known issue": etcd client balancer with secure endpoints

The etcd v3 client, released in etcd v3.3.13 or earlier, has a critical bug which affects the kube-apiserver and HA deployments. The etcd client balancer failover does not properly work against secure endpoints. As a result, etcd servers may fail or disconnect briefly from the kube-apiserver. This affects kube-apiserver HA deployments.

This issue is reported at https://github.com/kubernetes/kubernetes/issues/83028 and https://github.com/kubernetes/kubernetes/issues/72102, and a workaround (using wildcard SANs in the etcd TLS certificates) is mentioned at https://github.com/kubernetes/kubernetes/issues/72102#issuecomment-542808932. The comments suggest Kubernetes will not have a fix for this until v1.16.3, and that all versions of Kubernetes using etcd3 with TLS certificates (which is kops 1.12 and later, right?) are affected.
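
For reference, one way to see whether a cluster's etcd server certificates already carry the wildcard SANs that the workaround relies on is to inspect a certificate with openssl on a control-plane node. This is only a sketch: the certificate path below is an assumption for illustration, and the real location depends on how etcd-manager lays out its files.

# Hypothetical certificate path; adjust to wherever the etcd server certificate lives on your masters.
openssl x509 -in /path/to/etcd-server.crt -noout -text | grep -A1 'Subject Alternative Name'

If the SAN list contains only the individual member hostnames and no wildcard entry, the workaround described in kubernetes#72102 has not been applied.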

Are clusters deployed by kops affected by this issue, or has kops installed a workaround or later etcd3 version?

Please document what is and is not affected and any suggested mitigations.

rainchei commented 4 years ago

Using kops version 1.17.0-alpha.1 (git-501baf7e5) and the manifest below (remember to export KOPS_FEATURE_FLAGS=SkipEtcdVersionCheck),

  etcdClusters:
  - name: main
    version: 3.4.3
    etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
  - name: events
    version: 3.4.3
    etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
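
For completeness, a rough sketch of one common way to roll out a manifest change like this with kops. It assumes KOPS_CLUSTER_NAME and KOPS_STATE_STORE are already exported; depending on the kops and etcd-manager versions, the etcd upgrade may also complete without a full roll of the masters.

export KOPS_FEATURE_FLAGS=SkipEtcdVersionCheck
kops edit cluster                    # set etcdClusters[*].version to 3.4.3 as in the manifest above
kops update cluster --yes            # push the updated configuration
kops rolling-update cluster --yes    # roll the control-plane nodes so etcd-manager picks up the change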

we seem to be able to upgrade to etcd 3.4.3:

./etcd-manager-ctl -backup-store=<xxx-yyy> get
Backup Store: <xxx-yyy>
member_count:3 etcd_version:"3.4.3"
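
As an additional sanity check (a sketch, reusing the same backup store placeholder as above), etcd-manager-ctl can also list the individual backups to confirm that backups keep being written after the upgrade:

./etcd-manager-ctl -backup-store=<xxx-yyy> list-backups
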
fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kops/issues/7816#issuecomment-630389950):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.