kubernetes-retired / kube-aws

[EOL] A command-line tool to declaratively manage Kubernetes clusters on AWS
Apache License 2.0

master: controller-manager is failing due to timing out while accessing /apis/metrics.k8s.io/v1beta1 after node rolling-update #1039

Closed mumoshu closed 6 years ago

mumoshu commented 6 years ago

Probably after kube-aws render and then kube-aws update, my kube-aws cluster fell into the UPDATE_ROLLBACK_FAILED state.

After logging in and surveying various points, I've verified that controller-manager is logging:

{"log":"E1127 08:25:48.551113       1 controllermanager.go:399] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.126:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.126:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-11-27T08:25:48.551355721Z"}

Also, docker logs $apiserver_container_id shows:

I1127 06:54:45.503817       1 autoregister_controller.go:136] Starting autoregister controller
I1127 06:54:45.503867       1 cache.go:32] Waiting for caches to sync for autoregister controller
I1127 06:54:45.604058       1 cache.go:39] Caches are synced for autoregister controller
I1127 06:54:47.975710       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1127 06:55:17.976203       1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.126:443: i/o timeout'
Trying to reach: 'https://10.3.0.126:443/swagger.json', Header: map[]
I1127 06:55:17.976222       1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1127 06:56:17.976460       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1127 06:56:47.977085       1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.126:443: i/o timeout'
Trying to reach: 'https://10.3.0.126:443/swagger.json', Header: map[]
I1127 06:56:47.977106       1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1127 06:58:47.977330       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io

cc @camilb

mumoshu commented 6 years ago

Recap:

Context

Failure Scenario

When the sole remaining controller node in your cluster is recreated,

  1. apiserver on the recreated node becomes super slow
  2. controller-manager falls into a crash loop due to apiserver unresponsiveness
  3. the daemonset-backed kube-proxy pod is never created due to the controller-manager failure
  4. as there's no kube-proxy to program iptables, no rules exist, hence no connection from apiserver to metrics-server can be established
  5. apiserver remains super slow because it can't reach metrics-server
  6. loop back to 1

Note that sooner or later this results in cfn-signal timing out, due to one of the following reasons:

Solution

a.k.a. work-around

When the apiservice exists but kube-proxy doesn't, temporarily delete the apiservice (not the metrics-server service) so that apiserver stops trying to communicate with metrics-server. This makes apiserver responsive again, which in turn makes controller-manager happy enough to finally create the daemonset-backed kube-proxy.

Also, in install-kube-system.service, add --request-timeout=1s to every kubectl call so that this work-around can be applied quickly against the slowed apiserver.
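For illustration, the `ks` helper used in the script below is presumably a thin kubectl wrapper scoped to kube-system; a minimal sketch of the idea (the function body is an assumption, only the --request-timeout=1s flag is from this issue):

```shell
# Sketch only -- the real `ks` helper lives in cloud-config-controller.
# Assumed shape: kubectl scoped to the kube-system namespace, with a short
# request timeout so calls against a hung apiserver fail fast instead of
# blocking install-kube-system for minutes.
ks() {
  kubectl --namespace kube-system --request-timeout=1s "$@"
}
```

With a 1-second timeout, each probe against the slowed apiserver returns (or fails) almost immediately, which is what brings the script's runtime down to about a minute.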

mumoshu commented 6 years ago

I did it. With the solution above, the main part of the install-kube-system script now takes only about 1 minute even when the failure condition is present.

      # See https://github.com/kubernetes-incubator/kube-aws/issues/1039#issuecomment-348978375
      if ks get apiservice v1beta1.metrics.k8s.io && ! ps ax | grep '[h]yperkube proxy'; then
        echo "apiserver is up but kube-proxy isn't up. We have likely encountered #1039."
        echo "Temporary deleting the v1beta1.metrics.k8s.io apiservice as a work-around for #1039"
        ks delete apiservice v1beta1.metrics.k8s.io

        echo Waiting until controller-manager stabilizes and it creates a kube-proxy pod.
        until ps ax | grep '[h]yperkube proxy'; do
            echo Sleeping 3 seconds.
            sleep 3
        done
        echo kube-proxy stared. apiserver should be responsive again.
      fi
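As an aside, the `grep '[h]yperkube proxy'` pattern in the script is the usual trick for process checks over `ps` output: the character class still matches the literal text `hyperkube proxy`, but not the pattern string itself, so grep's own command line appearing in the `ps ax` listing can never match and no `grep -v grep` is needed. A self-contained demonstration with fake `ps` output:

```shell
# Two fake `ps ax` lines: the real kube-proxy process, and the grep process
# itself (whose command line contains the literal text '[h]yperkube proxy').
# Only the first line matches, because the pattern requires a bare 'h'
# immediately before 'yperkube', which '[h]yperkube' does not contain.
printf '%s\n' \
  ' 2429 ?     Ssl   0:00 /hyperkube proxy --config=/etc/kubernetes/kube-proxy/kube-proxy-config.yaml' \
  ' 3105 pts/0 S+    0:00 grep [h]yperkube proxy' \
  | grep '[h]yperkube proxy'
# prints only the /hyperkube proxy line
```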
core@ip-10-0-0-156 ~ $ journalctl -u install-kube-system | cat
-- Logs begin at Mon 2017-12-04 16:22:42 UTC, end at Mon 2017-12-04 16:45:19 UTC. --
Dec 04 16:23:09 ip-10-0-0-156.ap-northeast-1.compute.internal systemd[1]: Starting install-kube-system.service...
Dec 04 16:23:09 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: activating
Dec 04 16:23:09 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: waiting until kubelet starts
Dec 04 16:23:19 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: activating
Dec 04 16:23:19 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: waiting until kubelet starts
Dec 04 16:23:29 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: activating
Dec 04 16:23:29 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: waiting until kubelet starts
Dec 04 16:23:39 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1102]: active
Dec 04 16:23:39 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1697]: active
Dec 04 16:23:39 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:23:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:23:59 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:24:09 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:24:19 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:24:29 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:24:39 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: waiting until apiserver starts
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: {
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "major": "1",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "minor": "8+",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "gitVersion": "v1.8.4+coreos.0",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "gitCommit": "4292f9682595afddbb4f8b1483673449c74f9619",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "gitTreeState": "clean",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "buildDate": "2017-11-21T17:22:25Z",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "goVersion": "go1.8.3",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "compiler": "gc",
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]:   "platform": "linux/amd64"
Dec 04 16:24:49 ip-10-0-0-156.ap-northeast-1.compute.internal bash[1701]: }
Dec 04 16:24:56 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: NAME          STATUS    AGE
Dec 04 16:24:56 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: kube-system   Active    11m
Dec 04 16:25:04 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: NAME                     AGE
Dec 04 16:25:04 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: v1beta1.metrics.k8s.io   11m
Dec 04 16:25:05 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: apiserver is up but kube-proxy isn't up. We have likely encountered #1039.
Dec 04 16:25:05 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Temporary deleting the v1beta1.metrics.k8s.io apiservice as a work-around for #1039
Dec 04 16:25:12 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: apiservice "v1beta1.metrics.k8s.io" deleted
Dec 04 16:25:12 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Waiting until controller-manager stabilizes and it creates a kube-proxy pod.
Dec 04 16:25:12 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:15 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:18 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:21 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:24 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:27 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:30 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:33 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Sleeping 3 seconds.
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]:  2429 ?        Ssl    0:00 /hyperkube proxy --config=/etc/kubernetes/kube-proxy/kube-proxy-config.yaml
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: kube-proxy stared. apiserver should be responsive again.
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: configmap "calico-config" unchanged
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: secret "calico-etcd-secrets" unchanged
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: daemonset "calico-node" configured
Dec 04 16:25:36 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "calico-kube-controllers" unchanged
Dec 04 16:25:37 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
Dec 04 16:25:37 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: secret "kubernetes-dashboard-certs" configured
Dec 04 16:25:37 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: configmap "kube-dns" unchanged
Dec 04 16:25:37 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: configmap "kube-proxy-config" unchanged
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "kube-dns" unchanged
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "heapster" unchanged
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "kube-proxy" unchanged
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "kubernetes-dashboard" unchanged
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "metrics-server" configured
Dec 04 16:25:38 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "tiller-deploy" configured
Dec 04 16:25:39 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: service "tiller-deploy" configured
Dec 04 16:25:39 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: daemonset "dnsmasq-node" unchanged
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "kube-dns" configured
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "kube-dns-autoscaler" unchanged
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "kubernetes-dashboard" unchanged
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "cluster-autoscaler" unchanged
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "heapster" configured
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "metrics-server" unchanged
Dec 04 16:25:40 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: daemonset "kube-proxy" unchanged
Dec 04 16:25:41 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: service "kube-dns" unchanged
Dec 04 16:25:41 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: service "heapster" unchanged
Dec 04 16:25:41 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: service "kubernetes-dashboard" unchanged
Dec 04 16:25:41 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: service "metrics-server" unchanged
Dec 04 16:25:41 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: deployment "kube-rescheduler" unchanged
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrole "kube-aws:node-extensions" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrole "system:metrics-server" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:admin" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:system-worker" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:node" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:node-proxier" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:node-extensions" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "heapster" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "metrics-server:system:auth-delegator" configured
Dec 04 16:25:42 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "system:metrics-server" configured
Dec 04 16:25:43 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kubernetes-dashboard" configured
Dec 04 16:25:44 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: role "system:pod-nanny" unchanged
Dec 04 16:25:44 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: role "kubernetes-dashboard-minimal" unchanged
Dec 04 16:25:44 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: rolebinding "heapster-nanny" unchanged
Dec 04 16:25:44 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: rolebinding "kubernetes-dashboard-minimal" unchanged
Dec 04 16:25:44 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: rolebinding "metrics-server-auth-reader" unchanged
Dec 04 16:25:45 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrole "kube-aws:node-bootstrapper" configured
Dec 04 16:25:45 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrole "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 16:25:45 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:node-bootstrapper" configured
Dec 04 16:25:45 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 16:25:46 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: serviceaccount "kube2iam" unchanged
Dec 04 16:25:46 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrole "kube2iam" configured
Dec 04 16:25:46 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: clusterrolebinding "kube2iam" configured
Dec 04 16:25:47 ip-10-0-0-156.ap-northeast-1.compute.internal retry[2069]: daemonset "kube2iam" unchanged
Dec 04 16:25:47 ip-10-0-0-156.ap-northeast-1.compute.internal systemd[1]: Started install-kube-system.service.

I hope this won't be a long-term solution 😉

Update: Verified that, as expected, the new work-around isn't triggered under normal conditions, e.g. when there is already a running controller node other than the one running install-kube-system.

tsuna commented 6 years ago

Should the apiservice v1beta1.metrics.k8s.io somehow get recreated after applying the workaround?

sohel2020 commented 5 years ago

@mumoshu you have deleted the apiservice v1beta1.metrics.k8s.io, but how did you recreate it after cluster bootstrap?

mumoshu commented 5 years ago

@tsuna @sohel2020 Nope, no manual recreation is needed. The commands in the {{ if .Addons.MetricsServer.Enabled -}} block in the install-kube-system section of cloud-config-controller should recreate it, along with the serviceaccount, daemonset, svc, and so on.
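For reference, the APIService object that those addon manifests recreate looks roughly like the standard metrics-server registration of that era (a sketch, not the exact kube-aws template):

```yaml
# Sketch of the metrics-server APIService registration. Once this object is
# re-applied, the aggregation layer resumes proxying requests for
# /apis/metrics.k8s.io/v1beta1 to the metrics-server service in kube-system.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```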