As my serviceCIDR is set to 10.3.0.0/24, 10.3.0.126 seems to be a clusterIP for the metrics-server service.
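A quick way to confirm that guess, assuming the service is indeed named metrics-server in kube-system, would be:
$ kubectl --namespace kube-system get service metrics-server -o jsonpath='{.spec.clusterIP}'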
Perhaps this is a race while rolling the controller node in a single-controller cluster?
More concretely, would it be something like "a metrics-server API has been registered with the apiserver to serve metrics via the k8s API, but it isn't accessible when the single apiserver starts up"?
Then, should we transform the metrics-server deployment into static pods?
I tried to delete the APIService on controller node start-up in order to stabilize controller-manager and apiserver. No luck so far.
core@ip-10-0-0-76 ~ $ docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes quay.io/coreos/hyperkube:v1.8.4_coreos.0 /hyperkube kubectl version
Client Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
core@ip-10-0-0-76 ~ $ docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes quay.io/coreos/hyperkube:v1.8.4_coreos.0 /hyperkube kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io
# hangs forever
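For reference, a sketch of how that start-up work-around could be made to not block forever - bounding each attempt with --request-timeout and retrying until the delete succeeds (the timeout and retry interval here are arbitrary assumptions):
# hypothetical retry loop for the start-up work-around; each attempt is bounded so it cannot hang indefinitely
until docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes \
    quay.io/coreos/hyperkube:v1.8.4_coreos.0 /hyperkube kubectl \
    --namespace kube-system --request-timeout=30s \
    delete apiservice v1beta1.metrics.k8s.io --ignore-not-found; do
  sleep 10
done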
My metrics-server pod, managed by a deployment, fell into CrashLoopBackOff with logs like:
$ k logs metrics-server-6bd7ddbc8-wxxgm
I1130 04:59:51.783586 1 heapster.go:71] /metrics-server --source=kubernetes.summary_api:'' --requestheader-client-ca-file=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --requestheader-username-headers=X-Remote-User --requestheader-group-headers=X-Remote-Group --requestheader-extra-headers-prefix=X-Remote-Extra
I1130 04:59:51.783688 1 heapster.go:72] Metrics Server version v0.2.0
I1130 04:59:51.783882 1 configs.go:61] Using Kubernetes client with master "https://10.3.0.1:443" and version
I1130 04:59:51.783899 1 configs.go:62] Using kubelet port 10255
I1130 04:59:51.784661 1 heapster.go:128] Starting with Metric Sink
I1130 04:59:53.982371 1 serving.go:308] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
E1130 05:00:21.785221 1 reflector.go:205] github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.3.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E1130 05:00:21.881712 1 reflector.go:205] github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.3.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E1130 05:00:21.881726 1 reflector.go:205] github.com/kubernetes-incubator/metrics-server/metrics/processors/namespace_based_enricher.go:85: Failed to list *v1.Namespace: Get https://10.3.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E1130 05:00:21.881911 1 reflector.go:205] github.com/kubernetes-incubator/metrics-server/metrics/util/util.go:52: Failed to list *v1.Node: Get https://10.3.0.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
E1130 05:00:21.881941 1 reflector.go:205] github.com/kubernetes-incubator/metrics-server/metrics/heapster.go:254: Failed to list *v1.Pod: Get https://10.3.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.3.0.1:443: i/o timeout
W1130 05:00:24.986736 1 authentication.go:222] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
F1130 05:00:24.986765 1 heapster.go:97] Could not create the API server: Get https://10.3.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.3.0.1:443: i/o timeout
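For reference, the rolebinding that the warning above refers to would look roughly like the following, assuming metrics-server runs under a metrics-server service account in kube-system (the binding and service account names are assumptions). In this case, though, the fatal error right after the warning is a plain i/o timeout to 10.3.0.1:443, so RBAC is probably not the root cause here:
$ kubectl create rolebinding metrics-server-auth-reader \
    --namespace kube-system \
    --role=extension-apiserver-authentication-reader \
    --serviceaccount=kube-system:metrics-server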
Regarding https://github.com/kubernetes-incubator/kube-aws/issues/1039#issuecomment-348089135, perhaps it was due to a service account key invalidated by key rotation? (I did run kube-aws render, not kube-aws render stack, before running kube-aws update.)
With a deployment that does not include the work-around of deleting the API registration immediately after the apiserver starts:
$ sudo tail /var/log/pods/**/*
==> /var/log/pods/7ad61f72c69c8442f08161a7d2690fde/kube-scheduler_0.log <==
{"log":"E1130 12:45:27.088401 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.PersistentVolumeClaim: Get http://127.0.0.1:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused\n","stream":"stderr","time":"2017-11-30T12:45:27.088655478Z"}
{"log":"E1130 12:45:27.089693 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get http://127.0.0.1:8080/api/v1/nodes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused\n","stream":"stderr","time":"2017-11-30T12:45:27.089841393Z"}
{"log":"E1130 12:45:27.090923 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.ReplicationController: Get http://127.0.0.1:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused\n","stream":"stderr","time":"2017-11-30T12:45:27.091123997Z"}
{"log":"E1130 12:45:27.092054 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Service: Get http://127.0.0.1:8080/api/v1/services?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused\n","stream":"stderr","time":"2017-11-30T12:45:27.092286207Z"}
{"log":"E1130 12:45:27.093252 1 reflector.go:205] k8s.io/kubernetes/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.PersistentVolume: Get http://127.0.0.1:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 127.0.0.1:8080: getsockopt: connection refused\n","stream":"stderr","time":"2017-11-30T12:45:27.093343297Z"}
{"log":"I1130 12:45:28.781514 1 controller_utils.go:1041] Waiting for caches to sync for scheduler controller\n","stream":"stderr","time":"2017-11-30T12:45:28.781676135Z"}
{"log":"I1130 12:45:28.881743 1 controller_utils.go:1048] Caches are synced for scheduler controller\n","stream":"stderr","time":"2017-11-30T12:45:28.881958109Z"}
{"log":"I1130 12:45:28.881892 1 leaderelection.go:174] attempting to acquire leader lease...\n","stream":"stderr","time":"2017-11-30T12:45:28.882132954Z"}
{"log":"I1130 12:45:47.579012 1 leaderelection.go:184] successfully acquired lease kube-system/kube-scheduler\n","stream":"stderr","time":"2017-11-30T12:45:47.57961902Z"}
{"log":"I1130 12:45:47.579251 1 event.go:218] Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"kube-system\", Name:\"kube-scheduler\", UID:\"46c48c82-d5b3-11e7-a082-0697f2d3e8d4\", APIVersion:\"v1\", ResourceVersion:\"20048\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' ip-10-0-0-108.ap-northeast-1.compute.internal became leader\n","stream":"stderr","time":"2017-11-30T12:45:47.579656102Z"}
==> /var/log/pods/7b91ea0c9eee6faae2241181ce72781b/kube-controller-manager_5.log <==
{"log":"I1130 13:08:09.821817 1 core.go:131] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.\n","stream":"stderr","time":"2017-11-30T13:08:09.821860191Z"}
{"log":"W1130 13:08:09.821823 1 controllermanager.go:484] Skipping \"route\"\n","stream":"stderr","time":"2017-11-30T13:08:09.821863196Z"}
{"log":"I1130 13:08:09.821973 1 controllermanager.go:487] Started \"podgc\"\n","stream":"stderr","time":"2017-11-30T13:08:09.82208669Z"}
{"log":"I1130 13:08:09.822088 1 service_controller.go:185] Starting service controller\n","stream":"stderr","time":"2017-11-30T13:08:09.822332379Z"}
{"log":"I1130 13:08:09.823326 1 controller_utils.go:1041] Waiting for caches to sync for service controller\n","stream":"stderr","time":"2017-11-30T13:08:09.823435203Z"}
{"log":"I1130 13:08:09.823224 1 gc_controller.go:76] Starting GC controller\n","stream":"stderr","time":"2017-11-30T13:08:09.823449338Z"}
{"log":"I1130 13:08:09.823496 1 controller_utils.go:1041] Waiting for caches to sync for GC controller\n","stream":"stderr","time":"2017-11-30T13:08:09.823611331Z"}
{"log":"E1130 13:08:39.835121 1 memcache.go:159] couldn't get resource list for metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.113:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.113:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-11-30T13:08:39.835424649Z"}
{"log":"E1130 13:09:39.856478 1 controllermanager.go:480] Error starting \"garbagecollector\"\n","stream":"stderr","time":"2017-11-30T13:09:39.856746669Z"}
{"log":"F1130 13:09:39.856537 1 controllermanager.go:156] error starting controllers: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.113:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.113:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-11-30T13:09:39.856777452Z"}
==> /var/log/pods/7b91ea0c9eee6faae2241181ce72781b/kube-controller-manager_6.log <==
{"log":"I1130 13:12:33.456263 1 controllermanager.go:109] Version: v1.8.4+coreos.0\n","stream":"stderr","time":"2017-11-30T13:12:33.456639176Z"}
{"log":"I1130 13:12:33.456847 1 leaderelection.go:174] attempting to acquire leader lease...\n","stream":"stderr","time":"2017-11-30T13:12:33.456966746Z"}
{"log":"I1130 13:12:33.460627 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager\n","stream":"stderr","time":"2017-11-30T13:12:33.460711681Z"}
{"log":"I1130 13:12:33.460886 1 event.go:218] Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"kube-system\", Name:\"kube-controller-manager\", UID:\"46d8a677-d5b3-11e7-a082-0697f2d3e8d4\", APIVersion:\"v1\", ResourceVersion:\"21858\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' ip-10-0-0-108.ap-northeast-1.compute.internal became leader\n","stream":"stderr","time":"2017-11-30T13:12:33.461052374Z"}
==> /var/log/pods/bb73de1b14eaae40f3c9188e900d72c2/kube-apiserver_0.log <==
{"log":"Trying to reach: 'https://10.3.0.113:443/swagger.json', Header: map[]\n","stream":"stderr","time":"2017-11-30T13:10:00.583831802Z"}
{"log":"I1130 13:10:00.583640 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-11-30T13:10:00.583837017Z"}
{"log":"I1130 13:10:30.260172 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-11-30T13:10:30.26047005Z"}
{"log":"E1130 13:11:00.260640 1 controller.go:111] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.113:443: i/o timeout'\n","stream":"stderr","time":"2017-11-30T13:11:00.260926362Z"}
{"log":"Trying to reach: 'https://10.3.0.113:443/swagger.json', Header: map[]\n","stream":"stderr","time":"2017-11-30T13:11:00.26095539Z"}
{"log":"I1130 13:11:00.260661 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-11-30T13:11:00.26095947Z"}
{"log":"I1130 13:12:00.260855 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-11-30T13:12:00.261151784Z"}
{"log":"E1130 13:12:30.261302 1 controller.go:111] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.113:443: i/o timeout'\n","stream":"stderr","time":"2017-11-30T13:12:30.261472146Z"}
{"log":"Trying to reach: 'https://10.3.0.113:443/swagger.json', Header: map[]\n","stream":"stderr","time":"2017-11-30T13:12:30.261503674Z"}
{"log":"I1130 13:12:30.261322 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-11-30T13:12:30.261508149Z"}
$ sudo cat /var/log/pods/**/*controller-manager*
{"log":"I1201 07:00:25.798546 1 controllermanager.go:109] Version: v1.8.4+coreos.0\n","stream":"stderr","time":"2017-12-01T07:00:25.798883266Z"}
{"log":"I1201 07:00:25.799121 1 leaderelection.go:174] attempting to acquire leader lease...\n","stream":"stderr","time":"2017-12-01T07:00:25.799357431Z"}
{"log":"I1201 07:00:25.802797 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager\n","stream":"stderr","time":"2017-12-01T07:00:25.802916246Z"}
{"log":"I1201 07:00:25.803080 1 event.go:218] Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"kube-system\", Name:\"kube-controller-manager\", UID:\"5a09e1a5-d5d9-11e7-af3a-06fc921ec494\", APIVersion:\"v1\", ResourceVersion:\"59320\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' ip-10-0-0-42.ap-northeast-1.compute.internal became leader\n","stream":"stderr","time":"2017-12-01T07:00:25.803257094Z"}
{"log":"E1201 07:01:25.826472 1 controllermanager.go:399] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.73:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.73:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-12-01T07:01:25.826779773Z"}
{"log":"I1201 07:01:25.826667 1 aws.go:847] Building AWS cloudprovider\n","stream":"stderr","time":"2017-12-01T07:01:25.82681359Z"}
{"log":"I1201 07:01:25.826702 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service\n","stream":"stderr","time":"2017-12-01T07:01:25.826822382Z"}
{"log":"I1201 07:01:26.093630 1 tags.go:76] AWS cloud filtering on ClusterID: k8s3\n","stream":"stderr","time":"2017-12-01T07:01:26.093920749Z"}
{"log":"I1201 07:01:26.094843 1 controller_utils.go:1041] Waiting for caches to sync for tokens controller\n","stream":"stderr","time":"2017-12-01T07:01:26.095027903Z"}
{"log":"I1201 07:01:26.095038 1 controllermanager.go:487] Started \"job\"\n","stream":"stderr","time":"2017-12-01T07:01:26.095247296Z"}
{"log":"I1201 07:01:26.095305 1 job_controller.go:138] Starting job controller\n","stream":"stderr","time":"2017-12-01T07:01:26.095476944Z"}
{"log":"I1201 07:01:26.095354 1 controller_utils.go:1041] Waiting for caches to sync for job controller\n","stream":"stderr","time":"2017-12-01T07:01:26.095491202Z"}
{"log":"I1201 07:01:26.095474 1 controllermanager.go:487] Started \"daemonset\"\n","stream":"stderr","time":"2017-12-01T07:01:26.095759822Z"}
{"log":"I1201 07:01:26.095795 1 daemon_controller.go:230] Starting daemon sets controller\n","stream":"stderr","time":"2017-12-01T07:01:26.095929508Z"}
{"log":"I1201 07:01:26.095856 1 controller_utils.go:1041] Waiting for caches to sync for daemon sets controller\n","stream":"stderr","time":"2017-12-01T07:01:26.095942708Z"}
{"log":"I1201 07:01:26.095929 1 controllermanager.go:487] Started \"replicaset\"\n","stream":"stderr","time":"2017-12-01T07:01:26.096089105Z"}
{"log":"I1201 07:01:26.096126 1 replica_set.go:156] Starting replica set controller\n","stream":"stderr","time":"2017-12-01T07:01:26.09631264Z"}
{"log":"I1201 07:01:26.096174 1 controller_utils.go:1041] Waiting for caches to sync for replica set controller\n","stream":"stderr","time":"2017-12-01T07:01:26.096326405Z"}
{"log":"I1201 07:01:26.096634 1 controllermanager.go:487] Started \"horizontalpodautoscaling\"\n","stream":"stderr","time":"2017-12-01T07:01:26.096810315Z"}
{"log":"I1201 07:01:26.096858 1 horizontal.go:145] Starting HPA controller\n","stream":"stderr","time":"2017-12-01T07:01:26.096972252Z"}
{"log":"I1201 07:01:26.096910 1 controller_utils.go:1041] Waiting for caches to sync for HPA controller\n","stream":"stderr","time":"2017-12-01T07:01:26.096983586Z"}
{"log":"I1201 07:01:26.096980 1 controllermanager.go:487] Started \"disruption\"\n","stream":"stderr","time":"2017-12-01T07:01:26.097138273Z"}
{"log":"I1201 07:01:26.097177 1 disruption.go:288] Starting disruption controller\n","stream":"stderr","time":"2017-12-01T07:01:26.097424685Z"}
{"log":"I1201 07:01:26.097194 1 controller_utils.go:1041] Waiting for caches to sync for disruption controller\n","stream":"stderr","time":"2017-12-01T07:01:26.097456645Z"}
{"log":"I1201 07:01:26.097245 1 controllermanager.go:487] Started \"endpoint\"\n","stream":"stderr","time":"2017-12-01T07:01:26.097465261Z"}
{"log":"I1201 07:01:26.097483 1 endpoints_controller.go:153] Starting endpoint controller\n","stream":"stderr","time":"2017-12-01T07:01:26.097597326Z"}
{"log":"I1201 07:01:26.097515 1 controller_utils.go:1041] Waiting for caches to sync for endpoint controller\n","stream":"stderr","time":"2017-12-01T07:01:26.097611171Z"}
{"log":"W1201 07:01:26.097655 1 shared_informer.go:304] resyncPeriod 74288251410068 is smaller than resyncCheckPeriod 81045341278913 and the informer has already started. Changing it to 81045341278913\n","stream":"stderr","time":"2017-12-01T07:01:26.097817578Z"}
{"log":"I1201 07:01:26.097708 1 controllermanager.go:487] Started \"resourcequota\"\n","stream":"stderr","time":"2017-12-01T07:01:26.097832415Z"}
{"log":"I1201 07:01:26.098007 1 resource_quota_controller.go:238] Starting resource quota controller\n","stream":"stderr","time":"2017-12-01T07:01:26.098170263Z"}
{"log":"I1201 07:01:26.098045 1 controller_utils.go:1041] Waiting for caches to sync for resource quota controller\n","stream":"stderr","time":"2017-12-01T07:01:26.098184999Z"}
{"log":"I1201 07:01:26.195031 1 controller_utils.go:1048] Caches are synced for tokens controller\n","stream":"stderr","time":"2017-12-01T07:01:26.19520102Z"}
{"log":"E1201 07:02:26.120695 1 namespaced_resources_deleter.go:169] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.73:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.73:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-12-01T07:02:26.120966215Z"}
{"log":"I1201 07:02:26.120839 1 controllermanager.go:487] Started \"namespace\"\n","stream":"stderr","time":"2017-12-01T07:02:26.120989532Z"}
{"log":"I1201 07:02:26.121137 1 namespace_controller.go:186] Starting namespace controller\n","stream":"stderr","time":"2017-12-01T07:02:26.121287838Z"}
{"log":"I1201 07:02:26.121156 1 controller_utils.go:1041] Waiting for caches to sync for namespace controller\n","stream":"stderr","time":"2017-12-01T07:02:26.121304541Z"}
{"log":"E1201 07:02:56.131913 1 memcache.go:159] couldn't get resource list for metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.73:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.73:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-12-01T07:02:56.132142772Z"}
{"log":"E1201 07:03:56.154206 1 controllermanager.go:480] Error starting \"garbagecollector\"\n","stream":"stderr","time":"2017-12-01T07:03:56.154447246Z"}
{"log":"F1201 07:03:56.154225 1 controllermanager.go:156] error starting controllers: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.73:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.73:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-12-01T07:03:56.154471935Z"}
So, without deleting the metrics-server apiservice at a very early stage of the controller node startup process, the controller-manager fails while starting controllers:
{"log":"F1201 07:03:56.154225 1 controllermanager.go:156] error starting controllers: failed to get supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server (\"Error: 'dial tcp 10.3.0.73:443: i/o timeout'\\nTrying to reach: 'https://10.3.0.73:443/apis/metrics.k8s.io/v1beta1'\") has prevented the request from succeeding\n","stream":"stderr","time":"2017-12-01T07:03:56.154471935Z"}
The apiserver gets super slow - more concretely, kubectl version returns immediately but kubectl get *anything* seems to hang forever:
core@ip-10-0-0-42 ~ $ kubectl version
Client Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8+", GitVersion:"v1.8.4+coreos.0", GitCommit:"4292f9682595afddbb4f8b1483673449c74f9619", GitTreeState:"clean", BuildDate:"2017-11-21T17:22:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
core@ip-10-0-0-42 ~ $ kubectl get foo
<hangs forever...>
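If the stall is in kubectl's client-side discovery rather than in the apiserver itself, a request that bypasses discovery should still come back quickly (a sketch; assumes the node's kubeconfig points at the local apiserver):
$ kubectl get --raw /healthz
$ kubectl get --raw /api/v1/namespaces/kube-system/pods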
After the controller-manager fails, I can't delete the apiservice anymore - this just hangs forever:
core@ip-10-0-0-42 ~ $ kubectl --namespace kube-system delete apiservice metrics-server
Update:
It just takes very long - like 3 mins or so:
core@ip-10-0-0-42 ~ $ kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io
apiservice "v1beta1.metrics.k8s.io" deleted
And then creating the apiservice again brings back the apiserver and the controller-manager:
core@ip-10-0-0-42 ~ $ kubectl --namespace kube-system create -f /srv/kubernetes/manifests/metrics-server-apisvc.yaml
apiservice "v1beta1.metrics.k8s.io" created
core@ip-10-0-0-42 ~ $ kubectl get po
No resources found.
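For context, the registration being recreated here is presumably equivalent to the upstream metrics-server v0.2.0 APIService; creating it by hand would look roughly like this (the kube-aws-rendered metrics-server-apisvc.yaml may differ in details):
# a sketch based on the upstream metrics-server registration; not necessarily identical to the rendered manifest
$ cat <<'EOF' | kubectl create -f -
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
EOF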
The apiserver still outputs "Rate Limited Requeue" and "OpenAPI spec does not exist" errors, but it is responding:
{"log":"I1201 07:17:27.201176 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-12-01T07:17:27.201431426Z"}
{"log":"I1201 07:18:27.201468 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-12-01T07:18:27.201798565Z"}
{"log":"E1201 07:18:57.201947 1 controller.go:111] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.73:443: i/o timeout'\n","stream":"stderr","time":"2017-12-01T07:18:57.202194448Z"}
{"log":"Trying to reach: 'https://10.3.0.73:443/swagger.json', Header: map[]\n","stream":"stderr","time":"2017-12-01T07:18:57.20223123Z"}
{"log":"I1201 07:18:57.201969 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-12-01T07:18:57.202235354Z"}
{"log":"I1201 07:20:57.202214 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-12-01T07:20:57.202515975Z"}
{"log":"I1201 07:20:57.202356 1 controller.go:122] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Nothing (removed from the queue).\n","stream":"stderr","time":"2017-12-01T07:20:57.202545453Z"}
{"log":"I1201 07:26:11.701571 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-12-01T07:26:11.701880828Z"}
{"log":"E1201 07:26:11.792513 1 controller.go:111] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: OpenAPI spec does not exists\n","stream":"stderr","time":"2017-12-01T07:26:11.792804725Z"}
{"log":"I1201 07:26:11.792553 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-12-01T07:26:11.792829301Z"}
{"log":"I1201 07:26:56.901893 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n","stream":"stderr","time":"2017-12-01T07:26:56.90233554Z"}
{"log":"E1201 07:26:57.253496 1 controller.go:111] loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: OpenAPI spec does not exists\n","stream":"stderr","time":"2017-12-01T07:26:57.253861782Z"}
{"log":"I1201 07:26:57.253558 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.\n","stream":"stderr","time":"2017-12-01T07:26:57.253883585Z"}
controller-manager seems ok: https://gist.github.com/mumoshu/b77013987871da0ddf457b40d46fe6fa
@camilb Hi! Have you managed to rolling-update controller nodes after we've started deploying metrics-server? If so, how many controller nodes do you have in your cluster(s)?
Hi @mumoshu, I didn't upgrade any cluster yet; I'm planning to upgrade 2 staging clusters today, both with only one controller node. On a new cluster everything seems ok. One thing I find strange is your command kubectl --namespace kube-system delete apiservice metrics-server, which looks different for me.
@camilb I've mistyped the name of the apiservice in my first example :) Please see the updated comment above.
@camilb Thanks for the response!
Are you ok if you get forced to recreate the staging clusters? If not, please use the work-around I've put together in #1043
Update: the "OpenAPI spec does not exist" errors seem to be harmless according to https://github.com/kubernetes-incubator/metrics-server/issues/27#issuecomment-347160401
@mumoshu I have no problem recreating them. I'm upgrading from 1.7.8 and 1.8.1. I'll let you know how it goes.
@camilb Thanks for your support 🙏
FYI, I've opened https://github.com/kubernetes-incubator/metrics-server/issues/28. Hopefully, I'm just missing something and there's an easy fix on user-side 😉
@mumoshu I'm working on a fix. I think I may have a chance to fix it on our side.
Quick note: the api server responds fine; the problem is with kubectl requests to the api server. If you run kubectl -v8 get apiservice or another resource, you can see that it is trying to list metrics.k8s.io/v1beta1 ~3 times, with a default timeout of 60s.
A method to skip this waiting time is to use the --request-timeout flag.
Ex.
kubectl get pods -n kube-system --request-timeout='1s'
@camilb You are so dope! Really looking forward to seeing the fix.
Hi @mumoshu I have a fix that works fine at the moment. Successfully upgraded a cluster with it. Please take a look: https://github.com/camilb/kube-aws/commit/a5a2f23bc1256bd3b67a65d895d8e9f4d0376617
I think it can be improved, but it works.
@camilb Thx! But I couldn't figure out why it fixes the issue.
Were you able to upgrade a cluster with a single controller node with the patch?
My understanding was as described above, and your fix doesn't seem to directly address it, right?
@mumoshu It's actually working. Upgraded the cluster several times by changing the master instance type and it doesn't fail.
$ kubectl logs -f kube-controller-manager-ip-10-35-185-49.ec2.internal -n kube-system
I1201 12:11:20.876849 1 controllermanager.go:109] Version: v1.8.4
I1201 12:11:20.877782 1 leaderelection.go:174] attempting to acquire leader lease...
E1201 12:11:20.878452 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
E1201 12:11:23.234825 1 leaderelection.go:224] error retrieving resource lock kube-system/kube-controller-manager: Get http://127.0.0.1:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager: dial tcp 127.0.0.1:8080: getsockopt: connection refused
I1201 12:15:03.696598 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
I1201 12:15:03.696971 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"4d7b28e1-5742-11e7-80a5-0e64000698fe", APIVersion:"v1", ResourceVersion:"30336737", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-35-185-49.ec2.internal became leader
I1201 12:15:03.796316 1 aws.go:847] Building AWS cloudprovider
I1201 12:15:03.796362 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1201 12:15:04.018424 1 tags.go:76] AWS cloud filtering on ClusterID: k8sqa
I1201 12:15:04.019711 1 controller_utils.go:1041] Waiting for caches to sync for tokens controller
I1201 12:15:04.021142 1 controllermanager.go:487] Started "statefulset"
I1201 12:15:04.021528 1 stateful_set.go:146] Starting stateful set controller
E1201 12:15:04.021546 1 certificates.go:48] Failed to start certificate controller: error reading CA cert file "/etc/kubernetes/ca/ca.pem": open /etc/kubernetes/ca/ca.pem: no such file or directory
W1201 12:15:04.021578 1 controllermanager.go:484] Skipping "csrsigning"
W1201 12:15:04.021588 1 controllermanager.go:471] "tokencleaner" is disabled
I1201 12:15:04.021686 1 controller_utils.go:1041] Waiting for caches to sync for stateful set controller
I1201 12:15:04.022229 1 controllermanager.go:487] Started "service"
I1201 12:15:04.022373 1 service_controller.go:185] Starting service controller
I1201 12:15:04.022400 1 controller_utils.go:1041] Waiting for caches to sync for service controller
I1201 12:15:04.022688 1 node_controller.go:249] Sending events to api server.
I1201 12:15:04.022816 1 taint_controller.go:158] Sending events to api server.
I1201 12:15:04.022995 1 controllermanager.go:487] Started "node"
I1201 12:15:04.023146 1 node_controller.go:516] Starting node controller
I1201 12:15:04.023182 1 controller_utils.go:1041] Waiting for caches to sync for node controller
I1201 12:15:04.033982 1 controllermanager.go:487] Started "namespace"
I1201 12:15:04.034218 1 namespace_controller.go:186] Starting namespace controller
I1201 12:15:04.034249 1 controller_utils.go:1041] Waiting for caches to sync for namespace controller
I1201 12:15:04.034465 1 controllermanager.go:487] Started "serviceaccount"
I1201 12:15:04.034623 1 serviceaccounts_controller.go:113] Starting service account controller
I1201 12:15:04.034659 1 controller_utils.go:1041] Waiting for caches to sync for service account controller
I1201 12:15:04.040162 1 controllermanager.go:487] Started "disruption"
I1201 12:15:04.041289 1 disruption.go:288] Starting disruption controller
I1201 12:15:04.041302 1 controller_utils.go:1041] Waiting for caches to sync for disruption controller
I1201 12:15:04.041934 1 controllermanager.go:487] Started "persistentvolume-binder"
I1201 12:15:04.042074 1 pv_controller_base.go:259] Starting persistent volume controller
I1201 12:15:04.042107 1 controller_utils.go:1041] Waiting for caches to sync for persistent volume controller
I1201 12:15:04.042263 1 ttl_controller.go:116] Starting TTL controller
I1201 12:15:04.042253 1 controllermanager.go:487] Started "ttl"
W1201 12:15:04.042319 1 controllermanager.go:471] "bootstrapsigner" is disabled
W1201 12:15:04.042413 1 core.go:128] Unsuccessful parsing of cluster CIDR : invalid CIDR address:
I1201 12:15:04.042438 1 core.go:131] Will not configure cloud provider routes for allocate-node-cidrs: false, configure-cloud-routes: true.
W1201 12:15:04.042618 1 controllermanager.go:484] Skipping "route"
I1201 12:15:04.042303 1 controller_utils.go:1041] Waiting for caches to sync for TTL controller
W1201 12:15:04.042830 1 controllermanager.go:484] Skipping "persistentvolume-expander"
I1201 12:15:04.043320 1 controllermanager.go:487] Started "job"
I1201 12:15:04.043438 1 job_controller.go:138] Starting job controller
I1201 12:15:04.043480 1 controller_utils.go:1041] Waiting for caches to sync for job controller
I1201 12:15:04.043932 1 controllermanager.go:487] Started "deployment"
I1201 12:15:04.044068 1 deployment_controller.go:151] Starting deployment controller
I1201 12:15:04.044102 1 controller_utils.go:1041] Waiting for caches to sync for deployment controller
I1201 12:15:04.044724 1 controllermanager.go:487] Started "horizontalpodautoscaling"
I1201 12:15:04.044852 1 horizontal.go:145] Starting HPA controller
I1201 12:15:04.044886 1 controller_utils.go:1041] Waiting for caches to sync for HPA controller
I1201 12:15:04.045239 1 controllermanager.go:487] Started "replicationcontroller"
I1201 12:15:04.045373 1 replication_controller.go:151] Starting RC controller
I1201 12:15:04.045407 1 controller_utils.go:1041] Waiting for caches to sync for RC controller
I1201 12:15:04.045780 1 controllermanager.go:487] Started "daemonset"
I1201 12:15:04.045907 1 daemon_controller.go:230] Starting daemon sets controller
I1201 12:15:04.045941 1 controller_utils.go:1041] Waiting for caches to sync for daemon sets controller
I1201 12:15:04.046265 1 controllermanager.go:487] Started "csrapproving"
I1201 12:15:04.046496 1 certificate_controller.go:109] Starting certificate controller
I1201 12:15:04.046533 1 controller_utils.go:1041] Waiting for caches to sync for certificate controller
I1201 12:15:04.119883 1 controller_utils.go:1048] Caches are synced for tokens controller
I1201 12:15:05.249796 1 controllermanager.go:487] Started "garbagecollector"
I1201 12:15:05.250208 1 garbagecollector.go:136] Starting garbage collector controller
I1201 12:15:05.250233 1 controller_utils.go:1041] Waiting for caches to sync for garbage collector controller
I1201 12:15:05.250261 1 graph_builder.go:321] GraphBuilder running
I1201 12:15:05.250854 1 controllermanager.go:487] Started "replicaset"
I1201 12:15:05.250953 1 replica_set.go:156] Starting replica set controller
I1201 12:15:05.250972 1 controller_utils.go:1041] Waiting for caches to sync for replica set controller
I1201 12:15:05.251220 1 controllermanager.go:487] Started "cronjob"
I1201 12:15:05.251353 1 cronjob_controller.go:98] Starting CronJob Manager
W1201 12:15:05.251647 1 probe.go:215] Flexvolume plugin directory at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ does not exist. Recreating.
I1201 12:15:05.252224 1 controllermanager.go:487] Started "attachdetach"
I1201 12:15:05.252330 1 attach_detach_controller.go:255] Starting attach detach controller
I1201 12:15:05.252350 1 controller_utils.go:1041] Waiting for caches to sync for attach detach controller
I1201 12:15:05.252570 1 controllermanager.go:487] Started "endpoint"
I1201 12:15:05.252669 1 endpoints_controller.go:153] Starting endpoint controller
I1201 12:15:05.252684 1 controller_utils.go:1041] Waiting for caches to sync for endpoint controller
I1201 12:15:05.252792 1 controllermanager.go:487] Started "podgc"
I1201 12:15:05.252885 1 gc_controller.go:76] Starting GC controller
I1201 12:15:05.252900 1 controller_utils.go:1041] Waiting for caches to sync for GC controller
I1201 12:15:05.253325 1 controllermanager.go:487] Started "resourcequota"
I1201 12:15:05.253954 1 resource_quota_controller.go:238] Starting resource quota controller
I1201 12:15:05.253970 1 controller_utils.go:1041] Waiting for caches to sync for resource quota controller
I1201 12:15:05.334391 1 controller_utils.go:1048] Caches are synced for namespace controller
I1201 12:15:05.334985 1 controller_utils.go:1048] Caches are synced for service account controller
E1201 12:15:05.498248 1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-135.ec2.internal" does not exist
E1201 12:15:05.499015 1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-135.ec2.internal" does not exist
E1201 12:15:05.499089 1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-46.ec2.internal" does not exist
E1201 12:15:05.499186 1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-46.ec2.internal" does not exist
E1201 12:15:05.499218 1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-59.ec2.internal" does not exist
E1201 12:15:05.499453 1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-59.ec2.internal" does not exist
E1201 12:15:05.499507 1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-97.ec2.internal" does not exist
E1201 12:15:05.499599 1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-97.ec2.internal" does not exist
E1201 12:15:05.499634 1 actual_state_of_world.go:483] Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-49.ec2.internal" does not exist
E1201 12:15:05.499838 1 actual_state_of_world.go:497] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true because nodeName="ip-10-35-185-49.ec2.internal" does not exist
I1201 12:15:05.521802 1 controller_utils.go:1048] Caches are synced for stateful set controller
I1201 12:15:05.522509 1 controller_utils.go:1048] Caches are synced for service controller
I1201 12:15:05.522593 1 service_controller.go:651] Detected change in list of current cluster nodes. New node set: [ip-10-35-185-46.ec2.internal ip-10-35-185-59.ec2.internal ip-10-35-185-97.ec2.internal]
I1201 12:15:05.522899 1 service_controller.go:659] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
I1201 12:15:05.523633 1 controller_utils.go:1048] Caches are synced for node controller
I1201 12:15:05.523689 1 node_controller.go:563] Initializing eviction metric for zone: us-east-1::us-east-1d
W1201 12:15:05.523758 1 node_controller.go:916] Missing timestamp for Node ip-10-35-185-59.ec2.internal. Assuming now as a timestamp.
W1201 12:15:05.523791 1 node_controller.go:916] Missing timestamp for Node ip-10-35-185-97.ec2.internal. Assuming now as a timestamp.
W1201 12:15:05.523821 1 node_controller.go:916] Missing timestamp for Node ip-10-35-185-49.ec2.internal. Assuming now as a timestamp.
W1201 12:15:05.523845 1 node_controller.go:916] Missing timestamp for Node ip-10-35-185-135.ec2.internal. Assuming now as a timestamp.
W1201 12:15:05.523871 1 node_controller.go:916] Missing timestamp for Node ip-10-35-185-46.ec2.internal. Assuming now as a timestamp.
I1201 12:15:05.523894 1 node_controller.go:832] Controller detected that zone us-east-1::us-east-1d is now in state Normal.
I1201 12:15:05.524143 1 taint_controller.go:181] Starting NoExecuteTaintManager
I1201 12:15:05.524368 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-46.ec2.internal", UID:"f21f8c36-d684-11e7-8040-0eb61a73949e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-10-35-185-46.ec2.internal event: Registered Node ip-10-35-185-46.ec2.internal in Controller
I1201 12:15:05.524390 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-59.ec2.internal", UID:"0c54d8c5-d685-11e7-8040-0eb61a73949e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-10-35-185-59.ec2.internal event: Registered Node ip-10-35-185-59.ec2.internal in Controller
I1201 12:15:05.524401 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-97.ec2.internal", UID:"1ad7d913-d685-11e7-8040-0eb61a73949e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-10-35-185-97.ec2.internal event: Registered Node ip-10-35-185-97.ec2.internal in Controller
I1201 12:15:05.524412 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-49.ec2.internal", UID:"c0f72b76-d690-11e7-86d8-0e2d7d9cf658", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-10-35-185-49.ec2.internal event: Registered Node ip-10-35-185-49.ec2.internal in Controller
I1201 12:15:05.524424 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-135.ec2.internal", UID:"55cd9b1a-d68d-11e7-a55c-0e82348b8662", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-10-35-185-135.ec2.internal event: Registered Node ip-10-35-185-135.ec2.internal in Controller
I1201 12:15:05.541423 1 controller_utils.go:1048] Caches are synced for disruption controller
I1201 12:15:05.541437 1 disruption.go:296] Sending events to api server.
I1201 12:15:05.542378 1 controller_utils.go:1048] Caches are synced for persistent volume controller
I1201 12:15:05.542746 1 controller_utils.go:1048] Caches are synced for TTL controller
I1201 12:15:05.543566 1 controller_utils.go:1048] Caches are synced for job controller
I1201 12:15:05.544247 1 controller_utils.go:1048] Caches are synced for deployment controller
I1201 12:15:05.564981 1 controller_utils.go:1048] Caches are synced for resource quota controller
I1201 12:15:05.569770 1 controller_utils.go:1048] Caches are synced for HPA controller
I1201 12:15:05.569820 1 controller_utils.go:1048] Caches are synced for RC controller
I1201 12:15:05.569864 1 controller_utils.go:1048] Caches are synced for daemon sets controller
I1201 12:15:05.570633 1 controller_utils.go:1048] Caches are synced for certificate controller
I1201 12:15:05.570744 1 controller_utils.go:1048] Caches are synced for garbage collector controller
I1201 12:15:05.570752 1 garbagecollector.go:145] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
I1201 12:15:05.570845 1 controller_utils.go:1048] Caches are synced for replica set controller
I1201 12:15:05.580208 1 controller_utils.go:1048] Caches are synced for attach detach controller
I1201 12:15:05.581638 1 controller_utils.go:1048] Caches are synced for endpoint controller
I1201 12:15:05.581724 1 controller_utils.go:1048] Caches are synced for GC controller
I1201 12:15:45.532264 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-135.ec2.internal", UID:"55cd9b1a-d68d-11e7-a55c-0e82348b8662", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeNotReady' Node ip-10-35-185-135.ec2.internal status is now: NodeNotReady
I1201 12:15:45.847653 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-135.ec2.internal", UID:"55cd9b1a-d68d-11e7-a55c-0e82348b8662", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeletingNode' Node ip-10-35-185-135.ec2.internal event: Deleting Node ip-10-35-185-135.ec2.internal because it's not present according to cloud provider
I1201 12:15:50.847690 1 event.go:218] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-35-185-135.ec2.internal", UID:"55cd9b1a-d68d-11e7-a55c-0e82348b8662", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RemovingNode' Node ip-10-35-185-135.ec2.internal event: Removing Node ip-10-35-185-135.ec2.internal from Controller
I1201 12:16:05.657399 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-node-drainer-sg-ds-mtq5g
I1201 12:16:05.668573 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-node-drainer-sg-ds-mtq5g succeeded
I1201 12:16:05.668591 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-node-drainer-ds-kw9bw
I1201 12:16:05.687134 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-node-drainer-ds-kw9bw succeeded
I1201 12:16:05.687192 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-controller-manager-ip-10-35-185-135.ec2.internal
E1201 12:16:05.701022 1 daemon_controller.go:263] kube-system/kube-node-drainer-ds failed with : error storing status for daemon set &v1beta1.DaemonSet{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-node-drainer-ds", GenerateName:"", Namespace:"kube-system", SelfLink:"/apis/extensions/v1beta1/namespaces/kube-system/daemonsets/kube-node-drainer-ds", UID:"18243135-77a1-11e7-9fbe-0e54effddef8", ResourceVersion:"30336838", Generation:4, CreationTimestamp:v1.Time{Time:time.Time{sec:63637288743, nsec:0, loc:(*time.Location)(0x9e22280)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"k8s-app":"kube-node-drainer-ds"}, Annotations:map[string]string{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"extensions/v1beta1\",\"kind\":\"DaemonSet\",\"metadata\":{\"annotations\":{},\"labels\":{\"k8s-app\":\"kube-node-drainer-ds\"},\"name\":\"kube-node-drainer-ds\",\"namespace\":\"kube-system\"},\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"scheduler.alpha.kubernetes.io/critical-pod\":\"\"},\"labels\":{\"k8s-app\":\"kube-node-drainer-ds\"}},\"spec\":{\"containers\":[{\"command\":[\"/bin/sh\",\"-xec\",\"metadata() { wget -O - -q http://169.254.169.254/2016-09-02/\\\"$1\\\"; }\\nasg() { aws --region=\\\"${REGION}\\\" autoscaling \\\"$@\\\"; }\\n\\n# Hyperkube binary is not statically linked, so we need to use\\n# the musl interpreter to be able to run it in this image\\n# See: https://github.com/kubernetes-incubator/kube-aws/pull/674#discussion_r118889687\\nkubectl() { /lib/ld-musl-x86_64.so.1 /opt/bin/hyperkube kubectl \\\"$@\\\"; }\\n\\nINSTANCE_ID=$(metadata meta-data/instance-id)\\nREGION=$(metadata dynamic/instance-identity/document | jq -r .region)\\n[ -n \\\"${REGION}\\\" ]\\n\\n# Not customizable, for now\\nPOLL_INTERVAL=10\\n\\n# Used to identify the source which requested the instance termination\\ntermination_source=''\\n\\n# Instance termination detection loop\\nwhile sleep ${POLL_INTERVAL}; do\\n\\n # Spot instance termination check\\n http_status=$(curl -o /dev/null -w '%{http_code}' -sL http://169.254.169.254/latest/meta-data/spot/termination-time)\\n if [ \\\"${http_status}\\\" -eq 200 ]; then\\n termination_source=spot\\n break\\n fi\\n\\n # Termination ConfigMap check\\n if [ -e /etc/kube-node-drainer/asg ] \\u0026\\u0026 grep -q \\\"${INSTANCE_ID}\\\" /etc/kube-node-drainer/asg; then\\n termination_source=asg\\n break\\n fi\\ndone\\n\\n# Node draining loop\\nwhile true; do\\n echo Node is terminating, draining it...\\n\\n if ! 
kubectl drain --ignore-daemonsets=true --delete-local-data=true --force=true --timeout=60s \\\"${NODE_NAME}\\\"; then\\n echo Not all pods on this host can be evicted, will try again\\n continue\\n fi\\n echo All evictable pods are gone\\n\\n if [ \\\"${termination_source}\\\" == asg ]; then\\n echo Notifying AutoScalingGroup that instance ${INSTANCE_ID} can be shutdown\\n ASG_NAME=$(asg describe-auto-scaling-instances --instance-ids \\\"${INSTANCE_ID}\\\" | jq -r '.AutoScalingInstances[].AutoScalingGroupName')\\n HOOK_NAME=$(asg describe-lifecycle-hooks --auto-scaling-group-name \\\"${ASG_NAME}\\\" | jq -r '.LifecycleHooks[].LifecycleHookName' | grep -i nodedrainer)\\n asg complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id \\\"${INSTANCE_ID}\\\" --lifecycle-hook-name \\\"${HOOK_NAME}\\\" --auto-scaling-group-name \\\"${ASG_NAME}\\\"\\n fi\\n\\n # Expect instance will be shut down in 5 minutes\\n sleep 300\\ndone\\n\"],\"env\":[{\"name\":\"NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"fieldPath\":\"spec.nodeName\"}}}],\"image\":\"quay.io/coreos/awscli:master\",\"name\":\"main\",\"volumeMounts\":[{\"mountPath\":\"/opt/bin\",\"name\":\"workdir\"},{\"mountPath\":\"/etc/kube-node-drainer\",\"name\":\"kube-node-drainer-status\",\"readOnly\":true}]}],\"initContainers\":[{\"command\":[\"/bin/cp\",\"-f\",\"/hyperkube\",\"/workdir/hyperkube\"],\"image\":\"gcr.io/google-containers/hyperkube-amd64:v1.8.4\",\"name\":\"hyperkube\",\"volumeMounts\":[{\"mountPath\":\"/workdir\",\"name\":\"workdir\"}]}],\"tolerations\":[{\"effect\":\"NoSchedule\",\"operator\":\"Exists\"},{\"effect\":\"NoExecute\",\"operator\":\"Exists\"},{\"key\":\"CriticalAddonsOnly\",\"operator\":\"Exists\"}],\"volumes\":[{\"emptyDir\":{},\"name\":\"workdir\"},{\"name\":\"kube-node-drainer-status\",\"projected\":{\"sources\":[{\"configMap\":{\"name\":\"kube-node-drainer-status\",\"optional\":true}}]}}]}},\"updateStrategy\":{\"type\":\"RollingUpdate\"}}}\n"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1beta1.DaemonSetSpec{Selector:(*v1.LabelSelector)(0xc424096f20), Template:v1.PodTemplateSpec{ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"k8s-app":"kube-node-drainer-ds"}, Annotations:map[string]string{"pod.beta.kubernetes.io/init-containers":"[{\"name\":\"hyperkube\",\"image\":\"quay.io/coreos/hyperkube:v1.7.2_coreos.0\",\"command\":[\"/bin/cp\",\"-f\",\"/hyperkube\",\"/workdir/hyperkube\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"workdir\",\"mountPath\":\"/workdir\"}],\"terminationMessagePath\":\"/dev/termination-log\",\"terminationMessagePolicy\":\"File\",\"imagePullPolicy\":\"IfNotPresent\"}]", "scheduler.alpha.kubernetes.io/critical-pod":"", "pod.alpha.kubernetes.io/init-containers":"[{\"name\":\"hyperkube\",\"image\":\"quay.io/coreos/hyperkube:v1.7.2_coreos.0\",\"command\":[\"/bin/cp\",\"-f\",\"/hyperkube\",\"/workdir/hyperkube\"],\"resources\":{},\"volumeMounts\":[{\"name\":\"workdir\",\"mountPath\":\"/workdir\"}],\"terminationMessagePath\":\"/dev/termination-log\",\"terminationMessagePolicy\":\"File\",\"imagePullPolicy\":\"IfNotPresent\"}]"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, 
Spec:v1.PodSpec{Volumes:[]v1.Volume{v1.Volume{Name:"workdir", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(nil), EmptyDir:(*v1.EmptyDirVolumeSource)(0xc424096f80), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}, v1.Volume{Name:"kube-node-drainer-status", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(nil), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(0xc424096fc0), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}}, InitContainers:[]v1.Container{v1.Container{Name:"hyperkube", Image:"gcr.io/google-containers/hyperkube-amd64:v1.8.4", Command:[]string{"/bin/cp", "-f", "/hyperkube", "/workdir/hyperkube"}, Args:[]string(nil), WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar(nil), Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"workdir", ReadOnly:false, MountPath:"/workdir", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*v1.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, Containers:[]v1.Container{v1.Container{Name:"main", Image:"quay.io/coreos/awscli:master", 
Command:[]string{"/bin/sh", "-xec", "metadata() { wget -O - -q http://169.254.169.254/2016-09-02/\"$1\"; }\nasg() { aws --region=\"${REGION}\" autoscaling \"$@\"; }\n\n# Hyperkube binary is not statically linked, so we need to use\n# the musl interpreter to be able to run it in this image\n# See: https://github.com/kubernetes-incubator/kube-aws/pull/674#discussion_r118889687\nkubectl() { /lib/ld-musl-x86_64.so.1 /opt/bin/hyperkube kubectl \"$@\"; }\n\nINSTANCE_ID=$(metadata meta-data/instance-id)\nREGION=$(metadata dynamic/instance-identity/document | jq -r .region)\n[ -n \"${REGION}\" ]\n\n# Not customizable, for now\nPOLL_INTERVAL=10\n\n# Used to identify the source which requested the instance termination\ntermination_source=''\n\n# Instance termination detection loop\nwhile sleep ${POLL_INTERVAL}; do\n\n # Spot instance termination check\n http_status=$(curl -o /dev/null -w '%{http_code}' -sL http://169.254.169.254/latest/meta-data/spot/termination-time)\n if [ \"${http_status}\" -eq 200 ]; then\n termination_source=spot\n break\n fi\n\n # Termination ConfigMap check\n if [ -e /etc/kube-node-drainer/asg ] && grep -q \"${INSTANCE_ID}\" /etc/kube-node-drainer/asg; then\n termination_source=asg\n break\n fi\ndone\n\n# Node draining loop\nwhile true; do\n echo Node is terminating, draining it...\n\n if ! kubectl drain --ignore-daemonsets=true --delete-local-data=true --force=true --timeout=60s \"${NODE_NAME}\"; then\n echo Not all pods on this host can be evicted, will try again\n continue\n fi\n echo All evictable pods are gone\n\n if [ \"${termination_source}\" == asg ]; then\n echo Notifying AutoScalingGroup that instance ${INSTANCE_ID} can be shutdown\n ASG_NAME=$(asg describe-auto-scaling-instances --instance-ids \"${INSTANCE_ID}\" | jq -r '.AutoScalingInstances[].AutoScalingGroupName')\n HOOK_NAME=$(asg describe-lifecycle-hooks --auto-scaling-group-name \"${ASG_NAME}\" | jq -r '.LifecycleHooks[].LifecycleHookName' | grep -i nodedrainer)\n asg complete-lifecycle-action --lifecycle-action-result CONTINUE --instance-id \"${INSTANCE_ID}\" --lifecycle-hook-name \"${HOOK_NAME}\" --auto-scaling-group-name \"${ASG_NAME}\"\n fi\n\n # Expect instance will be shut down in 5 minutes\n sleep 300\ndone\n"}, Args:[]string(nil), WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"NODE_NAME", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc424097060)}}, Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"workdir", ReadOnly:false, MountPath:"/opt/bin", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}, v1.VolumeMount{Name:"kube-node-drainer-status", ReadOnly:true, MountPath:"/etc/kube-node-drainer", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"IfNotPresent", SecurityContext:(*v1.SecurityContext)(nil), Stdin:false, StdinOnce:false, TTY:false}}, RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc423d876d8), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"", DeprecatedServiceAccount:"", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:false, HostPID:false, HostIPC:false, SecurityContext:(*v1.PodSecurityContext)(0xc423d8b680), 
ImagePullSecrets:[]v1.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]v1.Toleration{v1.Toleration{Key:"", Operator:"Exists", Value:"", Effect:"NoSchedule", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"", Operator:"Exists", Value:"", Effect:"NoExecute", TolerationSeconds:(*int64)(nil)}, v1.Toleration{Key:"CriticalAddonsOnly", Operator:"Exists", Value:"", Effect:"", TolerationSeconds:(*int64)(nil)}}, HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil)}}, UpdateStrategy:v1beta1.DaemonSetUpdateStrategy{Type:"RollingUpdate", RollingUpdate:(*v1beta1.RollingUpdateDaemonSet)(0xc423c439e0)}, MinReadySeconds:0, TemplateGeneration:4, RevisionHistoryLimit:(*int32)(0xc423d87768)}, Status:v1beta1.DaemonSetStatus{CurrentNumberScheduled:5, NumberMisscheduled:0, DesiredNumberScheduled:5, NumberReady:4, ObservedGeneration:4, UpdatedNumberScheduled:5, NumberAvailable:4, NumberUnavailable:1, CollisionCount:(*int32)(nil)}}: Operation cannot be fulfilled on daemonsets.extensions "kube-node-drainer-ds": the object has been modified; please apply your changes to the latest version and try again
I1201 12:16:05.708873 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-controller-manager-ip-10-35-185-135.ec2.internal succeeded
I1201 12:16:05.708890 1 gc_controller.go:62] PodGC is force deleting Pod: monitoring:node-exporter-nwfw5
I1201 12:16:05.723499 1 gc_controller.go:166] Forced deletion of orphaned Pod node-exporter-nwfw5 succeeded
I1201 12:16:05.723516 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:calico-node-zn7n7
I1201 12:16:05.753754 1 gc_controller.go:166] Forced deletion of orphaned Pod calico-node-zn7n7 succeeded
I1201 12:16:05.753770 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-proxy-sn8xn
I1201 12:16:05.778579 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-proxy-sn8xn succeeded
I1201 12:16:05.778595 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube2iam-4n694
I1201 12:16:05.811886 1 gc_controller.go:166] Forced deletion of orphaned Pod kube2iam-4n694 succeeded
I1201 12:16:05.811900 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-apiserver-ip-10-35-185-135.ec2.internal
I1201 12:16:05.833422 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-apiserver-ip-10-35-185-135.ec2.internal succeeded
I1201 12:16:05.833437 1 gc_controller.go:62] PodGC is force deleting Pod: kube-system:kube-scheduler-ip-10-35-185-135.ec2.internal
I1201 12:16:05.846832 1 gc_controller.go:166] Forced deletion of orphaned Pod kube-scheduler-ip-10-35-185-135.ec2.internal succeeded
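For what it's worth, the "Operation cannot be fulfilled on daemonsets.extensions "kube-node-drainer-ds": the object has been modified" message in the controller-manager log above is the apiserver's standard optimistic-concurrency conflict (HTTP 409); controllers normally re-read the latest resourceVersion and retry, so it is usually transient. A minimal check, reusing the DaemonSet name from the log above, to confirm it eventually converged:
$ kubectl --namespace kube-system get daemonset kube-node-drainer-ds -o wide    # desired/current/ready counts should match once the rollout settles
$ kubectl --namespace kube-system describe daemonset kube-node-drainer-ds       # events show whether the controller kept hitting conflicts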
@camilb Thanks. Probably my understanding was incorrect (not sure exactly which part, though).
Would you mind sharing the logs from the apiserver after the rolling update, too?
@mumoshu Here are the logs from the current API server on the first cluster, after multiple updates.
ip-10-35-185-49 core # docker logs -f b965566cc491
I1201 12:11:20.344595 1 server.go:114] Version: v1.8.4
I1201 12:11:20.344666 1 cloudprovider.go:59] --external-hostname was not specified. Trying to get it from the cloud provider.
I1201 12:11:20.344745 1 aws.go:847] Building AWS cloudprovider
I1201 12:11:20.344864 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1201 12:11:20.456700 1 tags.go:76] AWS cloud filtering on ClusterID: k8sqa
I1201 12:11:20.459214 1 aws.go:847] Building AWS cloudprovider
I1201 12:11:20.459244 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1201 12:11:20.534495 1 tags.go:76] AWS cloud filtering on ClusterID: k8sqa
I1201 12:11:21.000365 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.001173 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.002775 1 feature_gate.go:156] feature gates: map[Initializers:true]
I1201 12:11:21.002796 1 initialization.go:84] enabled Initializers feature as part of admission plugin setup
I1201 12:11:21.004556 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.007711 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.008433 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.008896 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.009527 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.010119 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.010768 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.011384 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.011971 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.012783 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.013618 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.014317 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.014951 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.015609 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.019445 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.022325 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.023774 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.026152 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.043505 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.044316 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.054074 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.055141 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.056089 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.057077 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.057979 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.058879 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.059789 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.060685 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.061570 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.062635 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.063498 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.064366 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.069080 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.069926 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.070759 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.071891 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.081377 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.104141 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.105170 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.106139 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.107102 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.109963 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.110840 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.111740 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.112784 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.113778 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.114672 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.115678 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.131652 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.132864 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.133965 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.135026 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:21.136081 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
W1201 12:11:21.201106 1 genericapiserver.go:311] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
[restful] 2017/12/01 12:11:21 log.go:33: [restful/swagger] listing is available at https://54.91.26.226/swaggerapi
[restful] 2017/12/01 12:11:21 log.go:33: [restful/swagger] https://54.91.26.226/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2017/12/01 12:11:21 log.go:33: [restful/swagger] listing is available at https://54.91.26.226/swaggerapi
[restful] 2017/12/01 12:11:21 log.go:33: [restful/swagger] https://54.91.26.226/swaggerui/ is mapped to folder /swagger-ui/
I1201 12:11:21.991889 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:11:24.122937 1 insecure_handler.go:118] Serving insecurely on 127.0.0.1:8080
I1201 12:11:24.124024 1 serve.go:85] Serving securely on 0.0.0.0:443
I1201 12:11:24.124159 1 controller.go:84] Starting OpenAPI AggregationController
I1201 12:11:24.136417 1 crd_finalizer.go:242] Starting CRDFinalizer
I1201 12:11:24.136656 1 crdregistration_controller.go:112] Starting crd-autoregister controller
I1201 12:11:24.136674 1 controller_utils.go:1041] Waiting for caches to sync for crd-autoregister controller
I1201 12:11:24.137013 1 apiservice_controller.go:112] Starting APIServiceRegistrationController
I1201 12:11:24.137078 1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I1201 12:11:24.137112 1 available_controller.go:192] Starting AvailableConditionController
I1201 12:11:24.137134 1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I1201 12:11:24.137736 1 customresource_discovery_controller.go:152] Starting DiscoveryController
I1201 12:11:24.137765 1 naming_controller.go:277] Starting NamingConditionController
W1201 12:11:24.198545 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30335019 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 12:11:24.236768 1 controller_utils.go:1048] Caches are synced for crd-autoregister controller
I1201 12:11:24.236878 1 autoregister_controller.go:136] Starting autoregister controller
I1201 12:11:24.236887 1 cache.go:32] Waiting for caches to sync for autoregister controller
I1201 12:11:24.237180 1 cache.go:39] Caches are synced for AvailableConditionController controller
I1201 12:11:24.237181 1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1201 12:11:24.336982 1 cache.go:39] Caches are synced for autoregister controller
I1201 12:11:25.917671 1 trace.go:76] Trace[932620041]: "Create /api/v1/nodes" (started: 2017-12-01 12:11:24.914559896 +0000 UTC) (total time: 1.003065327s):
Trace[932620041]: [1.000195458s] [1.000082807s] About to store object in database
I1201 12:11:27.474440 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:11:27.503562 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:11:27.503580 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 12:11:34.210298 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336082 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:11:44.217812 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336151 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:11:54.225296 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336187 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:12:04.232903 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336213 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:12:14.239912 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336288 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:12:24.246709 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336347 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 12:12:27.503731 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:12:27.506243 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:12:27.506255 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 12:12:34.253799 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336374 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:12:44.259927 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336395 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:12:54.266199 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336491 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:04.272642 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336532 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:14.279090 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336555 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:24.285425 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336574 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:34.291808 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336592 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:44.298004 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336611 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:13:54.304391 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336629 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:14:04.310691 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336647 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:14:14.317077 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336666 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:14:24.322855 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336684 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 12:14:27.506381 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:14:27.595273 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:14:27.595288 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 12:14:34.329056 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336702 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 12:14:44.335916 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 5fd72e3b-582d-11e7-80a5-0e64000698fe 30336721 0 2017-06-23 16:02:36 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.35.185.49 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 12:14:44.610047 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:14:44.644441 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:14:44.645952 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 12:15:03.698515 1 trace.go:76] Trace[1231635878]: "Get /api/v1/namespaces/kube-system/pods/kube-controller-manager-ip-10-35-185-49.ec2.internal/log" (started: 2017-12-01 12:14:36.430180928 +0000 UTC) (total time: 27.268312474s):
Trace[1231635878]: [27.268312474s] [27.266535432s] END
I1201 12:15:03.698900 1 trace.go:76] Trace[133454118]: "Get /api/v1/namespaces/kube-system/pods/kube-controller-manager-ip-10-35-185-49.ec2.internal/log" (started: 2017-12-01 12:13:40.181877929 +0000 UTC) (total time: 1m23.516987556s):
Trace[133454118]: [1m23.516987556s] [1m23.515058689s] END
I1201 12:16:26.586848 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:16:26.591137 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:16:26.591150 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:17:26.591266 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:17:26.595447 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:17:26.595460 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:19:26.595619 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:19:26.598114 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:19:26.598151 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:21:26.621145 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:21:27.140101 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:21:27.140117 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:22:27.140240 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:22:27.142520 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:22:27.142533 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:24:27.142666 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:24:27.146789 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:24:27.146801 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:26:26.644870 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:26:26.648838 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:26:26.648850 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:27:26.648962 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:27:26.792745 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:27:26.792760 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:29:26.792958 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:29:26.795116 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:29:26.795128 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:31:26.622558 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:31:27.034170 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:31:27.034189 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:32:27.034308 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:32:27.038150 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:32:27.038162 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:34:27.038270 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:34:27.042268 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:34:27.042281 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:36:26.608462 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:36:26.612466 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:36:26.612484 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:37:26.612601 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:37:26.616461 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:37:26.616473 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:39:26.616599 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:39:26.618555 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:39:26.618595 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:41:26.708243 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:41:27.154143 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:41:27.154159 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:42:27.154271 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:42:27.158300 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:42:27.158315 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:44:27.158415 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:44:27.162464 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:44:27.162477 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:46:26.600388 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:46:26.604685 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:46:26.604697 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:47:26.604792 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:47:26.609196 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:47:26.609208 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:49:26.609349 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:49:26.611400 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:49:26.611428 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:51:26.640456 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:51:27.028049 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:51:27.028064 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:52:27.028172 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:52:27.032331 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:52:27.032361 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:54:27.032468 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:54:27.036670 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:54:27.036684 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 12:56:26.648446 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 12:56:26.652438 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 12:56:26.652452 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
This seems to be related to https://github.com/kubernetes-incubator/metrics-server/issues/27.
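If it helps narrow this down, the aggregated API's availability can be checked directly from any machine with kubectl access; this is just a diagnostic sketch, not something from the logs above:
$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml          # status.conditions should say why Available is False (e.g. FailedDiscoveryCheck)
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1                 # goes through the aggregation layer; expect a 503 while the spec errors persist
$ kubectl --namespace kube-system get endpoints metrics-server   # empty endpoints mean the service has no ready backing pod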
@mumoshu Just finished the upgrade process for the second cluster using https://github.com/camilb/kube-aws/commit/a5a2f23bc1256bd3b67a65d895d8e9f4d0376617
Here are the logs for the API server:
ip-10-0-101-25 core # docker logs e47528fb89f3
I1201 14:39:34.102574 1 server.go:114] Version: v1.8.4
I1201 14:39:34.102632 1 cloudprovider.go:59] --external-hostname was not specified. Trying to get it from the cloud provider.
I1201 14:39:34.102722 1 aws.go:847] Building AWS cloudprovider
I1201 14:39:34.102778 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1201 14:39:34.267201 1 tags.go:76] AWS cloud filtering on ClusterID: k8sqa
I1201 14:39:34.269772 1 aws.go:847] Building AWS cloudprovider
I1201 14:39:34.269804 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1201 14:39:34.871975 1 tags.go:76] AWS cloud filtering on ClusterID: k8sqa
I1201 14:39:35.058627 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.059113 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.060563 1 feature_gate.go:156] feature gates: map[Initializers:true]
I1201 14:39:35.060579 1 initialization.go:84] enabled Initializers feature as part of admission plugin setup
I1201 14:39:35.062080 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.065194 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.065894 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.066393 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.067104 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.067978 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.068781 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.069496 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.070213 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.070905 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.071649 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.072425 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.073278 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.074191 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.075052 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.075797 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.076377 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.077068 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.099422 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.100423 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.101407 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.102533 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.103635 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.104422 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.105273 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.106100 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.106982 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.113273 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.114083 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.114994 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.115927 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.116804 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.117550 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.118197 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.118830 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.119688 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.120554 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.121321 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.122027 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.122719 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.123387 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.142141 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.144069 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.157552 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.158831 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.159765 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.160743 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.161739 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.162865 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.163800 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.164589 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.165330 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:35.166130 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
W1201 14:39:35.236161 1 genericapiserver.go:311] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
[restful] 2017/12/01 14:39:35 log.go:33: [restful/swagger] listing is available at https://10.0.101.25:443/swaggerapi
[restful] 2017/12/01 14:39:35 log.go:33: [restful/swagger] https://10.0.101.25:443/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2017/12/01 14:39:35 log.go:33: [restful/swagger] listing is available at https://10.0.101.25:443/swaggerapi
[restful] 2017/12/01 14:39:35 log.go:33: [restful/swagger] https://10.0.101.25:443/swaggerui/ is mapped to folder /swagger-ui/
I1201 14:39:35.978276 1 logs.go:41] warning: ignoring ServerName for user-provided CA for backwards compatibility is deprecated
I1201 14:39:38.069534 1 insecure_handler.go:118] Serving insecurely on 127.0.0.1:8080
I1201 14:39:38.070609 1 serve.go:85] Serving securely on 0.0.0.0:443
I1201 14:39:38.071281 1 apiservice_controller.go:112] Starting APIServiceRegistrationController
I1201 14:39:38.071306 1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I1201 14:39:38.071732 1 controller.go:84] Starting OpenAPI AggregationController
I1201 14:39:38.071768 1 crdregistration_controller.go:112] Starting crd-autoregister controller
I1201 14:39:38.071774 1 controller_utils.go:1041] Waiting for caches to sync for crd-autoregister controller
I1201 14:39:38.076559 1 available_controller.go:192] Starting AvailableConditionController
I1201 14:39:38.076580 1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I1201 14:39:38.076601 1 crd_finalizer.go:242] Starting CRDFinalizer
I1201 14:39:38.076885 1 customresource_discovery_controller.go:152] Starting DiscoveryController
I1201 14:39:38.076908 1 naming_controller.go:277] Starting NamingConditionController
W1201 14:39:38.147784 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 2264895 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:39:38.171405 1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I1201 14:39:38.171857 1 controller_utils.go:1048] Caches are synced for crd-autoregister controller
I1201 14:39:38.171931 1 autoregister_controller.go:136] Starting autoregister controller
I1201 14:39:38.171953 1 cache.go:32] Waiting for caches to sync for autoregister controller
I1201 14:39:38.176680 1 cache.go:39] Caches are synced for AvailableConditionController controller
I1201 14:39:38.272011 1 cache.go:39] Caches are synced for autoregister controller
I1201 14:39:39.129708 1 storage_rbac.go:196] updated clusterrole.rbac.authorization.k8s.io/system:controller:horizontal-pod-autoscaler with additional permissions: [PolicyRule{Resources:["*"], APIGroups:["custom.metrics.k8s.io"], Verbs:["get"]}]
W1201 14:39:48.161908 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587171 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:39:58.168921 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587373 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:08.201379 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587548 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:18.211142 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587700 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:28.219374 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587776 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:38.231227 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4587919 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:48.249633 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588187 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:40:58.256494 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588284 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:08.264085 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588448 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:18.280431 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588595 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:28.290915 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588749 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:38.298149 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588843 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:48.305869 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588876 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:41:58.312769 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588937 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:42:08.322282 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4588997 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:42:09.696300 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:42:09.861925 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:42:09.861941 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 14:42:18.331063 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589065 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:42:28.338398 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589109 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:42:38.345438 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589153 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:42:48.352798 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589172 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:42:58.360435 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589193 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:43:08.367886 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589208 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:43:09.863536 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W1201 14:43:18.375397 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589246 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:43:28.383149 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4589443 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:43:38.391435 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590082 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
E1201 14:43:39.864256 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.128:443: i/o timeout'
Trying to reach: 'https://10.3.0.128:443/swagger.json', Header: map[]
I1201 14:43:39.864272 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 14:43:48.398840 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590193 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:43:58.405906 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590236 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:44:08.412953 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590293 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:44:18.421103 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590308 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:44:28.428363 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590323 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:44:38.436354 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590370 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:44:39.864437 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
I1201 14:44:39.864455 1 controller.go:122] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Nothing (removed from the queue).
W1201 14:44:48.444271 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590396 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:44:58.451743 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590490 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:45:08.458835 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590550 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:45:18.465430 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590621 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:45:24.505253 1 trace.go:76] Trace[1189838613]: "Get /api/v1/namespaces/kube-system/pods/kube-apiserver-ip-10-0-101-25.eu-west-1.compute.internal/log" (started: 2017-12-01 14:42:04.20217864 +0000 UTC) (total time: 3m20.303045749s):
Trace[1189838613]: [3m20.303045749s] [3m20.300772957s] END
W1201 14:45:28.474113 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590645 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:45:38.483214 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4590815 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:45:48.490738 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591060 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:45:55.081058 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:45:55.082476 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error: 'dial tcp 10.3.0.128:443: getsockopt: connection refused'
Trying to reach: 'https://10.3.0.128:443/swagger.json', Header: map[]
I1201 14:45:55.082510 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 14:45:58.499241 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591145 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:46:08.506645 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591187 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:46:18.513334 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591237 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:46:28.520671 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591284 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:46:38.527825 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591334 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:46:48.535777 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591386 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:46:55.082627 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:46:55.280389 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:46:55.280403 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
W1201 14:46:58.542226 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591435 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:47:08.549560 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591481 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:47:18.556850 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591516 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
W1201 14:47:28.563601 1 controller.go:386] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default /api/v1/namespaces/default/endpoints/kubernetes 46cdb0f3-c071-11e7-9a1b-068a8cfdd8b0 4591558 0 2017-11-03 08:30:40 +0000 UTC <nil> <nil> map[] map[] [] nil [] } [{[{10.0.101.25 <nil> <nil>}] [] [{https 443 TCP}]}]}
I1201 14:48:55.280529 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:48:55.366033 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:48:55.366044 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 14:49:40.699439 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:49:41.167807 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:49:41.167820 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 14:50:41.167941 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:50:41.247744 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:50:41.247760 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 14:51:03.850511 1 trace.go:76] Trace[1676413587]: "Get /api/v1/namespaces/kube-system/pods/kube-controller-manager-ip-10-0-101-25.eu-west-1.compute.internal/log" (started: 2017-12-01 14:41:11.020992093 +0000 UTC) (total time: 9m52.829479624s):
Trace[1676413587]: [9m52.829479624s] [9m52.827235529s] END
E1201 14:51:32.306084 1 status.go:62] apiserver received an error that is not an metav1.Status: client disconnected
I1201 14:51:32.306478 1 trace.go:76] Trace[262423617]: "Update /apis/extensions/v1beta1/namespaces/wordpress/ingresses/web-ing/status" (started: 2017-12-01 14:51:15.287820819 +0000 UTC) (total time: 17.018637176s):
Trace[262423617]: [17.018637176s] [17.018637176s] END
I1201 14:52:41.247883 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:52:41.261292 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:52:41.261305 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 14:54:40.626200 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:54:40.629057 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:54:40.629085 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
I1201 14:55:40.629194 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
E1201 14:55:40.645584 1 controller.go:111] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: OpenAPI spec does not exists
I1201 14:55:40.645597 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
@camilb Thanks again for your effort! I'll try it myself, but anyway, do you have any idea what the issue was and how your change fixed it?
@mumoshu
What I think is happening is that, before this "fix", the v1beta1.metrics.k8s.io resource was created right after the API server was up, before the controller-manager.
The systemd service runs after install-kube-system.service (After=install-kube-system.service), so the controller-manager starts before the v1beta1.metrics.k8s.io resource is created.
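To double-check that ordering on a node, something like the following should print the unit's After= list (a sketch for inspection, not a command from the original comment):
systemctl show -p After install-metrics-server-apiservice.service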
Check the timestamps:
ip-10-0-101-25 core # docker inspect 01d08da9e43d
[
{
"Id": "01d08da9e43de848ee53beac8961e032b9a03267bed19e3b20a01560408e2e2e",
"Created": "2017-12-01T14:39:33.07577671Z",
"Path": "/hyperkube",
"Args": [
"controller-manager",
systemctl status -l install-metrics-server-apiservice.service
Active: active (exited) since Fri 2017-12-01 14:40:18 UTC; 1h 0min ago
$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io","namespace":""},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: 2017-12-01T14:40:18Z
Also tested rebooting the master several times and everything looks fine.
Conformance e2e tests from the upgraded cluster:
https://scanner.heptio.com/1dd7ff34ccda9b9ad6a45a112542d240/diagnostics/
Update
Endpoint creation timestamp:
$ kubectl get endpoints metrics-server -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
creationTimestamp: 2017-12-01T14:39:59Z
Also, I think that when the controller-manager fails, the endpoint for metrics-server is not created, and it will not recover.
Maybe we can check whether the endpoint exists, instead of the service, in /opt/bin/install-metrics-server-apiservice:
while ! kubectl get endpoints metrics-server -n kube-system; do
  echo Waiting until metrics-server endpoint is created.
  sleep 3
done
@camilb Thanks again!
controller-manager starts before the v1beta1.metrics.k8s.io resource is created.
This is probably the last point I don't fully understand.
Why does this fix it after a rolling-update of the controller node?
After the roll, metrics.k8s.io is already installed into the apiserver when the apiserver/controller-manager starts (because it would have persisted in etcd)...
I'm OK as long as it works, but just curious!
Hi @mumoshu this is what I understand up to this moment:
Without the fix:
With the fix:
Note: the last upgrade was performed on a cluster that didn't have the v1beta1.metrics.k8s.io apiservice.
Also, maybe this helps:
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml can report 3 different states:
status:
  conditions:
  - lastTransitionTime: 2017-12-01T17:57:17Z
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available

status:
  conditions:
  - lastTransitionTime: 2017-12-01T18:01:24Z
    message: endpoints for service/metrics-server in "kube-system" have no addresses
    reason: MissingEndpoints
    status: "False"
    type: Available

status:
  conditions:
  - lastTransitionTime: 2017-12-01T18:01:24Z
    message: service/metrics-server in "kube-system" is not present
    reason: ServiceNotFound
    status: "False"
    type: Available
In each of these cases, the requests will not fail. The API server will not try to reach the metrics server if the check fails.
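For reference, a quick way to print just the Available condition of the apiservice (a sketch; the jsonpath expression is mine, not from the thread):
kubectl get apiservice v1beta1.metrics.k8s.io -o jsonpath='{.status.conditions[?(@.type=="Available")].reason}{"\n"}'
This should print Passed, MissingEndpoints, or ServiceNotFound, depending on which of the three states above the apiservice is in.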
@camilb Thanks for the info! I've tried w/ your modification in my brand-new cluster.
However, no luck so far.
Steps to reproduce:
- kubectl get no from my workstation takes 3 minutes or so
- kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io (also takes 3 minutes or so without --request-timeout) results in the apiserver becoming responsive again

So, your fix https://github.com/camilb/kube-aws/commit/a5a2f23bc1256bd3b67a65d895d8e9f4d0376617 doesn't work (by its nature) "after" a rolling-update of a controller node, right? Or am I still missing something?
With the fix:
- The API starts
- controller manager starts and creates the endpoint
- apiservice v1beta1.metrics.k8s.io is created
- the API server reaches the endpoint without the multiple timeouts
// No offence but really just trying to understand what's going on!
Update: My current work-around of deleting the apiservice at the beginning of install-kube-system isn't ideal. kubectl get/delete on the apiservice takes 3 min each, so the work-around adds 6 minutes in total to the controller boot time 😢
However, we can use --request-timeout as @camilb suggested to reduce the additional boot time to a minimum.
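For example, the same delete with the timeout applied would look roughly like this (a sketch; the exact timeout value is illustrative):
kubectl --request-timeout=1s --namespace kube-system delete apiservice v1beta1.metrics.k8s.io
With the flag set, each kubectl request is bounded to roughly the timeout instead of hanging for ~3 minutes.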
As I guessed last week, this seems to affect only clusters with a single controller node.
When we have 2 controller nodes in a kube-aws cluster, the issue disappears:
Trial 1:
+00:00:29 Controlplane UPDATE_IN_PROGRESS Controllers "Rolling update initiated. Terminating 2 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT10M when new instances are added to the autoscaling group."
+00:00:29 Controlplane UPDATE_IN_PROGRESS Controllers "Terminating instance(s) [i-0796ec6e6c776a6e8]; replacing with 1 new instance(s)."
+00:01:47 Controlplane UPDATE_IN_PROGRESS Controllers "Successfully terminated instance(s) [i-0796ec6e6c776a6e8] (Progress 50%)."
+00:01:48 Controlplane UPDATE_IN_PROGRESS Controllers "New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT10M."
+00:04:33 Controlplane UPDATE_IN_PROGRESS Controllers "Received SUCCESS signal with UniqueId i-01f53298f68c3ee94"
+00:04:34 Controlplane UPDATE_IN_PROGRESS Controllers "Terminating instance(s) [i-0d66a08184aadc58f]; replacing with 1 new instance(s)."
+00:05:28 Controlplane UPDATE_IN_PROGRESS Controllers "Successfully terminated instance(s) [i-0d66a08184aadc58f] (Progress 100%)."
+00:05:28 Controlplane UPDATE_IN_PROGRESS Controllers "New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT10M."
+00:08:33 Controlplane UPDATE_IN_PROGRESS Controllers "Received SUCCESS signal with UniqueId i-04b68905f74853711"
+00:08:36 Controlplane UPDATE_COMPLETE Controllers
Trial 2:
+00:00:26 Controlplane UPDATE_IN_PROGRESS Controllers "Rolling update initiated. Terminating 2 obsolete instance(s) in batches of 1, while keeping at least 1 instance(s) in service. Waiting on resource signals with a timeout of PT10M when new instances are added to the autoscaling group."
+00:00:27 Controlplane UPDATE_IN_PROGRESS Controllers "Terminating instance(s) [i-01f53298f68c3ee94]; replacing with 1 new instance(s)."
+00:01:21 Controlplane UPDATE_IN_PROGRESS Controllers "Successfully terminated instance(s) [i-01f53298f68c3ee94] (Progress 50%)."
+00:01:21 Controlplane UPDATE_IN_PROGRESS Controllers "New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT10M."
+00:03:46 Controlplane UPDATE_IN_PROGRESS Controllers "Received SUCCESS signal with UniqueId i-0e2e82a25cadc1318"
+00:03:47 Controlplane UPDATE_IN_PROGRESS Controllers "Terminating instance(s) [i-04b68905f74853711]; replacing with 1 new instance(s)."
+00:04:40 Controlplane UPDATE_IN_PROGRESS Controllers "Successfully terminated instance(s) [i-04b68905f74853711] (Progress 100%)."
+00:04:41 Controlplane UPDATE_IN_PROGRESS Controllers "New instance(s) added to autoscaling group - Waiting on 1 resource signal(s) with a timeout of PT10M."
+00:07:05 Controlplane UPDATE_IN_PROGRESS Controllers "Received SUCCESS signal with UniqueId i-07f01c69acbeb2aca"
+00:07:08 Controlplane UPDATE_COMPLETE Controllers
One more sub-optimal work-around: maintain at least 1 controller node during a rolling-update:
controller:
  # instead of:
  # count: 1
  autoScalingGroup:
    minSize: 1
    # Note that cfn rejects any update when this was `maxSize: 1` because then it can't maintain 1 controller node while a rolling-update is in progress
    maxSize: 2
    rollingUpdateMinInstancesInService: 1
This is sub-optimal because it would work for rolling-updates but not for sudden node failures.
Just finished an experiment with/without --request-timeout=1s.
Without --request-timeout=1s:
core@ip-10-0-0-252 ~ $ journalctl -u install-kube-system
-- Logs begin at Mon 2017-12-04 08:49:38 UTC, end at Mon 2017-12-04 09:26:18 UTC. --
Dec 04 08:50:01 ip-10-0-0-252.ap-northeast-1.compute.internal systemd[1]: Starting install-kube-system.service...
Dec 04 08:50:01 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1111]: activating
Dec 04 08:50:01 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1111]: waiting until kubelet starts
Dec 04 08:50:11 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1111]: activating
Dec 04 08:50:11 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1111]: waiting until kubelet starts
Dec 04 08:50:21 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1111]: active
Dec 04 08:50:21 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1530]: active
Dec 04 08:50:22 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:50:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:50:42 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:50:52 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:51:02 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:51:12 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:51:22 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: waiting until apiserver starts
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: {
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "major": "1",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "minor": "8+",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "gitVersion": "v1.8.4+coreos.0",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "gitCommit": "4292f9682595afddbb4f8b1483673449c74f9619",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "gitTreeState": "clean",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "buildDate": "2017-11-21T17:22:25Z",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "goVersion": "go1.8.3",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "compiler": "gc",
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: "platform": "linux/amd64"
Dec 04 08:51:32 ip-10-0-0-252.ap-northeast-1.compute.internal bash[1535]: }
Dec 04 08:55:02 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: NAME STATUS AGE
Dec 04 08:55:02 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: kube-system Active 4h
Dec 04 08:57:33 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: secret "kubernetes-dashboard-certs" unchanged
Dec 04 09:00:03 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: configmap "kube-dns" unchanged
Dec 04 09:02:34 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: configmap "kube-proxy-config" unchanged
Dec 04 09:05:04 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: serviceaccount "kube-dns" unchanged
Dec 04 09:07:35 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: serviceaccount "heapster" unchanged
Dec 04 09:10:05 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: serviceaccount "kube-proxy" unchanged
Dec 04 09:12:36 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: serviceaccount "kubernetes-dashboard" unchanged
Dec 04 09:15:06 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: serviceaccount "metrics-server" configured
Dec 04 09:17:37 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: deployment "tiller-deploy" configured
Dec 04 09:18:07 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: service "tiller-deploy" configured
Dec 04 09:20:38 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: deployment "kube-dns" configured
Dec 04 09:23:08 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: deployment "kube-dns-autoscaler" unchanged
Dec 04 09:25:39 ip-10-0-0-252.ap-northeast-1.compute.internal retry[1920]: deployment "kubernetes-dashboard" unchanged
Beware the timestamps! Every kubectl command is taking 2~3 minutes each.
With --request-timeout=1s:
core@ip-10-0-0-17 ~ $ journalctl -u install-kube-system
-- Logs begin at Mon 2017-12-04 09:31:27 UTC, end at Mon 2017-12-04 09:39:02 UTC. --
Dec 04 09:31:53 ip-10-0-0-17.ap-northeast-1.compute.internal systemd[1]: Starting install-kube-system.service...
Dec 04 09:31:53 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1125]: activating
Dec 04 09:31:53 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1125]: waiting until kubelet starts
Dec 04 09:32:03 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1125]: activating
Dec 04 09:32:03 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1125]: waiting until kubelet starts
Dec 04 09:32:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1125]: active
Dec 04 09:32:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1522]: active
Dec 04 09:32:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:32:23 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:32:33 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:32:43 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:32:53 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:33:03 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: waiting until apiserver starts
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: {
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "major": "1",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "minor": "8+",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "gitVersion": "v1.8.4+coreos.0",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "gitCommit": "4292f9682595afddbb4f8b1483673449c74f9619",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "gitTreeState": "clean",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "buildDate": "2017-11-21T17:22:25Z",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "goVersion": "go1.8.3",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "compiler": "gc",
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: "platform": "linux/amd64"
Dec 04 09:33:13 ip-10-0-0-17.ap-northeast-1.compute.internal bash[1526]: }
Dec 04 09:33:21 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: NAME STATUS AGE
Dec 04 09:33:21 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: kube-system Active 5h
Dec 04 09:33:26 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: secret "kubernetes-dashboard-certs" unchanged
Dec 04 09:33:32 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: configmap "kube-dns" unchanged
Dec 04 09:33:37 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: configmap "kube-proxy-config" unchanged
Dec 04 09:33:43 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: serviceaccount "kube-dns" unchanged
Dec 04 09:33:49 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: serviceaccount "heapster" unchanged
Dec 04 09:33:54 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: serviceaccount "kube-proxy" unchanged
Dec 04 09:34:00 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: serviceaccount "kubernetes-dashboard" unchanged
Dec 04 09:34:05 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: serviceaccount "metrics-server" configured
Dec 04 09:34:11 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "tiller-deploy" configured
Dec 04 09:34:12 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: service "tiller-deploy" configured
Dec 04 09:34:17 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "kube-dns" configured
Dec 04 09:34:23 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "kube-dns-autoscaler" unchanged
Dec 04 09:34:28 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "kubernetes-dashboard" unchanged
Dec 04 09:34:34 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "cluster-autoscaler" unchanged
Dec 04 09:34:39 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "heapster" configured
Dec 04 09:34:45 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "metrics-server" unchanged
Dec 04 09:34:51 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: daemonset "kube-proxy" unchanged
Dec 04 09:34:56 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: service "kube-dns" unchanged
Dec 04 09:35:02 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: service "heapster" unchanged
Dec 04 09:35:07 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: service "kubernetes-dashboard" unchanged
Dec 04 09:35:13 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: service "metrics-server" unchanged
Dec 04 09:35:18 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: deployment "kube-rescheduler" unchanged
Dec 04 09:35:24 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrole "kube-aws:node-extensions" configured
Dec 04 09:35:29 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrole "system:metrics-server" configured
Dec 04 09:35:35 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:admin" configured
Dec 04 09:35:40 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:system-worker" configured
Dec 04 09:35:46 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:node" configured
Dec 04 09:35:52 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:node-proxier" configured
Dec 04 09:35:57 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:node-extensions" configured
Dec 04 09:36:03 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "heapster" configured
Dec 04 09:36:08 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "metrics-server:system:auth-delegator" configured
Dec 04 09:36:09 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "system:metrics-server" configured
Dec 04 09:36:15 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kubernetes-dashboard" configured
Dec 04 09:36:20 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: role "system:pod-nanny" unchanged
Dec 04 09:36:26 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: role "kubernetes-dashboard-minimal" unchanged
Dec 04 09:36:31 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: rolebinding "heapster-nanny" unchanged
Dec 04 09:36:37 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: rolebinding "kubernetes-dashboard-minimal" unchanged
Dec 04 09:36:42 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: rolebinding "metrics-server-auth-reader" unchanged
Dec 04 09:36:48 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrole "kube-aws:node-bootstrapper" configured
Dec 04 09:36:54 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrole "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 09:36:59 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:node-bootstrapper" configured
Dec 04 09:37:05 ip-10-0-0-17.ap-northeast-1.compute.internal retry[1900]: clusterrolebinding "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 09:37:05 ip-10-0-0-17.ap-northeast-1.compute.internal systemd[1]: Started install-kube-system.service.
Now it takes 4~5 seconds each and 4 minutes in total, which is still a considerable delay compared to the normal case.
I guess we have several choices on top of this now:
- --request-timeout=0.5s
- kubectl apply -f w/ the request timeout

Hi @mumoshu, are you considering removing the metrics-server for now? I still don't see any component benefiting from it. kubectl top still uses heapster. Also checked in GKE, they still use heapster for k8s v1.8.4.
@camilb Thanks for the suggestion. Yes - that's also OK for me. I had verified that kubectl top worked w/o metrics-server, too.
However, if we expect to need it sooner or later, I'd rather hide it behind a flag in cluster.yaml like metricsServer.enabled, which defaults to false, instead of completely deleting it, so that we can provide a smoother path for everyone to start experimenting with metrics-server.
WDYT?
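If we go this way, the gating could mirror the existing template guards in install-kube-system (a sketch only; .MetricsServer.Enabled is a hypothetical template field for the proposed metricsServer.enabled key, and the manifest file name is assumed):
{{ if .MetricsServer.Enabled }}
kubectl apply -f /srv/kubernetes/manifests/metrics-server.yaml
{{ end }}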
Updated my comment - Please reload if you've read it in an email notification.
@mumoshu yes, I agree with using the flag.
@camilb Thanks for the confirmation. I'll make it by v0.9.9!
--request-timeout=1s + kubectl apply -f batching per resource kind:
core@ip-10-0-0-155 ~ $ journalctl -u install-kube-system -f
-- Logs begin at Mon 2017-12-04 10:55:36 UTC. --
Dec 04 10:56:10 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1123]: activating
Dec 04 10:56:10 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1123]: waiting until kubelet starts
Dec 04 10:56:20 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1123]: activating
Dec 04 10:56:20 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1123]: waiting until kubelet starts
Dec 04 10:56:30 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1123]: active
Dec 04 10:56:30 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1706]: active
Dec 04 10:56:30 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:56:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:56:50 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:57:00 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:57:10 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:57:20 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:57:30 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: waiting until apiserver starts
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: {
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "major": "1",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "minor": "8+",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "gitVersion": "v1.8.4+coreos.0",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "gitCommit": "4292f9682595afddbb4f8b1483673449c74f9619",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "gitTreeState": "clean",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "buildDate": "2017-11-21T17:22:25Z",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "goVersion": "go1.8.3",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "compiler": "gc",
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: "platform": "linux/amd64"
Dec 04 10:57:40 ip-10-0-0-155.ap-northeast-1.compute.internal bash[1711]: }
Dec 04 10:57:47 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: NAME STATUS AGE
Dec 04 10:57:47 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: kube-system Active 16m
Dec 04 10:57:53 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: configmap "calico-config" unchanged
Dec 04 10:57:54 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: secret "calico-etcd-secrets" unchanged
Dec 04 10:57:55 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: daemonset "calico-node" configured
Dec 04 10:57:56 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "calico-kube-controllers" unchanged
Dec 04 10:58:00 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
Dec 04 10:58:02 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: secret "kubernetes-dashboard-certs" configured
Dec 04 10:58:09 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: configmap "kube-dns" unchanged
Dec 04 10:58:10 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: configmap "kube-proxy-config" unchanged
Dec 04 10:58:15 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "kube-dns" unchanged
Dec 04 10:58:16 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "heapster" unchanged
Dec 04 10:58:18 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "kube-proxy" unchanged
Dec 04 10:58:19 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "kubernetes-dashboard" unchanged
Dec 04 10:58:20 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "metrics-server" configured
Dec 04 10:58:25 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "tiller-deploy" configured
Dec 04 10:58:26 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: service "tiller-deploy" configured
Dec 04 10:58:32 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: daemonset "dnsmasq-node" unchanged
Dec 04 10:58:37 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "kube-dns" configured
Dec 04 10:58:38 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "kube-dns-autoscaler" unchanged
Dec 04 10:58:39 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "kubernetes-dashboard" unchanged
Dec 04 10:58:40 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "cluster-autoscaler" unchanged
Dec 04 10:58:41 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "heapster" configured
Dec 04 10:58:42 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "metrics-server" unchanged
Dec 04 10:58:48 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: daemonset "kube-proxy" unchanged
Dec 04 10:58:53 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: service "kube-dns" unchanged
Dec 04 10:58:54 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: service "heapster" unchanged
Dec 04 10:58:55 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: service "kubernetes-dashboard" unchanged
Dec 04 10:58:56 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: service "metrics-server" unchanged
Dec 04 10:59:02 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: deployment "kube-rescheduler" unchanged
Dec 04 10:59:07 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrole "kube-aws:node-extensions" configured
Dec 04 10:59:08 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrole "system:metrics-server" configured
Dec 04 10:59:14 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:admin" configured
Dec 04 10:59:15 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:system-worker" configured
Dec 04 10:59:16 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:node" configured
Dec 04 10:59:17 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:node-proxier" configured
Dec 04 10:59:18 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:node-extensions" configured
Dec 04 10:59:19 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "heapster" configured
Dec 04 10:59:20 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "metrics-server:system:auth-delegator" configured
Dec 04 10:59:21 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "system:metrics-server" configured
Dec 04 10:59:26 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kubernetes-dashboard" configured
Dec 04 10:59:32 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: role "system:pod-nanny" unchanged
Dec 04 10:59:33 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: role "kubernetes-dashboard-minimal" unchanged
Dec 04 10:59:39 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: rolebinding "heapster-nanny" unchanged
Dec 04 10:59:40 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: rolebinding "kubernetes-dashboard-minimal" unchanged
Dec 04 10:59:41 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: rolebinding "metrics-server-auth-reader" unchanged
Dec 04 10:59:46 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrole "kube-aws:node-bootstrapper" configured
Dec 04 10:59:47 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrole "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 10:59:53 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:node-bootstrapper" configured
Dec 04 10:59:54 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube-aws:kubelet-certificate-bootstrap" configured
Dec 04 10:59:59 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: serviceaccount "kube2iam" unchanged
Dec 04 11:00:00 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrole "kube2iam" configured
Dec 04 11:00:01 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: clusterrolebinding "kube2iam" configured
Dec 04 11:00:07 ip-10-0-0-155.ap-northeast-1.compute.internal retry[2100]: daemonset "kube2iam" unchanged
Dec 04 11:00:07 ip-10-0-0-155.ap-northeast-1.compute.internal systemd[1]: Started install-kube-system.service.
Notice that delays are inserted only between resource kinds, and the total time spent in install-kube-system was 2~3 minutes.
Probably we no longer need to rely on the work-around of deleting the apiservice.
install-kube-system now looks like:
kubectl() {
  # --request-timeout=1s is intended to instruct kubectl to give up discovering unresponsive apiservice(s) within a certain period,
  # so that temporary freakiness/unresponsiveness of a specific apiservice until apiserver/controller-manager fully start doesn't
  # affect the whole controller bootstrap process.
  /usr/bin/docker run --rm --net=host -v /srv/kubernetes:/srv/kubernetes {{.HyperkubeImage.RepoWithTag}} /hyperkube kubectl --request-timeout=1s "$@"
}
# Try to batch as many files as possible to reduce the total amount of delay due to wilderness in the API aggregation
# See https://github.com/kubernetes-incubator/kube-aws/issues/1039
applyall() {
  kubectl apply -f $(echo "$@" | tr ' ' ',')
}
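# For example, `applyall a.yaml b.yaml` runs `kubectl apply -f a.yaml,b.yaml`
# (kubectl accepts a comma-separated list for -f), so the slow discovery of the
# unresponsive aggregated API is paid once per batch rather than once per manifest.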
while ! kubectl get ns kube-system; do
  echo Waiting until kube-system created.
  sleep 3
done
mfdir=/srv/kubernetes/manifests
{{ if .UseCalico }}
/bin/bash /opt/bin/populate-tls-calico-etcd
applyall "${mfdir}/calico.yaml"
{{ end }}
{{ if .Experimental.NodeDrainer.Enabled }}
applyall "${mfdir}/{kube-node-drainer-ds,kube-node-drainer-asg-status-updater-de}".yaml"
{{ end }}
#*snip*
In usual cases, install-kube-system finishes at most 1 minute after the apiserver becomes ready. This is far shorter than the 2~3 minutes I've achieved with --request-timeout=1s. Probably I shouldn't aggressively lower the timeout, for reliability, though?
core@ip-10-0-0-41 ~ $ journalctl -u install-kube-system | cat
-- Logs begin at Mon 2017-12-04 12:30:13 UTC, end at Mon 2017-12-04 12:42:34 UTC. --
Dec 04 12:30:43 ip-10-0-0-41.ap-northeast-1.compute.internal systemd[1]: Starting install-kube-system.service...
Dec 04 12:30:43 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: activating
Dec 04 12:30:43 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: waiting until kubelet starts
Dec 04 12:30:53 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: activating
Dec 04 12:30:53 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: waiting until kubelet starts
Dec 04 12:31:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: activating
Dec 04 12:31:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: waiting until kubelet starts
Dec 04 12:31:13 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: activating
Dec 04 12:31:13 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: waiting until kubelet starts
Dec 04 12:31:23 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1087]: active
Dec 04 12:31:23 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1656]: active
Dec 04 12:31:23 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:31:33 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:31:43 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:31:53 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:13 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:23 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:33 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:43 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:32:53 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: waiting until apiserver starts
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: {
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "major": "1",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "minor": "8+",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "gitVersion": "v1.8.4+coreos.0",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "gitCommit": "4292f9682595afddbb4f8b1483673449c74f9619",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "gitTreeState": "clean",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "buildDate": "2017-11-21T17:22:25Z",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "goVersion": "go1.8.3",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "compiler": "gc",
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: "platform": "linux/amd64"
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal bash[1660]: }
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: NAME STATUS AGE
Dec 04 12:33:03 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: kube-system Active 8s
Dec 04 12:33:04 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: configmap "calico-config" created
Dec 04 12:33:04 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: secret "calico-etcd-secrets" created
Dec 04 12:33:04 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: daemonset "calico-node" created
Dec 04 12:33:04 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "calico-kube-controllers" created
Dec 04 12:33:05 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: secret "kubernetes-dashboard-certs" created
Dec 04 12:33:05 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: configmap "kube-dns" created
Dec 04 12:33:05 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: configmap "kube-proxy-config" created
Dec 04 12:33:06 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "kube-dns" created
Dec 04 12:33:06 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "heapster" created
Dec 04 12:33:06 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "kube-proxy" created
Dec 04 12:33:06 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "kubernetes-dashboard" created
Dec 04 12:33:06 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "metrics-server" created
Dec 04 12:33:07 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "tiller-deploy" created
Dec 04 12:33:07 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: service "tiller-deploy" created
Dec 04 12:33:07 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: daemonset "dnsmasq-node" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "kube-dns" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "kube-dns-autoscaler" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "kubernetes-dashboard" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "cluster-autoscaler" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "heapster" created
Dec 04 12:33:08 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "metrics-server" created
Dec 04 12:33:09 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: daemonset "kube-proxy" created
Dec 04 12:33:10 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: service "kube-dns" created
Dec 04 12:33:10 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: service "heapster" created
Dec 04 12:33:10 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: service "kubernetes-dashboard" created
Dec 04 12:33:10 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: service "metrics-server" created
Dec 04 12:33:11 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: deployment "kube-rescheduler" created
Dec 04 12:33:11 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrole "kube-aws:node-extensions" created
Dec 04 12:33:11 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrole "system:metrics-server" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:admin" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:system-worker" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:node" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:node-proxier" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:node-extensions" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "heapster" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "metrics-server:system:auth-delegator" created
Dec 04 12:33:12 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "system:metrics-server" created
Dec 04 12:33:13 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kubernetes-dashboard" created
Dec 04 12:33:13 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: role "system:pod-nanny" created
Dec 04 12:33:13 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: role "kubernetes-dashboard-minimal" created
Dec 04 12:33:14 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: rolebinding "heapster-nanny" created
Dec 04 12:33:14 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: rolebinding "kubernetes-dashboard-minimal" created
Dec 04 12:33:14 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: rolebinding "metrics-server-auth-reader" created
Dec 04 12:33:15 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrole "kube-aws:node-bootstrapper" created
Dec 04 12:33:15 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrole "kube-aws:kubelet-certificate-bootstrap" created
Dec 04 12:33:15 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:node-bootstrapper" created
Dec 04 12:33:15 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube-aws:kubelet-certificate-bootstrap" created
Dec 04 12:33:16 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: serviceaccount "kube2iam" created
Dec 04 12:33:16 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrole "kube2iam" created
Dec 04 12:33:16 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: clusterrolebinding "kube2iam" created
Dec 04 12:33:17 ip-10-0-0-41.ap-northeast-1.compute.internal retry[2013]: daemonset "kube2iam" created
Dec 04 12:33:17 ip-10-0-0-41.ap-northeast-1.compute.internal systemd[1]: Started install-kube-system.service.
Also, we could translate controller.count to a better setup.
Currently:
controller:
  count: 1
is translated to:
controller:
  autoScalingGroup:
    minSize: 1
    maxSize: 1
    rollingUpdateMinInstancesInService: 0
Ideally we should translate it to:
controller:
  autoScalingGroup:
    minSize: 1
    maxSize: 2
    rollingUpdateMinInstancesInService: 1
This way, we won't need to trigger this edge-case, at least for a rolling-update of a single controller node.
After the kubectl batching, I'm experiencing another issue - the kube-proxy daemonset never creates a kube-proxy pod on the controller node after I've terminated the node to put my cluster into a situation with 0 controller nodes. Of course, it results in a cfn resource-signal timeout.
Also, kubectl get no still shows the already-terminated controller node, while kubectl get po still shows the controller-manager and apiserver pods that had been running on the already-terminated node.
ip-10-0-0-41 is the already-terminated node:
core@ip-10-0-0-107 ~ $ k get po,no -o wide
NAME READY STATUS RESTARTS AGE IP NODE
po/calico-kube-controllers-7fb5479cf4-qszkp 1/1 Running 0 38m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/calico-node-8bjhn 1/1 Running 0 31m 10.0.0.97 ip-10-0-0-97.ap-northeast-1.compute.internal
po/calico-node-bkv7p 1/1 Running 0 31m 10.0.0.36 ip-10-0-0-36.ap-northeast-1.compute.internal
po/calico-node-hqjjn 1/1 Running 0 38m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/cluster-autoscaler-759f549dd5-2pts6 1/1 Running 6 38m 10.2.89.4 ip-10-0-0-41.ap-northeast-1.compute.internal
po/dnsmasq-node-jjqsf 2/2 Running 0 31m 10.0.0.97 ip-10-0-0-97.ap-northeast-1.compute.internal
po/dnsmasq-node-lfltv 2/2 Running 0 38m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/dnsmasq-node-mzkr2 2/2 Running 0 31m 10.0.0.36 ip-10-0-0-36.ap-northeast-1.compute.internal
po/heapster-5f9c4878c5-vhh9k 2/2 Running 0 30m 10.2.90.4 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube-apiserver-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 0 25m 10.0.0.107 ip-10-0-0-107.ap-northeast-1.compute.internal
po/kube-apiserver-ip-10-0-0-196.ap-northeast-1.compute.internal 1/1 Running 0 13m 10.0.0.196 ip-10-0-0-196.ap-northeast-1.compute.internal
po/kube-apiserver-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 37m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/kube-controller-manager-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 5 25m 10.0.0.107 ip-10-0-0-107.ap-northeast-1.compute.internal
po/kube-controller-manager-ip-10-0-0-196.ap-northeast-1.compute.internal 1/1 Running 3 13m 10.0.0.196 ip-10-0-0-196.ap-northeast-1.compute.internal
po/kube-controller-manager-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 37m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/kube-dns-7d654c9888-c29s8 3/3 Running 0 38m 10.2.90.2 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube-dns-7d654c9888-xj49n 3/3 Running 0 30m 10.2.82.4 ip-10-0-0-36.ap-northeast-1.compute.internal
po/kube-dns-autoscaler-665fb57848-qfrnt 1/1 Running 0 38m 10.2.90.3 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube-proxy-9f89g 1/1 Running 0 31m 10.0.0.97 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube-proxy-9nlw9 1/1 Running 0 38m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/kube-proxy-gbx96 1/1 Running 0 31m 10.0.0.36 ip-10-0-0-36.ap-northeast-1.compute.internal
po/kube-rescheduler-f55879654-v9x8l 1/1 Running 0 38m 10.0.0.97 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube-scheduler-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 0 25m 10.0.0.107 ip-10-0-0-107.ap-northeast-1.compute.internal
po/kube-scheduler-ip-10-0-0-196.ap-northeast-1.compute.internal 1/1 Running 0 13m 10.0.0.196 ip-10-0-0-196.ap-northeast-1.compute.internal
po/kube-scheduler-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 37m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/kube2iam-cs7r2 1/1 Running 0 31m 10.0.0.36 ip-10-0-0-36.ap-northeast-1.compute.internal
po/kube2iam-g79pl 1/1 Running 0 31m 10.0.0.97 ip-10-0-0-97.ap-northeast-1.compute.internal
po/kube2iam-tjp9z 1/1 Running 0 38m 10.0.0.41 ip-10-0-0-41.ap-northeast-1.compute.internal
po/kubernetes-dashboard-6d758d8ffb-8l68s 1/1 Running 0 38m 10.2.89.3 ip-10-0-0-41.ap-northeast-1.compute.internal
po/metrics-server-6bd7ddbc8-vmh96 1/1 Running 0 38m 10.2.82.3 ip-10-0-0-36.ap-northeast-1.compute.internal
po/tiller-deploy-7f475d4d4-bszgj 1/1 Running 0 38m 10.2.89.2 ip-10-0-0-41.ap-northeast-1.compute.internal
NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
no/ip-10-0-0-107.ap-northeast-1.compute.internal Ready master 26m v1.8.4+coreos.0 13.230.81.21 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1 docker://1.12.6
no/ip-10-0-0-196.ap-northeast-1.compute.internal Ready master 14m v1.8.4+coreos.0 13.230.87.111 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1 docker://1.12.6
no/ip-10-0-0-36.ap-northeast-1.compute.internal Ready <none> 31m v1.8.4+coreos.0 13.230.157.61 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1 docker://1.12.6
no/ip-10-0-0-41.ap-northeast-1.compute.internal Ready master 38m v1.8.4+coreos.0 13.112.9.125 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1 docker://1.12.6
no/ip-10-0-0-97.ap-northeast-1.compute.internal Ready <none> 31m v1.8.4+coreos.0 54.65.107.102 Container Linux by CoreOS 1520.9.0 (Ladybug) 4.13.16-coreos-r1 docker://1.12.6
You might have seen that both controller-manager pods on the live controller nodes are in crash loops.
Logs:
core@ip-10-0-0-107 ~ $ k logs po/kube-controller-manager-ip-10-0-0-107.ap-northeast-1.compute.internal
I1204 13:10:00.435437 1 controllermanager.go:109] Version: v1.8.4+coreos.0
I1204 13:10:00.436058 1 leaderelection.go:174] attempting to acquire leader lease...
I1204 13:11:40.005014 1 leaderelection.go:184] successfully acquired lease kube-system/kube-controller-manager
I1204 13:11:40.005232 1 event.go:218] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"kube-controller-manager", UID:"40ddfeb9-d8ef-11e7-b608-062fc19c420c", APIVersion:"v1", ResourceVersion:"4181", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' ip-10-0-0-107.ap-northeast-1.compute.internal became leader
E1204 13:12:40.028303 1 controllermanager.go:399] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server ("Error: 'dial tcp 10.3.0.67:443: i/o timeout'\nTrying to reach: 'https://10.3.0.67:443/apis/metrics.k8s.io/v1beta1'") has prevented the request from succeeding
I1204 13:12:40.028460 1 aws.go:847] Building AWS cloudprovider
I1204 13:12:40.028488 1 aws.go:810] Zone not specified in configuration file; querying AWS metadata service
I1204 13:12:40.271220 1 tags.go:76] AWS cloud filtering on ClusterID: k8s3
W1204 13:12:40.272143 1 controllermanager.go:471] "tokencleaner" is disabled
I1204 13:12:40.272355 1 controller_utils.go:1041] Waiting for caches to sync for tokens controller
I1204 13:12:40.272714 1 controllermanager.go:487] Started "persistentvolume-binder"
I1204 13:12:40.272986 1 pv_controller_base.go:259] Starting persistent volume controller
I1204 13:12:40.273023 1 controller_utils.go:1041] Waiting for caches to sync for persistent volume controller
I1204 13:12:40.273067 1 controllermanager.go:487] Started "endpoint"
I1204 13:12:40.273418 1 endpoints_controller.go:153] Starting endpoint controller
I1204 13:12:40.273463 1 controller_utils.go:1041] Waiting for caches to sync for endpoint controller
I1204 13:12:40.273481 1 resource_quota_controller.go:238] Starting resource quota controller
I1204 13:12:40.273518 1 controller_utils.go:1041] Waiting for caches to sync for resource quota controller
I1204 13:12:40.273469 1 controllermanager.go:487] Started "resourcequota"
I1204 13:12:40.372548 1 controller_utils.go:1048] Caches are synced for tokens controller
E1204 13:13:40.296029 1 namespaced_resources_deleter.go:169] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: an error on the server ("Error: 'dial tcp 10.3.0.67:443: i/o timeout'\nTrying to reach: 'https://10.3.0.67:443/apis/metrics.k8s.io/v1beta1'") has prevented the request from succeeding
I1204 13:13:40.296145 1 controllermanager.go:487] Started "namespace"
I1204 13:13:40.296456 1 namespace_controller.go:186] Starting namespace controller
I1204 13:13:40.296470 1 controller_utils.go:1041] Waiting for caches to sync for namespace controller
I1204 13:13:40.296504 1 controllermanager.go:487] Started "serviceaccount"
I1204 13:13:40.296676 1 serviceaccounts_controller.go:113] Starting service account controller
I1204 13:13:40.296698 1 controller_utils.go:1041] Waiting for caches to sync for service account controller
I1204 13:13:40.296871 1 controllermanager.go:487] Started "statefulset"
I1204 13:13:40.296968 1 stateful_set.go:146] Starting stateful set controller
I1204 13:13:40.296994 1 controller_utils.go:1041] Waiting for caches to sync for stateful set controller
E1204 13:14:10.306946 1 memcache.go:159] couldn't get resource list for metrics.k8s.io/v1beta1: an error on the server ("Error: 'dial tcp 10.3.0.67:443: i/o timeout'\nTrying to reach: 'https://10.3.0.67:443/apis/metrics.k8s.io/v1beta1'") has prevented the request from succeeding
Again 😢
Then, I'll proceed with option 4 from my previous suggestion. Sigh!
I do have an endpoint for the metrics-server:
core@ip-10-0-0-107 ~ $ kubectl get endpoints metrics-server -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2017-12-04T12:33:10Z
  labels:
    kubernetes.io/name: Metrics-server
  name: metrics-server
  namespace: kube-system
  resourceVersion: "1545"
  selfLink: /api/v1/namespaces/kube-system/endpoints/metrics-server
  uid: 49c1a942-d8ef-11e7-b608-062fc19c420c
subsets:
- addresses:
  - ip: 10.2.82.3
    nodeName: ip-10-0-0-36.ap-northeast-1.compute.internal
    targetRef:
      kind: Pod
      name: metrics-server-6bd7ddbc8-vmh96
      namespace: kube-system
      resourceVersion: "1544"
      uid: 48bdaaac-d8ef-11e7-b608-062fc19c420c
  ports:
  - port: 443
    protocol: TCP
The node and the pod with that podIP are indeed there:
core@ip-10-0-0-107 ~ $ kubectl --namespace kube-system get po -o wide | grep 10.2.82.3
metrics-server-6bd7ddbc8-vmh96 1/1 Running 0 54m 10.2.82.3 ip-10-0-0-36.ap-northeast-1.compute.internal
However, no service-to-pod iptables rules are there. This would indeed be because kube-proxy isn't running on the node.
core@ip-10-0-0-107 ~ $ sudo iptables-save
# Generated by iptables-save v1.4.21 on Mon Dec 4 13:28:36 2017
*filter
:INPUT ACCEPT [199849:260146379]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [196255:83770839]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
COMMIT
# Completed on Mon Dec 4 13:28:36 2017
# Generated by iptables-save v1.4.21 on Mon Dec 4 13:28:36 2017
*nat
:PREROUTING ACCEPT [12:694]
:INPUT ACCEPT [12:694]
:OUTPUT ACCEPT [309:21328]
:POSTROUTING ACCEPT [309:21328]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-POSTROUTING - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 10.2.0.0/16 -d 10.2.0.0/16 -j RETURN
-A POSTROUTING -s 10.2.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.2.0.0/16 -d 10.2.76.0/24 -j RETURN
-A POSTROUTING ! -s 10.2.0.0/16 -d 10.2.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
COMMIT
# Completed on Mon Dec 4 13:28:36 2017
Remember that the kube-proxy pod isn't created by the kube-proxy daemonset due to the controller-manager outage. A race condition, again.
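For reference, a rough sketch of how one might confirm the missing kube-proxy pod on the new controller node while controller-manager is down (this assumes the daemonset is named kube-proxy and its pods carry a k8s-app=kube-proxy label, which may differ per cluster):
# compare the daemonset's desired/current counts against the pods actually scheduled per node
kubectl --namespace kube-system get daemonset kube-proxy -o wide
kubectl --namespace kube-system get pods -l k8s-app=kube-proxy -o wide
# the new controller node should be missing from the second listing until controller-manager recovers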
Again, once I had deleted the apiservice, the controller-manager on the node started successfully, the kube-proxy pod was created, and the iptables rules for the metrics-server service were created:
core@ip-10-0-0-107 ~ $ k get po
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7fb5479cf4-qszkp 1/1 Running 0 1h
calico-node-8bjhn 1/1 Running 0 52m
calico-node-bkv7p 1/1 Running 0 52m
calico-node-hqjjn 1/1 Running 0 1h
calico-node-rrwmx 0/1 ContainerCreating 0 16s
calico-node-v9qwg 0/1 ContainerCreating 0 16s
cluster-autoscaler-759f549dd5-2pts6 1/1 Running 6 1h
dnsmasq-node-f7mzs 0/2 ContainerCreating 0 16s
dnsmasq-node-jjqsf 2/2 Running 0 52m
dnsmasq-node-lfltv 2/2 Running 0 1h
dnsmasq-node-mzkr2 2/2 Running 0 52m
dnsmasq-node-wss9t 0/2 ContainerCreating 0 16s
heapster-76bbc65855-8pb75 2/2 Running 0 16s
kube-apiserver-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 0 46m
kube-apiserver-ip-10-0-0-196.ap-northeast-1.compute.internal 1/1 Running 0 34m
kube-apiserver-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 59m
kube-controller-manager-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 7 46m
kube-controller-manager-ip-10-0-0-196.ap-northeast-1.compute.internal 0/1 CrashLoopBackOff 5 34m
kube-controller-manager-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 59m
kube-dns-7d654c9888-c29s8 3/3 Running 0 1h
kube-dns-7d654c9888-xj49n 3/3 Running 0 51m
kube-dns-autoscaler-665fb57848-qfrnt 1/1 Running 0 1h
kube-proxy-9f89g 1/1 Running 0 52m
kube-proxy-9nlw9 1/1 Running 0 59m
kube-proxy-gbx96 1/1 Running 0 52m
kube-proxy-j89mj 1/1 Running 0 16s
kube-proxy-ztv29 1/1 Running 0 16s
kube-rescheduler-f55879654-v9x8l 1/1 Running 0 59m
kube-scheduler-ip-10-0-0-107.ap-northeast-1.compute.internal 1/1 Running 0 46m
kube-scheduler-ip-10-0-0-196.ap-northeast-1.compute.internal 1/1 Running 0 34m
kube-scheduler-ip-10-0-0-41.ap-northeast-1.compute.internal 1/1 Running 0 59m
kube2iam-cs7r2 1/1 Running 0 52m
kube2iam-g79pl 1/1 Running 0 52m
kube2iam-hkzj8 0/1 ContainerCreating 0 16s
kube2iam-tjp9z 1/1 Running 0 59m
kube2iam-wvgrm 0/1 ContainerCreating 0 16s
kubernetes-dashboard-6d758d8ffb-8l68s 1/1 Running 0 1h
metrics-server-6bd7ddbc8-vmh96 1/1 Running 0 1h
tiller-deploy-7f475d4d4-bszgj 1/1 Running 0 1h
core@ip-10-0-0-107 ~ $ sudo iptables-save | grep metrics-server
-A KUBE-SEP-WSNLDDAKKGPCZDEC -s 10.2.82.3/32 -m comment --comment "kube-system/metrics-server:" -j KUBE-MARK-MASQ
-A KUBE-SEP-WSNLDDAKKGPCZDEC -p tcp -m comment --comment "kube-system/metrics-server:" -m tcp -j DNAT --to-destination 10.2.82.3:443
-A KUBE-SERVICES ! -s 10.2.0.0/16 -d 10.3.0.67/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.3.0.67/32 -p tcp -m comment --comment "kube-system/metrics-server: cluster IP" -m tcp --dport 443 -j KUBE-SVC-LC5QY66VUV2HJ6WZ
-A KUBE-SVC-LC5QY66VUV2HJ6WZ -m comment --comment "kube-system/metrics-server:" -j KUBE-SEP-WSNLDDAKKGPCZDEC
Okay then, I'll delete the metrics-server apiservice when and only when iptables rules don't exist :trollface:
Update: I'd rather delete it when kube-proxy isn't running, so that I can avoid tight coupling to a specific kube-proxy backend (iptables).
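A minimal sketch of that start-up check, assuming kubectl can reach the local apiserver on the controller node and that grepping the local docker containers for kube-proxy is a good-enough liveness signal (both are assumptions for illustration, not what kube-aws ships today):
# hypothetical start-up snippet: delete the aggregated metrics API only when kube-proxy
# isn't running locally yet, so apiserver/controller-manager can come up without
# blocking on an unreachable metrics-server service IP
if ! docker ps --format '{{.Names}}' | grep -q kube-proxy; then
  kubectl --namespace kube-system delete apiservice v1beta1.metrics.k8s.io --ignore-not-found
fi
(Something would then need to re-apply the APIService once the node is healthy, which is the ugly part of this workaround.)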
Probably after kube-aws render and then kube-aws update, my kube-aws cluster has fallen into the UPDATE_ROLLBACK_FAILED state.
Logging in and then surveying various points, I've verified:
docker ps -a shows that the controller-manager container is failing
sudo cat /var/log/pods/**/*kube-controller-manager*.log showed that it is failing due to a timeout while accessing the metrics server endpoint(?)
Also, docker logs $apiserver_container_id shows:
cc @camilb