kubernetes-retired / kube-aws

[EOL] A command-line tool to declaratively manage Kubernetes clusters on AWS
Apache License 2.0

upgrading 0.14 to 0.15: etcd migration failed; controller kubelet fails #1881

Closed flah00 closed 4 years ago

flah00 commented 4 years ago

TL;DR: By manually editing the etcd.json.tmpl template, I was able to work around the etcd problem, but I have not been able to fix the controller-related issue.

Etcd

The export-existing-etcd-state.service unit was failing with the error:

Failed to start Exports Kubernetes Values from a remote Etcd cluster

This was because ETCD_ENDPOINTS in /var/run/coreos/etcdadm-environment-migration was configured with private host names. To work around this, I had to update stack-templates/etcd.json.tmpl and hard-code the public host names:

diff --git a/k9s-zoo/stack-templates/etcd.json.tmpl b/k9s-zoo/stack-templates/etcd.json.tmpl
index a34df2d..8c3dabc 100644
--- a/k9s-zoo/stack-templates/etcd.json.tmpl
+++ b/k9s-zoo/stack-templates/etcd.json.tmpl
@@ -411,9 +411,7 @@
               {{ if $.EtcdMigrationEnabled -}}
               "/var/run/coreos/etcdadm-environment-migration": {
                 "content": { "Fn::Join" : [ "", [
-                  "ETCD_ENDPOINTS='",
-                    "{{ $.EtcdMigrationExistingEndpoints }}",
-                  "'\n",
+                  "ETCD_ENDPOINTS='https://PUBLIC_HOST_1:2379,https://PUBLIC_HOST_2:2379,https://PUBLIC_HOST_3:2379'\n",
                   "AWS_DEFAULT_REGION='",
                     "{{$.Region}}",
                   "'\n",

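To confirm that the endpoints written to the migration environment file are usable before retrying the unit, a check along the following lines may help. This is a minimal sketch, not part of kube-aws: the function names are hypothetical, and it assumes the file contains a single-quoted, comma-separated ETCD_ENDPOINTS line as in the template above.

```python
import socket
from urllib.parse import urlsplit


def parse_etcd_endpoints(env_text):
    """Return (host, port) pairs from the ETCD_ENDPOINTS line of an env file."""
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ETCD_ENDPOINTS":
            # Drop the surrounding quotes, then split the comma-separated URLs.
            urls = value.strip().strip("'\"").split(",")
            endpoints = []
            for url in urls:
                if not url.strip():
                    continue
                parts = urlsplit(url.strip())
                # Assumption: fall back to etcd's usual client port, 2379.
                endpoints.append((parts.hostname, parts.port or 2379))
            return endpoints
    return []


def unresolvable(endpoints):
    """Return the hosts that cannot be resolved from this machine."""
    failed = []
    for host, port in endpoints:
        try:
            socket.getaddrinfo(host, port)
        except socket.gaierror:
            failed.append(host)
    return failed


def check_file(path):
    """Read an env file and return the endpoint hosts that fail to resolve."""
    with open(path) as f:
        return unresolvable(parse_etcd_endpoints(f.read()))
```

Calling check_file("/var/run/coreos/etcdadm-environment-migration") on the node that runs the migration returns the host names that do not resolve there; an empty list means name resolution, at least, is not the obstacle.
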
Controller

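Once I get past etcd, the kubelet on the controllers fails with a networking error: the AWS cloud provider cannot resolve ec2.us-east-1.amazonaws.com because the DNS lookup goes to [::1]:53 and the connection is refused.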

Jul 03 18:55:52 HOST.ec2.internal sh[20858]: F0703 18:55:52.153512   20858 server.go:273] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-006bdcc9632d50a6c: "error listing AWS instances: \"RequestError: send request failed\\ncaused by: Post https://ec2.us-east-1.amazonaws.com/: dial tcp: lookup ec2.us-east-1.amazonaws.com on [::1]:53: read udp [::1]:50109->[::1]:53: read: connection refused\""
Jul 03 18:55:52 HOST.ec2.internal systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
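
The lookup against [::1]:53 in the log above suggests the kubelet had no usable resolver at that moment. One possible cause, stated here as an assumption and not confirmed in this issue, is a resolver configuration that is empty or lists only loopback addresses. The sketch below inspects resolv.conf-style text for that condition; the function names are hypothetical.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never matches.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def is_loopback(address):
    """True if the address is a loopback IP such as 127.0.0.1 or ::1."""
    try:
        # Strip an IPv6 zone suffix such as '%eth0' before parsing.
        return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback
    except ValueError:
        return False


def diagnose(resolv_conf_text):
    """Summarize whether the config lists a non-loopback nameserver."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return "no nameserver entries"
    if all(is_loopback(s) for s in servers):
        return "only loopback nameservers: " + ", ".join(servers)
    return "ok: " + ", ".join(servers)
```

On a controller, passing the contents of /etc/resolv.conf to diagnose() shows whether the node had a non-loopback nameserver configured when the kubelet started. A result other than "ok" would be consistent with the error above, though it would not by itself explain why the configuration ended up that way.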

flah00 commented 4 years ago

The controller issues are all related to the aws-iam-auth plugin. These issues were not present when I updated the feature on the 0.14.x branch.