scholzj / terraform-aws-minikube

Terraform module for single node Kubernetes instance bootstrapped using kubeadm
Apache License 2.0

Kubernetes doesn't completely start up after reboot #4

Closed mludvig closed 5 years ago

mludvig commented 5 years ago

Hi, thanks for providing this template! It works great right after deployment but k8s seems to be half-broken after a simple instance reboot. I tried a couple of deployments and it's perfectly reproducible.

After deployment (before reboot)

There are 20 containers running

root@ip-172-30-2-247 ~ # docker ps
CONTAINER ID     IMAGE                            COMMAND                  CREATED          STATUS           PORTS        NAMES
72772b50ff39     eb516548c180                     "/coredns -conf /e..."   4 minutes ago    Up 4 minutes                  k8s_coredns_coredns-fb8b8dccf-46ttk_kube-system_495e0b66-87f1-11e9-af26-0a1ee8c58078_0
74071e694ec0     eb516548c180                     "/coredns -conf /e..."   4 minutes ago    Up 4 minutes                  k8s_coredns_coredns-fb8b8dccf-b9mhj_kube-system_4963984e-87f1-11e9-af26-0a1ee8c58078_0
3ae3ff9e9c73     k8s.gcr.io/pause:3.1             "/pause"                 4 minutes ago    Up 4 minutes                  k8s_POD_coredns-fb8b8dccf-b9mhj_kube-system_4963984e-87f1-11e9-af26-0a1ee8c58078_43
df8b0eef7ec3     k8s.gcr.io/pause:3.1             "/pause"                 4 minutes ago    Up 4 minutes                  k8s_POD_coredns-fb8b8dccf-46ttk_kube-system_495e0b66-87f1-11e9-af26-0a1ee8c58078_41
623c4cc86d76     b4d7c4247c3a                     "start_runit"            4 minutes ago    Up 4 minutes                  k8s_calico-node_calico-node-8rckj_kube-system_4958e121-87f1-11e9-af26-0a1ee8c58078_2
8106f5622be7     0bd1f99c7034                     "/usr/bin/kube-con..."   4 minutes ago    Up 4 minutes                  k8s_calico-kube-controllers_calico-kube-controllers-8649d847c4-29xss_kube-system_495de6cf-87f1-11e9-af26-0a1ee8c5807
7d70242f82fa     quay.io/coreos/etcd@sha256:...   "/usr/local/bin/et..."   4 minutes ago    Up 4 minutes                  k8s_calico-etcd_calico-etcd-qzbks_kube-system_55247f87-87f1-11e9-af26-0a1ee8c58078_0
49475b87dfe6     k8s.gcr.io/pause:3.1             "/pause"                 4 minutes ago    Up 4 minutes                  k8s_POD_calico-etcd-qzbks_kube-system_55247f87-87f1-11e9-af26-0a1ee8c58078_0
ceb5a5d082bb     k8s.gcr.io/pause:3.1             "/pause"                 4 minutes ago    Up 4 minutes                  k8s_POD_calico-kube-controllers-8649d847c4-29xss_kube-system_495de6cf-87f1-11e9-af26-0a1ee8c58078_0
9529052099c3     20a2d7035165                     "/usr/local/bin/ku..."   5 minutes ago    Up 5 minutes                  k8s_kube-proxy_kube-proxy-fx548_kube-system_4958d277-87f1-11e9-af26-0a1ee8c58078_0
aea199144e04     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_calico-node-8rckj_kube-system_4958e121-87f1-11e9-af26-0a1ee8c58078_0
e9f5fe880890     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_kube-proxy-fx548_kube-system_4958d277-87f1-11e9-af26-0a1ee8c58078_0
a40f97c206e8     2c4adeb21b4f                     "etcd --advertise-..."   5 minutes ago    Up 5 minutes                  k8s_etcd_etcd-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_cd3d6cd87a522a8d47f9f84a29a21085_0
f07f1230fcbe     8931473d5bdb                     "kube-scheduler --..."   5 minutes ago    Up 5 minutes                  k8s_kube-scheduler_kube-scheduler-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_f44110a0ca540009109bfc
a8bb045ae863     cfaa4ad74c37                     "kube-apiserver --..."   5 minutes ago    Up 5 minutes                  k8s_kube-apiserver_kube-apiserver-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_5d623fb4138e843edbe51b
8557fa2cac2c     efb3887b411d                     "kube-controller-m..."   5 minutes ago    Up 5 minutes                  k8s_kube-controller-manager_kube-controller-manager-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_4d4b
cec3e482bb98     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_kube-controller-manager-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_4d4b59c11383339b1dbc6957
25a629730365     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_kube-scheduler-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_f44110a0ca540009109bfc32a7eb0baa_
25618cb3d3b5     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_etcd-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_cd3d6cd87a522a8d47f9f84a29a21085_0
a45912a2e9b0     k8s.gcr.io/pause:3.1             "/pause"                 5 minutes ago    Up 5 minutes                  k8s_POD_kube-apiserver-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_5d623fb4138e843edbe51bb363cb7fdc_

And kubectl works:

root@ip-172-30-2-247 ~ # kubectl --kubeconfig /etc/kubernetes/admin.conf get all --all-namespaces
NAMESPACE     NAME                                                                          READY   STATUS    RESTARTS   AGE
kube-system   pod/calico-etcd-qzbks                                                         1/1     Running   0          2m30s
kube-system   pod/calico-kube-controllers-8649d847c4-29xss                                  1/1     Running   1          2m50s
kube-system   pod/calico-node-8rckj                                                         1/1     Running   2          2m50s
kube-system   pod/coredns-fb8b8dccf-46ttk                                                   1/1     Running   0          2m50s
kube-system   pod/coredns-fb8b8dccf-b9mhj                                                   1/1     Running   0          2m50s
kube-system   pod/etcd-ip-172-30-2-247.ap-southeast-2.compute.internal                      1/1     Running   0          117s
kube-system   pod/kube-apiserver-ip-172-30-2-247.ap-southeast-2.compute.internal            1/1     Running   0          101s
kube-system   pod/kube-controller-manager-ip-172-30-2-247.ap-southeast-2.compute.internal   1/1     Running   0          2m1s
kube-system   pod/kube-proxy-fx548                                                          1/1     Running   0          2m50s
kube-system   pod/kube-scheduler-ip-172-30-2-247.ap-southeast-2.compute.internal            1/1     Running   0          106s

NAMESPACE     NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes    ClusterIP   10.96.0.1       <none>        443/TCP                  2m57s
kube-system   service/calico-etcd   ClusterIP   10.96.232.136   <none>        6666/TCP                 2m55s
kube-system   service/kube-dns      ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   2m56s

NAMESPACE     NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
kube-system   daemonset.apps/calico-etcd   1         1         1       1            1           <none>                        2m55s
kube-system   daemonset.apps/calico-node   1         1         1       1            1           beta.kubernetes.io/os=linux   2m55s
kube-system   daemonset.apps/kube-proxy    1         1         1       1            1           <none>                        2m55s

NAMESPACE     NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/calico-kube-controllers   1/1     1            1           2m55s
kube-system   deployment.apps/coredns                   2/2     2            2           2m56s

NAMESPACE     NAME                                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/calico-kube-controllers-8649d847c4   1         1         1       2m50s
kube-system   replicaset.apps/coredns-fb8b8dccf                    2         2         2       2m50s

After reboot

Only 7 containers come up:

root@ip-172-30-2-247 ~ # docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
ae9eadd98658        cfaa4ad74c37           "kube-apiserver --..."   19 seconds ago      Up 19 seconds                           k8s_kube-apiserver_kube-apiserver-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_5d623fb4138e843edbe51bb363cb7fdc_7
5fa7ba84ad9e        efb3887b411d           "kube-controller-m..."   13 minutes ago      Up 13 minutes                           k8s_kube-controller-manager_kube-controller-manager-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_4d4b59c11383339b1dbc6957db2b1aac_1
ef7970cc7a31        8931473d5bdb           "kube-scheduler --..."   13 minutes ago      Up 13 minutes                           k8s_kube-scheduler_kube-scheduler-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_f44110a0ca540009109bfc32a7eb0baa_1
517eb8dc2228        k8s.gcr.io/pause:3.1   "/pause"                 13 minutes ago      Up 13 minutes                           k8s_POD_kube-controller-manager-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_4d4b59c11383339b1dbc6957db2b1aac_1
e321d41d616f        k8s.gcr.io/pause:3.1   "/pause"                 13 minutes ago      Up 13 minutes                           k8s_POD_kube-apiserver-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_5d623fb4138e843edbe51bb363cb7fdc_1
76d753980f61        k8s.gcr.io/pause:3.1   "/pause"                 13 minutes ago      Up 13 minutes                           k8s_POD_kube-scheduler-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_f44110a0ca540009109bfc32a7eb0baa_1
009531cebfa0        k8s.gcr.io/pause:3.1   "/pause"                 13 minutes ago      Up 13 minutes                           k8s_POD_etcd-ip-172-30-2-247.ap-southeast-2.compute.internal_kube-system_cd3d6cd87a522a8d47f9f84a29a21085_1
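
A quick way to see which control-plane pieces are absent is to check each component for a running container of its own. The helper below is only a sketch: the name `missing_components` is made up for illustration, and it assumes the kubelet naming scheme `k8s_<container>_<pod>_<namespace>_<uid>_<attempt>`, where `<container>` is `POD` for the pause (sandbox) container.

```shell
# Hypothetical helper: list control-plane components with no running container.
# Reads container names, one per line, on stdin.
# Assumes names look like k8s_<container>_<pod>_<namespace>_<uid>_<attempt>.
missing_components() {
  names=$(cat)
  for component in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    # Match the component's own container (k8s_<component>_...),
    # not its k8s_POD_ sandbox and not look-alikes such as k8s_calico-etcd_.
    if ! printf '%s\n' "$names" | grep "^k8s_${component}_" >/dev/null; then
      echo "$component"
    fi
  done
}

# Assumed usage on the instance:
#   docker ps --format '{{.Names}}' | missing_components
```

Fed the names from the listing above, this would report `etcd`: its `k8s_POD_etcd-...` sandbox is running, but there is no `k8s_etcd_...` container.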

And kubectl doesn't work:

root@ip-172-30-2-247 ~ # kubectl --kubeconfig /etc/kubernetes/admin.conf get all
The connection to the server 172.30.2.247:6443 was refused - did you specify the right host or port?

I gave it more than enough time to come up but still no go. It looks like the issue is with kube-apiserver, which keeps starting and failing over and over again.
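
The restart loop is visible in the container names themselves: the last `_`-separated field is the kubelet's restart attempt, and the kube-apiserver container above is already at `_7` after being up for 19 seconds. A minimal sketch for pulling that number out follows; `restart_attempts` is a hypothetical name, and names truncated by the terminal (ending in `_`) are skipped.

```shell
# Hypothetical helper: print "<attempt><TAB><name>" for each container name
# read from stdin, highest attempt first.
# Assumes the attempt counter is the last "_"-separated field of the name.
restart_attempts() {
  while IFS= read -r name; do
    attempt=${name##*_}            # everything after the last underscore
    case $attempt in
      ''|*[!0-9]*) continue ;;     # truncated or non-numeric suffix: skip it
    esac
    printf '%s\t%s\n' "$attempt" "$name"
  done | sort -rn
}

# Assumed usage on the instance (-a includes exited containers):
#   docker ps -a --format '{{.Names}}' | restart_attempts | head
#   docker logs --tail 50 <container id>   # then read why the top entry exits
```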

Unfortunately I'm not a big Kubernetes expert so I don't know where to look to fix it.

Any chance you could have a look at it?

Thanks!

Michael

mludvig commented 5 years ago

Apparently it's caused by SELinux not being properly disabled in the init-aws-minikube.sh script.

Fixed in #5 (Properly disable SELinux)
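
This fits the symptom: `setenforce 0` only changes the running system, so if the config file still says `enforcing`, SELinux comes back in enforcing mode on the next boot. The sketch below is not the change from #5, just one common way to make the setting persistent. The helper name `set_selinux_mode` is made up, and `/etc/selinux/config` is assumed to be the config path on the instance.

```shell
# Hypothetical helper, NOT the code from #5.
# Rewrites the SELINUX= line of an SELinux config file so the mode survives
# a reboot. Other lines (comments, SELINUXTYPE=...) are left untouched.
set_selinux_mode() {
  config_file=$1   # assumed to be /etc/selinux/config on the instance
  mode=$2          # enforcing, permissive, or disabled
  tmp_file=$(mktemp) || return 1
  status=0
  # Write through a temp file and copy back with cat, so the original
  # file's permissions (and any symlink pointing at it) are preserved.
  sed "s/^SELINUX=.*/SELINUX=${mode}/" "$config_file" >"$tmp_file" &&
    cat "$tmp_file" >"$config_file" || status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Assumed usage as root on the instance:
#   setenforce 0                                      # effective now, lost on reboot
#   set_selinux_mode /etc/selinux/config permissive   # effective from the next boot
```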

scholzj commented 5 years ago

#5 has been merged, so this should be resolved now.