hobby-kube / guide

Kubernetes clusters for the hobbyist.

Kube-DNS does not start #43

Closed · ticruz38 closed this issue 6 years ago

ticruz38 commented 6 years ago

With the Scaleway provider, using a VC1M instance, I couldn't SSH into my server; the problem seems related to the "enable local boot" option, which is disabled by default when provisioning with Terraform. I could run Terraform to the end of the process without errors, but the kube-dns pod remains stuck in the ContainerCreating state. From my research it could be related to a bug in weave-net. I tried to move to kube-flannel, but that requires re-initializing the master node with kubeadm, which raises other problems.
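Roughly what I tried for the flannel switch, from memory, so the exact manifest URL and pod network CIDR below are my assumptions:

# reset the master and re-init with the CIDR flannel expects by default
kubeadm reset
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.0.1.1
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml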

kubectl get pods --all-namespaces

NAMESPACE     NAME                            READY     STATUS              RESTARTS   AGE
kube-system   kube-apiserver-kube1            1/1       Running             0          16m
kube-system   kube-controller-manager-kube1   1/1       Running             0          17m
kube-system   kube-dns-86f4d74b45-7f8j2       0/3       ContainerCreating   0          17m
kube-system   kube-proxy-gvn78                1/1       Running             0          17m
kube-system   kube-proxy-p8k7z                1/1       Running             0          17m
kube-system   kube-proxy-wtw2p                1/1       Running             0          17m
kube-system   kube-scheduler-kube1            1/1       Running             0          16m
kube-system   weave-net-fwx9c                 2/2       Running             1          17m
kube-system   weave-net-hjcpp                 2/2       Running             0          17m
kube-system   weave-net-vcn74                 2/2       Running             0          17m

kubectl describe pods kube-dns -n kube-system

Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        19m (x4 over 19m)  default-scheduler  0/1 nodes are available: 1 node(s) were not ready.
  Warning  FailedScheduling        19m (x2 over 19m)  default-scheduler  0/2 nodes are available: 2 node(s) were not ready.
  Warning  FailedScheduling        19m (x3 over 19m)  default-scheduler  0/3 nodes are available: 3 node(s) were not ready.
  Normal   Scheduled               18m                default-scheduler  Successfully assigned kube-dns-86f4d74b45-7f8j2 to kube3
  Normal   SuccessfulMountVolume   18m                kubelet, kube3     MountVolume.SetUp succeeded for volume "kube-dns-config"
  Normal   SuccessfulMountVolume   18m                kubelet, kube3     MountVolume.SetUp succeeded for volume "kube-dns-token-t47ql"
  Warning  FailedCreatePodSandBox  2m (x4 over 14m)   kubelet, kube3     Failed create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          2m (x4 over 14m)   kubelet, kube3     Pod sandbox changed, it will be killed and re-created.
pstadler commented 6 years ago

Thanks for the detailed report. I'm pretty sure that the recent changes in the Scaleway infrastructure broke things. This has happened before.

Please let me know what you changed in the Terraform setup in order to successfully launch the cluster. It could be that some necessary kernel modules are missing. This usually depends on the bootscript.
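A quick way to narrow that down is to compare the running kernel with the bootscript and confirm the modules Kubernetes and Weave rely on can be loaded. Something along these lines on each host (br_netfilter and overlay are the usual candidates, not an exhaustive list):

# which kernel did the bootscript actually boot?
uname -r
# are the required modules present / loadable?
lsmod | grep -E 'br_netfilter|overlay'
modprobe br_netfilter && modprobe overlay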

ticruz38 commented 6 years ago

I commented out the provider module and started my 3 kubes by hand on Scaleway. I installed the required packages on the machines and passed their IPs to the Terraform modules that needed them. I then selected the required bootscript ("4.10...") on each machine with 'local boot' enabled and restarted. All the Terraform steps went fine, but kube-dns is still not starting.

Basically I deleted the provider module and did that step by hand, with the same settings the provider module would apply. I will try again, but I wasn't able to SSH in until "local boot" was enabled (it is disabled by default with Terraform...).

ticruz38 commented 6 years ago

It seems like my nodes can't talk to each other. I looked at the kubelet logs on kube1, and here is what they say.

journalctl | grep kubelet

May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.043254    1245 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=150881, ErrCode=NO_ERROR, debug=""
May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.043891    1245 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=150881, ErrCode=NO_ERROR, debug=""
May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.044549    1245 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=150881, ErrCode=NO_ERROR, debug=""
May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.044691    1245 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to watch *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&resourceVersion=525&timeoutSeconds=448&watch=true: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.045064    1245 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to watch *v1.Service: Get https://10.0.1.1:6443/api/v1/services?resourceVersion=166&timeoutSeconds=525&watch=true: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:51 kube1 kubelet[1245]: E0505 11:00:51.046810    1245 reflector.go:322] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to watch *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&resourceVersion=395029&timeoutSeconds=450&watch=true: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: E0505 11:00:52.014997    1245 event.go:209] Unable to write event: 'Post https://10.0.1.1:6443/api/v1/namespaces/kube-system/events: dial tcp 10.0.1.1:6443: getsockopt: connection refused' (may retry after sleeping)
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.042292    1245 pod_container_deletor.go:77] Container "8f9f815692df3e68039a765842465c43eb8cf1bc41debe7436fe442403654f56" not found in pod's containers
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.042395    1245 status_manager.go:461] Failed to get status for pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: E0505 11:00:52.047362    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: E0505 11:00:52.049781    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: E0505 11:00:52.051870    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.075301    1245 pod_container_deletor.go:77] Container "f747da351b4a410f2a7fc51f8df3a95d69d6c263feb29a1d50afc37b00151d78" not found in pod's containers
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.076670    1245 status_manager.go:461] Failed to get status for pod "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.091459    1245 pod_container_deletor.go:77] Container "889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c" not found in pod's containers
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.092865    1245 status_manager.go:461] Failed to get status for pod "kube-scheduler-kube1_kube-system(aa8d5cab3ea096315de0c2003230d4f9)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.131449    1245 pod_container_deletor.go:77] Container "0ba6272e745998035370dd765c36c260cada0ba37b594f49dd1b3c7090a8282e" not found in pod's containers
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.133930    1245 status_manager.go:461] Failed to get status for pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/weave-net-fwx9c: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:52 kube1 kubelet[1245]: W0505 11:00:52.155379    1245 status_manager.go:461] Failed to get status for pod "kube-proxy-p8k7z_kube-system(af95bb4d-4d85-11e8-8d44-de19481fc00d)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-proxy-p8k7z: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.061130    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.061130    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.070726    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.085946    1245 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-controller-manager-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.086767    1245 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-controller-manager-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.086904    1245 kuberuntime_manager.go:646] createPodSandbox for pod "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-controller-manager-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.087337    1245 pod_workers.go:186] Error syncing pod 8bd7e96fd69257327447632a692e29b7 ("kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)"), skipping: failed to "CreatePodSandbox" for "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-controller-manager-kube1\": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449/start: EOF"
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.087685    1245 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/cb8d46bc17b4165a7f44fbbcc25d677ee5dde407d1cb78473d572fde29859e23/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.087805    1245 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/cb8d46bc17b4165a7f44fbbcc25d677ee5dde407d1cb78473d572fde29859e23/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.087886    1245 kuberuntime_manager.go:646] createPodSandbox for pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-apiserver-kube1": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/cb8d46bc17b4165a7f44fbbcc25d677ee5dde407d1cb78473d572fde29859e23/start: EOF
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.088743    1245 pod_workers.go:186] Error syncing pod 80922f536b042385a52a7614bad79832 ("kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)"), skipping: failed to "CreatePodSandbox" for "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-apiserver-kube1\": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/cb8d46bc17b4165a7f44fbbcc25d677ee5dde407d1cb78473d572fde29859e23/start: EOF"
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.097322    1245 docker_sandbox.go:236] Failed to stop sandbox "889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c/stop?t=10: read unix @->/var/run/docker.sock: read: connection reset by peer
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.097693    1245 remote_runtime.go:115] StopPodSandbox "889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c" from runtime service failed: rpc error: code = Unknown desc = error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c/stop?t=10: read unix @->/var/run/docker.sock: read: connection reset by peer
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.097781    1245 kuberuntime_manager.go:799] Failed to stop sandbox {"docker" "889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c"}
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.097889    1245 kuberuntime_manager.go:594] killPodWithSyncResult failed: failed to "KillPodSandbox" for "aa8d5cab3ea096315de0c2003230d4f9" with KillPodSandboxError: "rpc error: code = Unknown desc = error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c/stop?t=10: read unix @->/var/run/docker.sock: read: connection reset by peer"
May 05 11:00:53 kube1 kubelet[1245]: E0505 11:00:53.097955    1245 pod_workers.go:186] Error syncing pod aa8d5cab3ea096315de0c2003230d4f9 ("kube-scheduler-kube1_kube-system(aa8d5cab3ea096315de0c2003230d4f9)"), skipping: failed to "KillPodSandbox" for "aa8d5cab3ea096315de0c2003230d4f9" with KillPodSandboxError: "rpc error: code = Unknown desc = error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/889811647e63306ccd4de00617487a55d17f7e54ee14fd6665045dc91124fe3c/stop?t=10: read unix @->/var/run/docker.sock: read: connection reset by peer"
May 05 11:00:54 kube1 kubelet[1245]: E0505 11:00:54.063636    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:54 kube1 kubelet[1245]: E0505 11:00:54.070569    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:54 kube1 kubelet[1245]: E0505 11:00:54.072718    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: E0505 11:00:55.066993    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: E0505 11:00:55.074480    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: E0505 11:00:55.076027    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.142981    1245 status_manager.go:461] Failed to get status for pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.144513    1245 status_manager.go:461] Failed to get status for pod "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.145160    1245 status_manager.go:461] Failed to get status for pod "kube-scheduler-kube1_kube-system(aa8d5cab3ea096315de0c2003230d4f9)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-kube1: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.145673    1245 status_manager.go:461] Failed to get status for pod "kube-proxy-p8k7z_kube-system(af95bb4d-4d85-11e8-8d44-de19481fc00d)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-proxy-p8k7z: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.146477    1245 status_manager.go:461] Failed to get status for pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/weave-net-fwx9c: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.495017    1245 pod_container_deletor.go:77] Container "cb8d46bc17b4165a7f44fbbcc25d677ee5dde407d1cb78473d572fde29859e23" not found in pod's containers
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.501560    1245 pod_container_deletor.go:77] Container "fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449" not found in pod's containers
May 05 11:00:55 kube1 kubelet[1245]: W0505 11:00:55.556766    1245 pod_container_deletor.go:77] Container "a81bb7dbcb04d1390b86d9a60103a776fdc49f40031e975726ef6f86a222224d" not found in pod's containers
May 05 11:00:56 kube1 kubelet[1245]: E0505 11:00:56.068331    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:56 kube1 kubelet[1245]: E0505 11:00:56.077448    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:56 kube1 kubelet[1245]: E0505 11:00:56.078696    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:56 kube1 kubelet[1245]: I0505 11:00:56.776978    1245 kuberuntime_manager.go:757] checking backoff for container "weave" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:00:56 kube1 kubelet[1245]: I0505 11:00:56.990747    1245 kuberuntime_manager.go:757] checking backoff for container "kube-proxy" in pod "kube-proxy-p8k7z_kube-system(af95bb4d-4d85-11e8-8d44-de19481fc00d)"
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.069434    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.078867    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.084450    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.525593    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?resourceVersion=0&timeout=10s: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.529984    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?timeout=10s: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.532076    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?timeout=10s: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.535205    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?timeout=10s: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.538258    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?timeout=10s: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:57 kube1 kubelet[1245]: E0505 11:00:57.538322    1245 kubelet_node_status.go:366] Unable to update node status: update node status exceeds retry count
May 05 11:00:57 kube1 kubelet[1245]: I0505 11:00:57.715879    1245 kuberuntime_manager.go:757] checking backoff for container "kube-scheduler" in pod "kube-scheduler-kube1_kube-system(aa8d5cab3ea096315de0c2003230d4f9)"
May 05 11:00:57 kube1 kubelet[1245]: I0505 11:00:57.768947    1245 kuberuntime_manager.go:757] checking backoff for container "kube-apiserver" in pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)"
May 05 11:00:57 kube1 kubelet[1245]: I0505 11:00:57.971560    1245 kuberuntime_manager.go:757] checking backoff for container "kube-controller-manager" in pod "kube-controller-manager-kube1_kube-system(8bd7e96fd69257327447632a692e29b7)"
May 05 11:00:58 kube1 kubelet[1245]: I0505 11:00:58.026476    1245 kuberuntime_manager.go:757] checking backoff for container "weave-npc" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:00:58 kube1 kubelet[1245]: E0505 11:00:58.071339    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:58 kube1 kubelet[1245]: E0505 11:00:58.081543    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:58 kube1 kubelet[1245]: E0505 11:00:58.086712    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:58 kube1 kubelet[1245]: W0505 11:00:58.204996    1245 pod_container_deletor.go:77] Container "53dd44cdf00e3d15c6feb427f570ed13bf742d9f0949617b4a5f32dd4f412f74" not found in pod's containers
May 05 11:00:59 kube1 kubelet[1245]: E0505 11:00:59.073402    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:59 kube1 kubelet[1245]: E0505 11:00:59.086100    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:59 kube1 kubelet[1245]: E0505 11:00:59.088890    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:00:59 kube1 kubelet[1245]: W0505 11:00:59.095397    1245 status_manager.go:461] Failed to get status for pod "kube-proxy-p8k7z_kube-system(af95bb4d-4d85-11e8-8d44-de19481fc00d)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-proxy-p8k7z: dial tcp 10.0.1.1:6443: getsockopt: connection refused
May 05 11:01:00 kube1 kubelet[1245]: I0505 11:01:00.727576    1245 kuberuntime_manager.go:513] Container {Name:weave Image:weaveworks/weave-kube:2.3.0 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:weave-net-token-7gfd8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 05 11:01:00 kube1 kubelet[1245]: I0505 11:01:00.728384    1245 kuberuntime_manager.go:757] checking backoff for container "weave" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:00 kube1 kubelet[1245]: I0505 11:01:00.729162    1245 kuberuntime_manager.go:767] Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)
May 05 11:01:00 kube1 kubelet[1245]: E0505 11:01:00.729331    1245 pod_workers.go:186] Error syncing pod af967c7b-4d85-11e8-8d44-de19481fc00d ("weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:01 kube1 kubelet[1245]: I0505 11:01:01.906897    1245 kuberuntime_manager.go:513] Container {Name:weave Image:weaveworks/weave-kube:2.3.0 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:weave-net-token-7gfd8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 05 11:01:01 kube1 kubelet[1245]: I0505 11:01:01.907356    1245 kuberuntime_manager.go:757] checking backoff for container "weave" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:01 kube1 kubelet[1245]: I0505 11:01:01.907641    1245 kuberuntime_manager.go:767] Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)
May 05 11:01:01 kube1 kubelet[1245]: E0505 11:01:01.907715    1245 pod_workers.go:186] Error syncing pod af967c7b-4d85-11e8-8d44-de19481fc00d ("weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:02 kube1 kubelet[1245]: I0505 11:01:02.919771    1245 kuberuntime_manager.go:513] Container {Name:weave Image:weaveworks/weave-kube:2.3.0 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:weave-net-token-7gfd8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 05 11:01:02 kube1 kubelet[1245]: I0505 11:01:02.923928    1245 kuberuntime_manager.go:757] checking backoff for container "weave" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:02 kube1 kubelet[1245]: I0505 11:01:02.924461    1245 kuberuntime_manager.go:767] Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)
May 05 11:01:02 kube1 kubelet[1245]: E0505 11:01:02.924546    1245 pod_workers.go:186] Error syncing pod af967c7b-4d85-11e8-8d44-de19481fc00d ("weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 10s restarting failed container=weave pod=weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:09 kube1 kubelet[1245]: E0505 11:01:09.262444    1245 event.go:209] Unable to write event: 'Post https://10.0.1.1:6443/api/v1/namespaces/kube-system/events: net/http: TLS handshake timeout' (may retry after sleeping)
May 05 11:01:09 kube1 kubelet[1245]: W0505 11:01:09.265184    1245 status_manager.go:461] Failed to get status for pod "kube-apiserver-kube1_kube-system(80922f536b042385a52a7614bad79832)": Get https://10.0.1.1:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-kube1: net/http: TLS handshake timeout
May 05 11:01:10 kube1 kubelet[1245]: E0505 11:01:10.074928    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://10.0.1.1:6443/api/v1/services?limit=500&resourceVersion=0: net/http: TLS handshake timeout
May 05 11:01:10 kube1 kubelet[1245]: E0505 11:01:10.089788    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://10.0.1.1:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkube1&limit=500&resourceVersion=0: net/http: TLS handshake timeout
May 05 11:01:10 kube1 kubelet[1245]: E0505 11:01:10.092126    1245 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.0.1.1:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkube1&limit=500&resourceVersion=0: net/http: TLS handshake timeout
May 05 11:01:11 kube1 kubelet[1245]: E0505 11:01:11.850744    1245 remote_runtime.go:132] RemovePodSandbox "fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Driver aufs failed to remove root filesystem fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449: rename /var/lib/docker/aufs/mnt/8e9f5f9f146453cdc98ea50567d2991be45164c033c2e4109866e8479aeb07e6 /var/lib/docker/aufs/mnt/8e9f5f9f146453cdc98ea50567d2991be45164c033c2e4109866e8479aeb07e6-removing: device or resource busy
May 05 11:01:11 kube1 kubelet[1245]: E0505 11:01:11.850835    1245 kuberuntime_gc.go:157] Failed to remove sandbox "fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449": rpc error: code = Unknown desc = Error response from daemon: Driver aufs failed to remove root filesystem fcd7b3af5fc14f3cffce724a330c644e188fe0e7b6f045788052af681b6e8449: rename /var/lib/docker/aufs/mnt/8e9f5f9f146453cdc98ea50567d2991be45164c033c2e4109866e8479aeb07e6 /var/lib/docker/aufs/mnt/8e9f5f9f146453cdc98ea50567d2991be45164c033c2e4109866e8479aeb07e6-removing: device or resource busy
May 05 11:01:17 kube1 kubelet[1245]: I0505 11:01:17.449102    1245 kuberuntime_manager.go:513] Container {Name:weave Image:weaveworks/weave-kube:2.3.0 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:weave-net-token-7gfd8 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
May 05 11:01:17 kube1 kubelet[1245]: I0505 11:01:17.449539    1245 kuberuntime_manager.go:757] checking backoff for container "weave" in pod "weave-net-fwx9c_kube-system(af967c7b-4d85-11e8-8d44-de19481fc00d)"
May 05 11:01:17 kube1 kubelet[1245]: E0505 11:01:17.540771    1245 kubelet_node_status.go:374] Error updating node status, will retry: error getting node "kube1": Get https://10.0.1.1:6443/api/v1/nodes/kube1?resourceVersion=0&timeout=10s: net/http: TLS handshake timeout

These are the logs from the kube-apiserver Docker container:
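(Collected with something like the following; the name filter is an assumption about how Docker names the kubeadm-managed container:)

# find the apiserver container and dump its logs
docker ps -qf name=k8s_kube-apiserver
docker logs $(docker ps -qf name=k8s_kube-apiserver)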

Flag --admission-control has been deprecated, Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I0505 11:00:59.138961       1 server.go:135] Version: v1.10.2
I0505 11:00:59.139473       1 server.go:724] external host was not specified, using 10.0.1.1
I0505 11:01:01.842944       1 plugins.go:149] Loaded 9 admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota.
I0505 11:01:01.848014       1 plugins.go:149] Loaded 9 admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota.
I0505 11:01:01.864092       1 master.go:228] Using reconciler: master-count
W0505 11:01:02.261294       1 genericapiserver.go:342] Skipping API batch/v2alpha1 because it has no resources.
W0505 11:01:02.305296       1 genericapiserver.go:342] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W0505 11:01:02.310691       1 genericapiserver.go:342] Skipping API storage.k8s.io/v1alpha1 because it has no resources.
W0505 11:01:02.350525       1 genericapiserver.go:342] Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.
[restful] 2018/05/05 11:01:02 log.go:33: [restful/swagger] listing is available at https://10.0.1.1:6443/swaggerapi
[restful] 2018/05/05 11:01:02 log.go:33: [restful/swagger] https://10.0.1.1:6443/swaggerui/ is mapped to folder /swagger-ui/
[restful] 2018/05/05 11:01:07 log.go:33: [restful/swagger] listing is available at https://10.0.1.1:6443/swaggerapi
[restful] 2018/05/05 11:01:07 log.go:33: [restful/swagger] https://10.0.1.1:6443/swaggerui/ is mapped to folder /swagger-ui/
I0505 11:01:07.298250       1 plugins.go:149] Loaded 9 admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota.
I0505 11:01:17.424548       1 serve.go:96] Serving securely on [::]:6443
I0505 11:01:17.429031       1 apiservice_controller.go:90] Starting APIServiceRegistrationController
I0505 11:01:17.429088       1 cache.go:32] Waiting for caches to sync for APIServiceRegistrationController controller
I0505 11:01:17.429176       1 controller.go:84] Starting OpenAPI AggregationController
I0505 11:01:17.429411       1 crd_finalizer.go:242] Starting CRDFinalizer
I0505 11:01:17.430183       1 available_controller.go:262] Starting AvailableConditionController
I0505 11:01:17.430244       1 cache.go:32] Waiting for caches to sync for AvailableConditionController controller
I0505 11:01:17.438038       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35020: EOF
I0505 11:01:17.457376       1 crdregistration_controller.go:110] Starting crd-autoregister controller
I0505 11:01:17.457482       1 controller_utils.go:1019] Waiting for caches to sync for crd-autoregister controller
I0505 11:01:17.457568       1 customresource_discovery_controller.go:174] Starting DiscoveryController
I0505 11:01:17.457640       1 naming_controller.go:276] Starting NamingConditionController
I0505 11:01:17.481519       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35024: EOF
I0505 11:01:17.494056       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35026: EOF
I0505 11:01:17.498166       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35022: EOF
I0505 11:01:17.506897       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35028: EOF
I0505 11:01:17.532653       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35032: EOF
I0505 11:01:17.545789       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35040: EOF
I0505 11:01:17.558155       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35030: EOF
I0505 11:01:17.582805       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35046: EOF
I0505 11:01:17.593088       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35042: EOF
I0505 11:01:17.615250       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35048: EOF
I0505 11:01:17.624459       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35044: EOF
I0505 11:01:17.639982       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35050: EOF
I0505 11:01:17.640306       1 cache.go:39] Caches are synced for AvailableConditionController controller
I0505 11:01:17.642796       1 cache.go:39] Caches are synced for APIServiceRegistrationController controller
I0505 11:01:17.644389       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35052: EOF
I0505 11:01:17.663229       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35054: EOF
I0505 11:01:17.669734       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35056: EOF
I0505 11:01:17.682057       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35058: EOF
I0505 11:01:17.685445       1 controller_utils.go:1026] Caches are synced for crd-autoregister controller
I0505 11:01:17.685926       1 autoregister_controller.go:136] Starting autoregister controller
I0505 11:01:17.685962       1 cache.go:32] Waiting for caches to sync for autoregister controller
I0505 11:01:17.694511       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35060: EOF
I0505 11:01:17.707268       1 logs.go:49] http: TLS handshake error from 10.0.1.2:37578: EOF
I0505 11:01:17.721690       1 logs.go:49] http: TLS handshake error from 10.0.1.2:37580: EOF
I0505 11:01:17.730829       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35062: EOF
I0505 11:01:17.743168       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51606: EOF
I0505 11:01:17.755716       1 logs.go:49] http: TLS handshake error from 10.0.1.3:53286: EOF
I0505 11:01:17.768826       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35064: EOF
I0505 11:01:17.791726       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51608: EOF
I0505 11:01:17.804902       1 logs.go:49] http: TLS handshake error from 10.0.1.2:37586: EOF
I0505 11:01:17.817427       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51612: EOF
I0505 11:01:17.829706       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38854: EOF
I0505 11:01:17.834221       1 cache.go:39] Caches are synced for autoregister controller
I0505 11:01:17.842946       1 logs.go:49] http: TLS handshake error from 10.0.1.3:53288: EOF
I0505 11:01:17.855274       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38856: EOF
I0505 11:01:17.884006       1 logs.go:49] http: TLS handshake error from 10.0.1.3:53294: EOF
I0505 11:01:17.896749       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38860: EOF
I0505 11:01:17.909778       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38862: EOF
I0505 11:01:17.936682       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51614: EOF
I0505 11:01:17.958016       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51618: EOF
I0505 11:01:17.980296       1 logs.go:49] http: TLS handshake error from 10.0.1.2:51616: EOF
I0505 11:01:17.984892       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38866: EOF
I0505 11:01:17.997643       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38870: EOF
I0505 11:01:18.007570       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35066: EOF
I0505 11:01:18.023860       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35428: EOF
I0505 11:01:18.043968       1 logs.go:49] http: TLS handshake error from 10.0.1.3:38864: EOF
I0505 11:01:18.060799       1 logs.go:49] http: TLS handshake error from 10.0.1.1:35454: EOF
I0505 11:01:18.373340       1 trace.go:76] Trace[1806904349]: "GuaranteedUpdate etcd3: *core.Node" (started: 2018-05-05 11:01:17.824561504 +0000 UTC m=+18.918597853) (total time: 548.699176ms):
Trace[1806904349]: [486.946382ms] [486.81435ms] Transaction prepared
I0505 11:01:18.373605       1 trace.go:76] Trace[579616267]: "Patch /api/v1/nodes/kube1/status" (started: 2018-05-05 11:01:17.824157418 +0000 UTC m=+18.918193696) (total time: 549.406967ms):
Trace[579616267]: [549.242793ms] [542.941642ms] Object stored in database
I0505 11:01:18.380591       1 trace.go:76] Trace[97983429]: "GuaranteedUpdate etcd3: *apiregistration.APIService" (started: 2018-05-05 11:01:17.676879411 +0000 UTC m=+18.770915729) (total time: 703.646707ms):
Trace[97983429]: [703.646707ms] [703.497371ms] END
I0505 11:01:18.380841       1 trace.go:76] Trace[895102304]: "Update /apis/apiregistration.k8s.io/v1/apiservices/v1./status" (started: 2018-05-05 11:01:17.675249922 +0000 UTC m=+18.769286241) (total time: 705.551898ms):
Trace[895102304]: [705.400867ms] [703.834717ms] Object stored in database
I0505 11:01:18.381730       1 trace.go:76] Trace[888781305]: "GuaranteedUpdate etcd3: *apiregistration.APIService" (started: 2018-05-05 11:01:17.674467885 +0000 UTC m=+18.768504203) (total time: 707.143413ms):
Trace[888781305]: [707.143413ms] [706.97921ms] END
I0505 11:01:18.382405       1 trace.go:76] Trace[1692647675]: "Update /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.storage.k8s.io/status" (started: 2018-05-05 11:01:17.672816059 +0000 UTC m=+18.766852348) (total time: 709.422405ms):
Trace[1692647675]: [709.077638ms] [707.516471ms] Object stored in database
I0505 11:01:18.382816       1 trace.go:76] Trace[1269643779]: "GuaranteedUpdate etcd3: *apiregistration.APIService" (started: 2018-05-05 11:01:17.67305103 +0000 UTC m=+18.767087369) (total time: 709.701236ms):
Trace[1269643779]: [709.701236ms] [709.54797ms] END
I0505 11:01:18.382843       1 trace.go:76] Trace[931998892]: "GuaranteedUpdate etcd3: *apiregistration.APIService" (started: 2018-05-05 11:01:17.660908485 +0000 UTC m=+18.754944824) (total time: 721.86699ms):
Trace[931998892]: [721.86699ms] [721.70397ms] END
I0505 11:01:18.383097       1 trace.go:76] Trace[1925688627]: "Update /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.policy/status" (started: 2018-05-05 11:01:17.672297515 +0000 UTC m=+18.766333823) (total time: 710.699157ms):
Trace[1925688627]: [710.564257ms] [709.867173ms] Object stored in database
I0505 11:01:18.383138       1 trace.go:76] Trace[964127368]: "Update /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.events.k8s.io/status" (started: 2018-05-05 11:01:17.659264811 +0000 UTC m=+18.753301099) (total time: 723.827813ms):
Trace[964127368]: [723.623248ms] [722.070934ms] Object stored in database
I0505 11:01:18.383570       1 trace.go:76] Trace[667567241]: "GuaranteedUpdate etcd3: *apiregistration.APIService" (started: 2018-05-05 11:01:17.675429594 +0000 UTC m=+18.769465922) (total time: 708.103137ms):
Trace[667567241]: [708.103137ms] [707.959916ms] END
I0505 11:01:18.383754       1 trace.go:76] Trace[1340245617]: "Update /apis/apiregistration.k8s.io/v1/apiservices/v2beta1.autoscaling/status" (started: 2018-05-05 11:01:17.67380586 +0000 UTC m=+18.767842128) (total time: 709.910061ms):
Trace[1340245617]: [709.795904ms] [708.224211ms] Object stored in database
I0505 11:01:33.974201       1 controller.go:537] quota admission added evaluator for: { endpoints}
E0505 13:41:07.262095       1 watcher.go:208] watch chan error: etcdserver: mvcc: required revision has been compacted

I forced private_interface to be "eth0"; do you think that could be the reason for my problem? I don't have much knowledge of network security. I'm trying to find clues here and there, but it feels like the problem could be anywhere from the weave-net DaemonSet to a misconfigured firewall or etcd...
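In the meantime I can run some basic reachability checks from the worker nodes over the WireGuard IPs; a rough sketch (6443 is the API server, and 6783/6784 should be the ports Weave Net uses, if I got that right):

# run from kube2 or kube3
curl -k https://10.0.1.1:6443/healthz
nc -zv 10.0.1.1 6783
nc -zvu 10.0.1.1 6784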

pstadler commented 6 years ago

Please check whether WireGuard is able to send and receive traffic between hosts. Refer to the guide.
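A quick check from each host should be enough (adjust the peer addresses to your overlay IPs):

# verify handshakes and that the peers answer over the tunnel
wg show
ping -c 3 10.0.1.2
ping -c 3 10.0.1.3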

ticruz38 commented 6 years ago

wg show

interface: wg0
  public key: 2OtrvpVKAVy3e/+uty7Ge/TKQtk4POhjycIGEhimA3E=
  private key: (hidden)
  listening port: 51820

peer: 2YxQM/HlxTmlTVnqwyL7t/RLZCxZMnE3wJVqYG2kFWM=
  endpoint: 10.4.25.143:51820
  allowed ips: 10.0.1.2/32
  latest handshake: 1 minute, 27 seconds ago
  transfer: 2.49 GiB received, 3.06 GiB sent

peer: Deob1oymtd+pz01dXZuqxPodI5myEAsNyHxi7FdDeFE=
  endpoint: 10.3.86.195:51820
  allowed ips: 10.0.1.3/32
  latest handshake: 1 minute, 57 seconds ago
  transfer: 3.84 GiB received, 3.99 GiB sent

WireGuard seems all good; I get pretty much the same output on all 3 nodes. The firewall module is set up to work on eth0, while WireGuard works on wg0. Should I switch?
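For example, would rules along these lines be needed on each host? Just a guess at what the firewall would have to allow for the overlay interfaces:

# accept traffic arriving on the WireGuard and Weave interfaces
iptables -A INPUT -i wg0 -j ACCEPT
iptables -A INPUT -i weave -j ACCEPT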

pstadler commented 6 years ago

Closing this due to inactivity.