linuxkit / linuxkit

A toolkit for building secure, portable and lean operating systems for containers
Apache License 2.0

project/kubernetes: master always NotReady, failure to configure CNI/weave? #2031

Closed: ijc closed this issue 7 years ago

ijc commented 7 years ago

Running d310ebf44ff000c9b9e90542bbd0a62d386be3e1 with a patch to avoid stale local images:

diff --git a/projects/kubernetes/Makefile b/projects/kubernetes/Makefile
index 05280281..cb1c297b 100644
--- a/projects/kubernetes/Makefile
+++ b/projects/kubernetes/Makefile
@@ -20,10 +20,10 @@ push-container-images: build-container-images cache-images
 build-vm-images: kube-master-initrd.img kube-node-initrd.img

 kube-master-initrd.img: kube-master.yml
-   ../../bin/moby build -name kube-master kube-master.yml
+   ../../bin/moby build --pull -name kube-master kube-master.yml

 kube-node-initrd.img: kube-node.yml
-   ../../bin/moby build -name kube-node kube-node.yml
+   ../../bin/moby build --pull -name kube-node kube-node.yml

 clean:
    rm -f -r \

I am running, from projects/kubernetes:

$ git clean -fdx . && make build-container-images && make build-vm-images && ./boot-master.sh 

(I think build-container-images is redundant given the use of --pull, but I left it in, in case I am wrong.)

Then on the serial console I am running:

# runc exec --tty kubelet ash -l

(I'm doing this in preference to using ssh_into_kubelet.sh since it works on both macOS and Linux.)

Then:

# kubeadm-init.sh
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.1
[init] Using Authorization mode: RBAC
[preflight] Skipping pre-flight checks
[preflight] No supported init system detected, won't ensure kubelet is running.
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [moby kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.65.5]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 23.779355 seconds
[apiclient] Waiting for at least one node to register
[apiclient] First node has registered after 4.503625 seconds
[token] Using token: 3b9191.51ae41e60e2b9ba6
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 3b9191.51ae41e60e2b9ba6 192.168.65.5:6443

serviceaccount "weave-net" created
daemonset "weave-net" created
clusterrole "weave-net" created
clusterrolebinding "weave-net" created

At which point I get:

# kubectl get nodes
NAME      STATUS     AGE       VERSION
moby      NotReady   1m        v1.6.1

This state persists forever AFAICT (I've waited a very long time).
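
For completeness, something like the following (which I did not run at the time; kubectl here is the one inside the kubelet container, already configured by kubeadm-init.sh as shown above) should surface the same problem at the pod level. I would expect the weave-net and kube-proxy pods to show CrashLoopBackOff:

# kubectl -n kube-system get pods -o wide
# kubectl describe node moby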

Examining /var/log/kubelet.log, I see the following repeated at intervals:

I0613 09:01:14.952354    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/8b8054e7-5016-11e7-8f7d-1a20c5ea1298-weave-net-token-kb4sp" (spec.Name: "weave-net-token-kb4sp") pod "8b8054e7-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b8054e7-5016-11e7-8f7d-1a20c5ea1298").
W0613 09:01:15.033012    1983 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
E0613 09:01:15.033211    1983 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I0613 09:01:15.221822    1983 kuberuntime_manager.go:458] Container {Name:weave Image:weaveworks/weave-kube:1.9.4 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath:} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath:} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath:} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath:} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath:} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath:} {Name:weave-net-token-kb4sp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:01:15.221898    1983 kuberuntime_manager.go:458] Container {Name:weave-npc Image:weaveworks/weave-npc:1.9.4 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weave-net-token-kb4sp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:01:15.221966    1983 kuberuntime_manager.go:742] checking backoff for container "weave" in pod "weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:01:15.222095    1983 kuberuntime_manager.go:752] Back-off 1m20s restarting failed container=weave pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)
I0613 09:01:15.222102    1983 kuberuntime_manager.go:742] checking backoff for container "weave-npc" in pod "weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:01:15.222157    1983 kuberuntime_manager.go:752] Back-off 1m20s restarting failed container=weave-npc pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)
E0613 09:01:15.222192    1983 pod_workers.go:182] Error syncing pod 8b8054e7-5016-11e7-8f7d-1a20c5ea1298 ("weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"), skipping: [failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=weave pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
, failed to "StartContainer" for "weave-npc" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=weave-npc pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
]
I0613 09:01:16.977058    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/8b80617e-5016-11e7-8f7d-1a20c5ea1298-kube-proxy-token-1jndl" (spec.Name: "kube-proxy-token-1jndl") pod "8b80617e-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b80617e-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:01:16.977789    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/8b80617e-5016-11e7-8f7d-1a20c5ea1298-kube-proxy" (spec.Name: "kube-proxy") pod "8b80617e-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b80617e-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:01:17.218581    1983 kuberuntime_manager.go:458] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.6.1 Command:[/usr/local/bin/kube-proxy --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath:} {Name:kube-proxy-token-1jndl ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:01:17.218703    1983 kuberuntime_manager.go:742] checking backoff for container "kube-proxy" in pod "kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:01:17.218820    1983 kuberuntime_manager.go:752] Back-off 1m20s restarting failed container=kube-proxy pod=kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)
E0613 09:01:17.218875    1983 pod_workers.go:182] Error syncing pod 8b80617e-5016-11e7-8f7d-1a20c5ea1298 ("kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=kube-proxy pod=kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"
W0613 09:01:20.034820    1983 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
E0613 09:01:20.035009    1983 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
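
A follow-up I have not captured here (sketched only; the pod name weave-net-63nvq is taken from the kubelet log above) would be to pull the crashing containers' own logs and events rather than relying on kubelet.log:

# kubectl -n kube-system logs weave-net-63nvq -c weave
# kubectl -n kube-system logs weave-net-63nvq -c weave-npc
# kubectl -n kube-system describe pod weave-net-63nvq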

My understanding was that applying the weave.yaml (done by kubeadm-init.sh) should have populated /etc/cni/net.d with something. If I manually do:

# echo '{ "name": "weave", "type": "weave-net" }' > /etc/cni/net.d/10-weave.conf

Now I get repeated stanzas in /var/log/kubelet.log of:

I0613 09:06:03.929034    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/8b8054e7-5016-11e7-8f7d-1a20c5ea1298-weave-net-token-kb4sp" (spec.Name: "weave-net-token-kb4sp") pod "8b8054e7-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b8054e7-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:06:04.222881    1983 kuberuntime_manager.go:458] Container {Name:weave Image:weaveworks/weave-kube:1.9.4 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath:} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath:} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath:} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath:} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath:} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath:} {Name:weave-net-token-kb4sp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:06:04.222987    1983 kuberuntime_manager.go:458] Container {Name:weave-npc Image:weaveworks/weave-npc:1.9.4 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weave-net-token-kb4sp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:06:04.223132    1983 kuberuntime_manager.go:742] checking backoff for container "weave" in pod "weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:06:04.223424    1983 kuberuntime_manager.go:752] Back-off 5m0s restarting failed container=weave pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)
I0613 09:06:04.223433    1983 kuberuntime_manager.go:742] checking backoff for container "weave-npc" in pod "weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:06:04.223506    1983 kuberuntime_manager.go:752] Back-off 5m0s restarting failed container=weave-npc pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)
E0613 09:06:04.223545    1983 pod_workers.go:182] Error syncing pod 8b8054e7-5016-11e7-8f7d-1a20c5ea1298 ("weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"), skipping: [failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
, failed to "StartContainer" for "weave-npc" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave-npc pod=weave-net-63nvq_kube-system(8b8054e7-5016-11e7-8f7d-1a20c5ea1298)"
]
I0613 09:06:04.947342    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/8b80617e-5016-11e7-8f7d-1a20c5ea1298-kube-proxy-token-1jndl" (spec.Name: "kube-proxy-token-1jndl") pod "8b80617e-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b80617e-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:06:04.948075    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/8b80617e-5016-11e7-8f7d-1a20c5ea1298-kube-proxy" (spec.Name: "kube-proxy") pod "8b80617e-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b80617e-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:06:05.221185    1983 kuberuntime_manager.go:458] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.6.1 Command:[/usr/local/bin/kube-proxy --kubeconfig=/var/lib/kube-proxy/kubeconfig.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath:} {Name:kube-proxy-token-1jndl ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:06:05.221355    1983 kuberuntime_manager.go:742] checking backoff for container "kube-proxy" in pod "kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:06:05.221484    1983 kuberuntime_manager.go:752] Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)
E0613 09:06:05.221522    1983 pod_workers.go:182] Error syncing pod 8b80617e-5016-11e7-8f7d-1a20c5ea1298 ("kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-5fpl2_kube-system(8b80617e-5016-11e7-8f7d-1a20c5ea1298)"
I0613 09:06:08.924644    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/8b899e86-5016-11e7-8f7d-1a20c5ea1298-kube-dns-token-w107z" (spec.Name: "kube-dns-token-w107z") pod "8b899e86-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b899e86-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:06:08.925228    1983 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/8b899e86-5016-11e7-8f7d-1a20c5ea1298-kube-dns-config" (spec.Name: "kube-dns-config") pod "8b899e86-5016-11e7-8f7d-1a20c5ea1298" (UID: "8b899e86-5016-11e7-8f7d-1a20c5ea1298").
I0613 09:06:09.222821    1983 kuberuntime_manager.go:458] Container {Name:kubedns Image:gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1 Command:[] Args:[--domain=cluster.local. --dns-port=10053 --config-dir=/kube-dns-config --v=2] WorkingDir: Ports:[{Name:dns-local HostPort:0 ContainerPort:10053 Protocol:UDP HostIP:} {Name:dns-tcp-local HostPort:0 ContainerPort:10053 Protocol:TCP HostIP:} {Name:metrics HostPort:0 ContainerPort:10055 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:PROMETHEUS_PORT Value:10055 ValueFrom:nil}] Resources:{Limits:map[memory:{i:{value:178257920 scale:0} d:{Dec:<nil>} s:170Mi Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:73400320 scale:0} d:{Dec:<nil>} s:70Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-config ReadOnly:false MountPath:/kube-dns-config SubPath:} {Name:kube-dns-token-w107z ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthcheck/kubedns,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/readiness,Port:8081,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:3,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:06:09.222922    1983 kuberuntime_manager.go:458] Container {Name:dnsmasq Image:gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1 Command:[] Args:[-v=2 -logtostderr -configDir=/etc/k8s/dns/dnsmasq-nanny -restartDnsmasq=true -- -k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] WorkingDir: Ports:[{Name:dns HostPort:0 ContainerPort:53 Protocol:UDP HostIP:} {Name:dns-tcp HostPort:0 ContainerPort:53 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:150 scale:-3} d:{Dec:<nil>} s:150m Format:DecimalSI} memory:{i:{value:20971520 scale:0} d:{Dec:<nil>} s:20Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-config ReadOnly:false MountPath:/etc/k8s/dns/dnsmasq-nanny SubPath:} {Name:kube-dns-token-w107z ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthcheck/dnsmasq,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
I0613 09:06:09.222972    1983 kuberuntime_manager.go:458] Container {Name:sidecar Image:gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1 Command:[] Args:[--v=2 --logtostderr --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A] WorkingDir: Ports:[{Name:metrics HostPort:0 ContainerPort:10054 Protocol:TCP HostIP:}] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI} memory:{i:{value:20971520 scale:0} d:{Dec:<nil>} s:20Mi Format:BinarySI}]} VolumeMounts:[{Name:kube-dns-token-w107z ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/metrics,Port:10054,Host:,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:60,TimeoutSeconds:5,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:5,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
W0613 09:06:09.228628    1983 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "kube-dns-3913472980-wkbf6_kube-system": Unexpected command output nsenter: cannot open : No such file or directory
 with error: exit status 1
E0613 09:06:09.230006    1983 cni.go:275] Error deleting network: failed to find plugin "weave-net" in path [/opt/cni/bin /opt/weave-net/bin]
E0613 09:06:09.230638    1983 remote_runtime.go:109] StopPodSandbox "ee5486195a819a2bc07cf4a2d0f7d67ea604e7afadf07a656fc26419ded34703" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "kube-dns-3913472980-wkbf6_kube-system" network: failed to find plugin "weave-net" in path [/opt/cni/bin /opt/weave-net/bin]
E0613 09:06:09.230685    1983 kuberuntime_manager.go:784] Failed to stop sandbox {"docker" "ee5486195a819a2bc07cf4a2d0f7d67ea604e7afadf07a656fc26419ded34703"}
E0613 09:06:09.230715    1983 kuberuntime_manager.go:573] killPodWithSyncResult failed: failed to "KillPodSandbox" for "8b899e86-5016-11e7-8f7d-1a20c5ea1298" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-3913472980-wkbf6_kube-system\" network: failed to find plugin \"weave-net\" in path [/opt/cni/bin /opt/weave-net/bin]"
E0613 09:06:09.230745    1983 pod_workers.go:182] Error syncing pod 8b899e86-5016-11e7-8f7d-1a20c5ea1298 ("kube-dns-3913472980-wkbf6_kube-system(8b899e86-5016-11e7-8f7d-1a20c5ea1298)"), skipping: failed to "KillPodSandbox" for "8b899e86-5016-11e7-8f7d-1a20c5ea1298" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-3913472980-wkbf6_kube-system\" network: failed to find plugin \"weave-net\" in path [/opt/cni/bin /opt/weave-net/bin]"

I think at this point it is failing to find the CNI plugins, but I gave up trying to band-aid the situation since it is clear I've no idea what I'm doing ;-)
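
For whoever picks this up: a check I did not run, but which should narrow it down (the two directories below come straight from the error above), is whether the weave-net CNI binary was ever installed on the host:

# ls -l /opt/cni/bin /opt/weave-net/bin
# find / -name weave-net -type f 2>/dev/null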

I see the same thing on macOS (using boot-master.sh) and on Linux, where instead of the script I am running:

../../bin/linuxkit run -cpus 2 -mem 4096 -disk kube-master-disk.img,size=4G kube-master

(My actual goal was to fix the script for the Linux case; I may still PR that despite these issues.)

All the logs above are from macOS; the Linux ones are the same AFAICT.

/cc @errordeveloper

ijc commented 7 years ago

Closing in favour of #2131 since that is where all the investigation/fixing is happening.