deitch opened this issue 7 years ago (status: Open)
Are you sure no weave containers are running?
What does kubectl get pods --namespace=kube-system -o wide show?
If they are actually running and dying, can you get the logs of one of the dead containers, please?
actually it might be https://github.com/kubernetes/kubernetes/issues/43815 - can you make sure you have Kubernetes 1.6.1 please?
Hey @bboreham thanks for getting back so quickly.
Yeah, I am sure none is running. I have a single worker node (systemctl stop kubelet on the others) so I can focus on where stuff is running and debug.
On master:
ip-10-50-21-250 core # kubectl get pods --namespace=kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-321336704-9gq2p 2/4 Error 23 2h 172.17.0.2 ip-10-50-22-42.ec2.internal
kube-dns-321336704-z349d 2/4 Error 24 2h 172.17.0.3 ip-10-50-22-42.ec2.internal
On worker:
ip-10-50-22-42 kubernetes # docker ps -a | grep weave
ip-10-50-22-42 kubernetes # journalctl -l --no-pager -u kubelet.service -f
-- Logs begin at Tue 2017-04-04 12:49:41 UTC. --
Apr 04 17:29:45 ip-10-50-22-42.ec2.internal kubelet[15748]: E0404 17:29:45.816864 15748 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 04 17:29:50 ip-10-50-22-42.ec2.internal kubelet[15748]: I0404 17:29:50.413517 15748 qos_container_manager_linux.go:285] [ContainerManager]: Updated QoS cgroup configuration
Apr 04 17:29:50 ip-10-50-22-42.ec2.internal kubelet[15748]: W0404 17:29:50.821379 15748 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 04 17:29:50 ip-10-50-22-42.ec2.internal kubelet[15748]: E0404 17:29:50.822066 15748 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Apr 04 17:29:52 ip-10-50-22-42.ec2.internal kubelet[15748]: E0404 17:29:52.414519 15748 pod_workers.go:182] Error syncing pod bf621c17-194a-11e7-be58-0e94e95c9de0 ("kube-dns-321336704-z349d_kube-system(bf621c17-194a-11e7-be58-0e94e95c9de0)"), skipping: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
Apr 04 17:29:52 ip-10-50-22-42.ec2.internal kubelet[15748]: E0404 17:29:52.416158 15748 pod_workers.go:182] Error syncing pod bf6236ac-194a-11e7-be58-0e94e95c9de0 ("kube-dns-321336704-9gq2p_kube-system(bf6236ac-194a-11e7-be58-0e94e95c9de0)"), skipping: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
Apr 04 17:29:55 ip-10-50-22-42.ec2.internal kubelet[15748]: W0404 17:29:55.416988 15748 prober.go:98] No ref for container "docker://bf3428d623ce4740f712161ed284990487381fd8b32f840e117cbb98ef0c5c28" (kube-dns-321336704-z349d_kube-system(bf621c17-194a-11e7-be58-0e94e95c9de0):kubedns)
Apr 04 17:29:55 ip-10-50-22-42.ec2.internal kubelet[15748]: I0404 17:29:55.417581 15748 prober.go:106] Readiness probe for "kube-dns-321336704-z349d_kube-system(bf621c17-194a-11e7-be58-0e94e95c9de0):kubedns" failed (failure): Get http://172.17.0.3:8081/readiness: dial tcp 172.17.0.3:8081: getsockopt: connection refused
Apr 04 17:29:55 ip-10-50-22-42.ec2.internal kubelet[15748]: W0404 17:29:55.823492 15748 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 04 17:29:55 ip-10-50-22-42.ec2.internal kubelet[15748]: E0404 17:29:55.823618 15748 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Ignoring the DNS pod problems, just nothing there. For additional info, from the master:
ip-10-50-21-250 core # kubectl describe daemonset weave-net --namespace=kube-system
Name: weave-net
Selector: name=weave-net
Node-Selector: <none>
Labels: name=weave-net
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"name":"weave-net","namespace":"kube-system"},"spec":{"template":{"m...
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: name=weave-net
Service Account: weave-net
Containers:
weave:
Image: weaveworks/weave-kube:1.9.4
Port:
Command:
/home/weave/launch.sh
Requests:
cpu: 10m
Liveness: http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/weavedb from weavedb (rw)
weave-npc:
Image: weaveworks/weave-npc:1.9.4
Port:
Requests:
cpu: 10m
Environment: <none>
Mounts: <none>
Volumes:
weavedb:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2h 5m 27 daemon-set Warning FailedCreate Error creating: pods "" is forbidden: pod.Spec.SecurityContext.SELinuxOptions is forbidden
Not sure what that daemonset/SELinux error is. Running stock CoreOS Stable.
actually it might be kubernetes/kubernetes#43815 - can you make sure you have Kubernetes 1.6.1 please?
Running 1.6.0, but not kubeadm. Downloaded and installed all kube components manually. Happy to try, though.
Maybe the kubelet logs will show something?
Or possibly kubectl describe node <your-node>
Just downloading and installing 1.6.1 now, then will check.
Maybe the kubelet logs will show something?
Yeah, those were the logs from journalctl. Doesn't kubelet just spew to stdout/stderr?
Well, 1.6.1 doesn't appear to solve it.
Apr 04 17:41:31 ip-10-50-22-42.ec2.internal kubelet[20511]: W0404 17:41:31.045714 20511 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 04 17:41:31 ip-10-50-22-42.ec2.internal kubelet[20511]: E0404 17:41:31.045904 20511 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I don't get it. If I pass --network-plugin=cni --network-plugin-dir=/etc/cni/net.d, it looks for something. How does it actually load them? Weave installs as a daemonset (excellent, by the way), but it looks like the node doesn't even get to ready state because it has nothing in CNI?
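For reference, as far as I can tell the kubelet only watches /etc/cni/net.d; it is the weave container itself, once running, that writes the config there. A rough sketch of what should appear after a successful start (file name and exact contents vary by Weave version):
ls /etc/cni/net.d
# 10-weave.conf
cat /etc/cni/net.d/10-weave.conf
# { "name": "weave", "type": "weave-net" }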
Oops, you asked for describe node
ip-10-50-21-250 bin # kubectl describe no ip-10-50-22-42.ec2.internal
Name: ip-10-50-22-42.ec2.internal
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=ip-10-50-22-42.ec2.internal
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 04 Apr 2017 14:46:55 +0000
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Tue, 04 Apr 2017 17:45:54 +0000 Tue, 04 Apr 2017 17:45:04 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 04 Apr 2017 17:45:54 +0000 Tue, 04 Apr 2017 17:45:04 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 04 Apr 2017 17:45:54 +0000 Tue, 04 Apr 2017 17:45:04 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready False Tue, 04 Apr 2017 17:45:54 +0000 Tue, 04 Apr 2017 17:45:04 +0000 KubeletNotReady runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses: 10.50.22.42,10.50.22.42,ip-10-50-22-42.ec2.internal
Capacity:
cpu: 2
memory: 8178308Ki
pods: 110
Allocatable:
cpu: 2
memory: 8075908Ki
pods: 110
System Info:
Machine ID: 22728a39e0794116afb356b59fdb9751
System UUID: EC2BB4F9-1532-7105-796A-D8256882EF5D
Boot ID: a80a3638-2f50-4c94-9384-a57aa205d3ff
Kernel Version: 4.9.16-coreos-r1
OS Image: Container Linux by CoreOS 1298.7.0 (Ladybug)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1
Kube-Proxy Version: v1.6.1
PodCIDR: 10.200.0.0/24
ExternalID: ip-10-50-22-42.ec2.internal
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system kube-dns-321336704-9gq2p 260m (13%) 0 (0%) 140Mi (1%) 220Mi (2%)
kube-system kube-dns-321336704-z349d 260m (13%) 0 (0%) 140Mi (1%) 220Mi (2%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
520m (26%) 0 (0%) 280Mi (3%) 440Mi (5%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
44m 44m 1 kubelet, ip-10-50-22-42.ec2.internal Warning ImageGCFailed unable to find data for container /
44m 44m 2 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientDisk Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientDisk
44m 44m 2 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientMemory Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientMemory
44m 44m 2 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasNoDiskPressure Node ip-10-50-22-42.ec2.internal status is now: NodeHasNoDiskPressure
44m 44m 1 kubelet, ip-10-50-22-42.ec2.internal Normal Starting Starting kubelet.
40m 40m 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
38m 38m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasNoDiskPressure Node ip-10-50-22-42.ec2.internal status is now: NodeHasNoDiskPressure
38m 38m 1 kubelet, ip-10-50-22-42.ec2.internal Normal Starting Starting kubelet.
38m 38m 1 kubelet, ip-10-50-22-42.ec2.internal Warning ImageGCFailed unable to find data for container /
38m 38m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientDisk Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientDisk
38m 38m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientMemory Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientMemory
38m 38m 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
23m 23m 1 kubelet, ip-10-50-22-42.ec2.internal Normal Starting Starting kubelet.
23m 23m 1 kubelet, ip-10-50-22-42.ec2.internal Warning ImageGCFailed unable to find data for container /
23m 23m 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
23m 23m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientDisk Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientDisk
23m 23m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasNoDiskPressure Node ip-10-50-22-42.ec2.internal status is now: NodeHasNoDiskPressure
23m 23m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientMemory Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientMemory
6m 6m 1 kubelet, ip-10-50-22-42.ec2.internal Normal Starting Starting kubelet.
6m 6m 1 kubelet, ip-10-50-22-42.ec2.internal Warning ImageGCFailed unable to find data for container /
5m 5m 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
6m 5m 14 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientDisk Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientDisk
6m 5m 14 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientMemory Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientMemory
6m 5m 14 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasNoDiskPressure Node ip-10-50-22-42.ec2.internal status is now: NodeHasNoDiskPressure
5m 5m 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeNotReady Node ip-10-50-22-42.ec2.internal status is now: NodeNotReady
1m 1m 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
50s 50s 1 kubelet, ip-10-50-22-42.ec2.internal Normal Starting Starting kubelet.
50s 50s 1 kubelet, ip-10-50-22-42.ec2.internal Warning ImageGCFailed unable to find data for container /
50s 50s 3 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientDisk Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientDisk
50s 50s 3 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasSufficientMemory Node ip-10-50-22-42.ec2.internal status is now: NodeHasSufficientMemory
50s 50s 3 kubelet, ip-10-50-22-42.ec2.internal Normal NodeHasNoDiskPressure Node ip-10-50-22-42.ec2.internal status is now: NodeHasNoDiskPressure
50s 50s 1 kube-proxy, ip-10-50-22-42.ec2.internal Normal Starting Starting kube-proxy.
50s 50s 1 kubelet, ip-10-50-22-42.ec2.internal Normal NodeNotReady Node ip-10-50-22-42.ec2.internal status is now: NodeNotReady
And FWIW, my kubelet systemd unit:
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
ExecStart=/opt/local/bin/kubelet --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --allow-privileged=true --cloud-provider= --cluster-dns=10.100.0.5 --cluster-domain=cluster.local --container-runtime=docker --docker=unix:///var/run/docker.sock --kubeconfig=/var/lib/kubelet/kubeconfig --register-node=true --require-kubeconfig=true --serialize-image-pulls=false --tls-cert-file=/var/lib/kubernetes/kubernetes-worker.pem --tls-private-key-file=/var/lib/kubernetes/kubernetes-worker-key.pem --v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Oh damn it! My own darn stupidity! I had used a config with --admission-control=SecurityContextDeny (among others), which caused the weave deployment (whose spec sets SELinux options) to fail. Sigh.
I have no idea how to configure kube so that weave (as a privileged daemonset) can have full privileges, but typical user pods and containers cannot. Should it actually fail with SecurityContextDeny?
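A quick sanity check, assuming the api server runs as a host process rather than a pod, is to pull the flag straight out of its command line:
ps -ef | grep [k]ube-apiserver | tr ' ' '\n' | grep admission-control
# --admission-control=NamespaceLifecycle,...,SecurityContextDeny,...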
Still leaving this open because:
ip-10-50-21-250 bin # kubectl describe pod weave-net-rc838 -n kube-system
Name: weave-net-rc838
Namespace: kube-system
Node: ip-10-50-22-42.ec2.internal/10.50.22.42
Start Time: Tue, 04 Apr 2017 18:25:43 +0000
Labels: name=weave-net
pod-template-generation=1
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"DaemonSet","namespace":"kube-system","name":"weave-net","uid":"d04b85c7-1935-11e7-b540-026890ffec58","apiV...
Status: Pending
IP: 10.50.22.42
Controllers: DaemonSet/weave-net
Containers:
weave:
Container ID:
Image: weaveworks/weave-kube:1.9.4
Image ID:
Port:
Command:
/home/weave/launch.sh
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
Liveness: http-get http://127.0.0.1:6784/status delay=30s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-82nn3 (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID:
Image: weaveworks/weave-npc:1.9.4
Image ID:
Port:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-82nn3 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
weavedb:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
weave-net-token-82nn3:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-82nn3
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master=:NoSchedule
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 kubelet, ip-10-50-22-42.ec2.internal Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "weave-net-rc838_kube-system(1c6182e5-1964-11e7-b1af-0e94e95c9de0)" with CreatePodSandboxError: "CreatePodSandbox for pod \"weave-net-rc838_kube-system(1c6182e5-1964-11e7-b1af-0e94e95c9de0)\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \"weave-net-rc838\": Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"process_linux.go:359: container init caused \\\\\\\\\\\\\\\"write /proc/self/task/2287/attr/exec: invalid argument\\\\\\\\\\\\\\\"\\\\\\\"\\\\n\\\"\"}"
# last line repeated many times
One more piece of the puzzle. SELinux?
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal kubelet[29296]: I0405 08:06:37.633880 29296 kuberuntime_manager.go:384] No ready sandbox for pod "weave-net-rc838_kube-system(1c6182e5-1964-11e7-b1af-0e94e95c9de0)" can be found. Need to start a new one
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal kubelet[29296]: I0405 08:06:37.634046 29296 kuberuntime_manager.go:458] Container {Name:weave Image:weaveworks/weave-kube:1.9.4 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath:} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath:} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath:} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath:} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath:} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath:} {Name:weave-net-token-82nn3 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal kubelet[29296]: I0405 08:06:37.634091 29296 kuberuntime_manager.go:458] Container {Name:weave-npc Image:weaveworks/weave-npc:1.9.4 Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weave-net-token-82nn3 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal kernel: SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal containerd[1341]: time="2017-04-05T08:06:37.835787158Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"write /proc/self/task/14661/attr/exec: invalid argument\\\"\"\n" id=dcafd3ef35a51333219b32148f7afa1aa733fe978066402f0d0627f703df0f1c
Apr 05 08:06:37 ip-10-50-22-42.ec2.internal dockerd[1342]: time="2017-04-05T08:06:37.836307141Z" level=error msg="Create container failed with error: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"write /proc/self/task/14661/attr/exec: invalid argument\\\\\\\"\\\"\\n\""
But no idea how to resolve this. Does weave+kube+coreos not work as a combo?
@deitch please check how to allow the restricted calls: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Fixing_Problems-Allowing_Access_audit2allow.html
@pronix
ip-10-50-22-42 core # sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: mcs
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 30
ip-10-50-22-42 core # getenforce
Permissive
Fully permissive. I could disable it entirely and reboot, but if it is permissive, should it matter?
@deitch permissive is just notification, so the problem is somewhere else
permissive is just notification, so the problem is somewhere else
Um, yeah. It means it notifies but does not enforce. Here, it is enforcing. Or, more correctly, the call is failing (possibly because of selinux, possibly that is a red herring).
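The only way I know to see whether SELinux is actually involved is to look for AVC denial messages, e.g. (assuming audit messages land in the kernel log; ausearch only works where the audit userspace tools are installed):
journalctl -k --no-pager | grep -i avc
ausearch -m avc -ts recent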
Well, disabling SELinux entirely - including removing it from the dockerd defaults on CoreOS - makes the container run. It then fails trying to get Kubernetes info:
E0405 08:54:23.706286 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Namespace: Get https://10.100.0.1:443/api/v1/namespaces?resourceVersion=0: dial tcp 10.100.0.1:443: getsockopt: connection refused
E0405 08:54:23.717651 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1beta1.NetworkPolicy: Get https://10.100.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: dial tcp 10.100.0.1:443: getsockopt: connection refused
E0405 08:54:24.707660 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.100.0.1:443/api/v1/pods?resourceVersion=0: dial tcp 10.100.0.1:443: getsockopt: connection refused
That appears to be the kubernetes cluster service:
ip-10-50-21-250 core # kubectl get svc -oyaml kubernetes
apiVersion: v1
kind: Service
metadata:
creationTimestamp: 2017-04-04T12:54:08Z
labels:
component: apiserver
provider: kubernetes
name: kubernetes
namespace: default
resourceVersion: "24"
selfLink: /api/v1/namespaces/default/services/kubernetes
uid: cafee572-1935-11e7-b540-026890ffec58
spec:
clusterIP: 10.100.0.1
ports:
- name: https
port: 443
protocol: TCP
targetPort: 6443
sessionAffinity: ClientIP
type: ClusterIP
status:
loadBalancer: {}
there is an example of how to handle this with SELinux enabled: https://github.com/weaveworks/weave/issues/293
dial tcp 10.100.0.1:443: getsockopt: connection refused
This means it did manage to contact a host, but there was nothing listening at that port.
You should check that kube-proxy is mapping port 443 to 6443 and mapping to the real address of the api-server. More tips at kubernetes.io/docs/tasks/debug-application-cluster/debug-service/
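A couple of concrete checks, assuming kubectl access from a node:
kubectl get endpoints kubernetes                 # should list the real api server address(es) and port 6443
iptables-save -t nat | grep default/kubernetes   # the DNAT rules kube-proxy programmed for the service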
This means it did manage to contact a host, but there was nothing listening at that port
Obviously. :-)
So all of those, through many layers of logs and debugging, come down to some SELinux interaction between docker/coreos/kube/maybe weave?
I dug through the DaemonSet in the weave spec; it asks for certain SELinux capabilities. Should that not handle it?
there is an example of how to handle this with SELinux enabled #293
Does that handle the mqueue issue? Also, are there official docs on, "run weave in selinux environment using ____"?
there is an example of how to handle this with SELinux enabled #293
That only covers running it as a systemd unit, not as a k8s add-on.
Obviously. :-)
Oops, sorry @bboreham, that came across as snarky. Completely unintentional.
Curious: weave uses the kubernetes service to reach the api server. Which makes sense. But what if the cluster has 3 api servers and only one is functioning? I manually stopped (systemctl stop kube-apiserver) 2 out of 3, yet the kube-proxy-generated iptables show:
-A KUBE-SEP-7ENDL6QSPNVRD6RQ -s 10.50.22.124/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-7ENDL6QSPNVRD6RQ -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-7ENDL6QSPNVRD6RQ --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.50.22.124:6443
-A KUBE-SEP-IX4LL7XNRMTXIUD2 -s 10.50.20.186/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-IX4LL7XNRMTXIUD2 -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-IX4LL7XNRMTXIUD2 --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.50.20.186:6443
-A KUBE-SEP-QLFYKHFTR2K3O732 -s 10.50.21.250/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-QLFYKHFTR2K3O732 -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-QLFYKHFTR2K3O732 --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.50.21.250:6443
-A KUBE-SERVICES -d 10.100.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-IX4LL7XNRMTXIUD2 --mask 255.255.255.255 --rsource -j KUBE-SEP-IX4LL7XNRMTXIUD2
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-QLFYKHFTR2K3O732 --mask 255.255.255.255 --rsource -j KUBE-SEP-QLFYKHFTR2K3O732
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-7ENDL6QSPNVRD6RQ --mask 255.255.255.255 --rsource -j KUBE-SEP-7ENDL6QSPNVRD6RQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-IX4LL7XNRMTXIUD2
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-QLFYKHFTR2K3O732
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-7ENDL6QSPNVRD6RQ
And, yes, those are the 3 IPs of the master nodes, and they are running at 6443.
Yep, restarting the other nodes gets it to respond. It isn't a service problem per se, but an inability by kube-proxy to recognize loss of an API server.
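To see what kube-proxy is actually acting on, look at the endpoints object behind the service; each api server registers itself there:
kubectl get endpoints kubernetes -o yaml
# rerun after stopping an api server; with --apiserver-count set, dead entries
# are not necessarily reaped promptly, which would explain the stale iptables rules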
And with every layer of the onion:
E0405 09:38:38.966305 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.100.0.1:443/api/v1/pods?resourceVersion=0: x509: certificate signed by unknown authority
E0405 09:38:38.971602 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Namespace: Get https://10.100.0.1:443/api/v1/namespaces?resourceVersion=0: x509: certificate signed by unknown authority
E0405 09:38:38.992128 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1beta1.NetworkPolicy: Get https://10.100.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: x509: certificate signed by unknown authority
How does Weave handle connecting to the api server with its certs signed by a private CA?
Ah, yes, the cert in /var/run/secrets/, so debugging that now.
kube-proxy ought to see those master nodes are bad and take them out of the iptables rules.
Possibly this takes some time for the remaining master nodes to notice; that would be a question for Kubernetes.
How does Weave handle connecting to the api server with its certs signed by a private CA?
Weave Net does not know nor care about these: it uses InClusterConfig (see: /prog/kube-peers/main.go#L13), which handles this transparently.
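Concretely, InClusterConfig reads the KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT environment variables plus the service-account files mounted into every pod; you can inspect them in place (substitute your own pod name):
kubectl -n kube-system exec weave-net-rc838 -c weave -- ls /var/run/secrets/kubernetes.io/serviceaccount
# ca.crt  namespace  token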
I have no idea how to configure kube so that weave (as a privileged daemonset) can have full privileges
and
I dug through the DaemonSet in the weave spec; it asks for certain SELinux capabilities. Should that not handle it?
I believe that is indeed taken care of by the YAML file available at https://git.io/weave-kube-1.6 already, see:
securityContext:
seLinuxOptions:
type: spc_t
I know close to nothing about SELinux, but I would probably start checking which domains the following components are running under:
weave: spc_t
docker: ?
kubelet (and other Kubernetes components): ?
systemd: ?
and make sure they can talk to each other; a quick check is sketched below.
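A sketch of how to check, assuming SELinux-aware tooling is present (the container id is a placeholder):
ps -eZ | grep -E 'dockerd|kubelet'                      # the domain is the third field of the context
docker inspect --format '{{ .ProcessLabel }}' <weave-container-id>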
This may be relevant as well: https://www.weave.works/docs/net/latest/installing-weave/systemd/
kube-proxy ought to see those master nodes are bad and take them out of the iptables rules. Possibly this takes some time for the remaining master nodes to notice; that would be a question for Kubernetes.
@bboreham yes it should. That definitely is not a Weave question. If I can replicate it, I will open a k8s issue.
@marccarre wrote:
I know close to nothing about SELinux
Ha! Join the club. Everywhere enables it, and few actually know how to use it. All I know is that every place I have been has had to disable it because stuff just didn't work. Sometimes I wonder if it is like Plato's ideal: a security system that works perfectly in theory, but that few use in the concrete. :-)
I think the core issue is that mqueue one, but I really don't know what it is about. I will check those links.
So despite the interim issues, in the end this boils down to: please put a big warning that SELinux can get in the way, whether enabled in the OS or in dockerd?
Oh, and still struggling with the certificates. Apparently it is due to SNI, which I have enabled on the API server: an internal CA with internal dynamic certs for internal access, and an externally provided cert for external API access.
Confirmed: there is an SNI issue. The API server has 2 certs configured under SNI. The one matching the local CA cert at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt is delivered when I access using the load balancer, the private IP of the specific master node (10.50.22.57), or the service IP (10.100.0.1). I can confirm it by doing the following inside the container. Any of the below works and has openssl reporting the server cert as verified:
openssl s_client -connect 10.50.22.57:6443 -servername 10.50.22.57 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
openssl s_client -connect 10.50.22.57:6443 -servername 10.100.0.1 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
openssl s_client -connect 10.100.0.1:443 -servername 10.50.22.57 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
openssl s_client -connect 10.100.0.1:443 -servername 10.100.0.1 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
But docker logs on the host shows:
E0405 11:54:00.144769 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Pod: Get https://10.100.0.1:443/api/v1/pods?resourceVersion=0: x509: certificate signed by unknown authority
E0405 11:54:00.150515 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1beta1.NetworkPolicy: Get https://10.100.0.1:443/apis/extensions/v1beta1/networkpolicies?resourceVersion=0: x509: certificate signed by unknown authority
E0405 11:54:00.154787 1 reflector.go:214] github.com/weaveworks/weave/vendor/k8s.io/client-go/tools/cache/reflector.go:109: Failed to list *v1.Namespace: Get https://10.100.0.1:443/api/v1/namespaces?resourceVersion=0: x509: certificate signed by unknown authority
Confirmed. I wiresharked the comms from the API server side. Even though the API server supports SNI (as do kubectl, kubelet, kube-proxy, etc. as clients), weave's request does not include the TLS server_name (SNI) extension, causing the server to serve up the wrong cert.
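The failure mode is easy to reproduce from inside the container with the same openssl check as above: drop -servername (older openssl sends no SNI by default) and the api server hands back the external cert, which this CA did not sign:
openssl s_client -connect 10.100.0.1:443 -CAfile /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# openssl now reports a verify error instead of Verify return code: 0 (ok)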
Thanks for the update @deitch. I had another look at Weave Net's sources, and how we call Kubernetes using their client, and it doesn't look like we can provide anything (argument, flag, etc.) to change this behaviour, therefore this looks like a bug for Kubernetes.
@marccarre can you point me to where in the source? I am happy to dig into it. I know that kubelet and kube-proxy succeed with it, and they should use the same code, so I want to see where it is different.
@deitch, I didn't trace the full call tree, but the error you see seems to happen in NewReflector, and the only places where weave-kube interacts with Kubernetes' API directly seem to be:
Yeah, I tracked it down to here https://github.com/weaveworks/weave/blob/master/prog/weave-npc/main.go#L116 and https://github.com/weaveworks/weave/blob/master/prog/weave-npc/main.go#L119
Those don't give much choice, so I opened https://github.com/kubernetes/client-go/issues/173
Now I have no idea how I will make this all work. The problem is a kubernetes one. The masters have an internal CA that generates certs for etcd<->etcd comms, worker<->API server, etc. This is internally generated and dynamic, and is distinct from what clients use, since that needs to be controlled by an outside CA (real admin person dishing out certs).
I used SNI so that the API server would use the internal auto-generated on server startup cert for all internal comms (including from weave and kubelet and kube-proxy). That one is valid for the internal ELB, and the private IP of the API server (determined at boot time because in the cloud it is dynamic), and the service IP for kubernetes service (which weave uses), so it cannot be generated in advance. The external one is generated at cluster creation time.
Now I need a way to solve this without SNI. I know it isn't a weave problem - the only weave issue here appears to be the SELinux one, which isn't really Weave's fault but probably should be documented - but a more general one.
Maybe I can use a single cert for all, have it generated in real-time, and have the external clients (including kubectl) trust the CA that is used internally dynamically? For api server to authenticate externals, it would only be via the external CA. Hmm....
Thanks for tracking down this certificate issue! Look forward to the response on https://github.com/kubernetes/client-go/issues/173
@bboreham quite welcome.
Would be nice to figure out the selinux issue, though, so happy to take pointers.
Separately: any chance you are at Continuous Lifecycle in London in a month? Flying in for it.
For the record: the feature and component/docs labels were added, as we should improve documentation for CoreOS + SELinux. Related issue: #1458 (documentation for CentOS + SELinux)
Would be nice to figure out the selinux issue, though, so happy to take pointers.
@deitch did you find anything suspicious when looking at the SELinux domains for systemd, Kubernetes and Docker? (see: https://github.com/weaveworks/weave/issues/2881#issuecomment-291823577)
did you find anything suspicious when looking at the SELinux domains for systemd, Kubernetes and Docker?
Not beyond that mqueue issue, which I suspect is more of a docker+coreos thing. I will dig deeper, but might need to wait a bit...
I could not install Weave 1.9.4 on Kubernetes 1.6.2 with Ubuntu 16.04.2 due to: pod.Spec.SecurityContext.SELinuxOptions is forbidden
kubectl describe daemonsets/weave-net -n kube-system
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 12m 531 daemon-set Warning FailedCreate Error creating: pods "" is forbidden: pod.Spec.SecurityContext.SELinuxOptions is forbidden
Probably because I do not have SELinux installed.
After removing
- securityContext:
- seLinuxOptions:
- type: spc_t
Weave launches.
But this is probably not the idea.
Now I have to dig into how to allow this SELinux option.
@weitzj, which YAML did you use? This one: https://github.com/weaveworks/weave/releases/download/v1.9.4/weave-daemonset-k8s-1.6.yaml?
@marccarre Yes. This one. I have modified it a bit to incorporate a secret. So here is the whole diff:
61a62,63
> args:
> - --log-level=warning
67a70,79
> env:
> - name: CHECKPOINT_DISABLE
> value: "1"
> - name: IPALLOC_RANGE
> value: 172.20.0.0/16
> - name: WEAVE_PASSWORD
> valueFrom:
> secretKeyRef:
> name: weave-passwd
> key: weave-passwd
98,100d109
< securityContext:
< seLinuxOptions:
< type: spc_t
@deitch Did you find a way to run Weave Net with Kubernetes on CoreOS? I continue to get this when describing the weave-net pod:
3m 3m 1 {kubelet 10.135.65.230} Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "weave-net-pkl1d_kube-system(a64e9767-2a91-11e7-9963-d2b6d3081ec8)" with CreatePodSandboxError: "CreatePodSandbox for pod \"weave-net-pkl1d_kube-system(a64e9767-2a91-11e7-9963-d2b6d3081ec8)\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \"weave-net-pkl1d\": Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"process_linux.go:359: container init caused \\\\\\\\\\\\\\\"write /proc/self/task/8630/attr/exec: invalid argument\\\\\\\\\\\\\\\"\\\\\\\"\\\\n\\\"\"}"
@burdiyan I did. I had to disable SELinux entirely (ugh), but it worked. I also stopped using SNI because of the earlier issue. If you want to see my cloudinit, I can probably share a chunk of it.
@deitch Would be great if you can share it :) I'm already struggling to set it up for a lot longer than I wish.
I'm already struggling to set it up for a lot longer than I wish.
heh, see my comment above, "Save some other poor soul my wasted day." :-)
Here is a simplified and much reduced version of my master cloudinit, ignoring etcd, etc. We use plain vanilla CoreOS, and then cloudinit to configure it. This makes auto-scaling immensely easier. Actually, we do that for every instance (kube, vpn, ci, you name it).
Also, cloudinit got a little too long for AWS, so we have a cloudinit stub that just installs AWS S3 utils and then downloads everything else from S3 (using IAM roles), including environment, etc.
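The stub itself is tiny; roughly like this (bucket, key, and tooling are stand-ins for whatever you actually use):
# stub cloudinit: fetch the real bootstrap script from S3 using the instance's IAM role
aws s3 cp s3://BUCKET/cloudinit/init.sh /tmp/init.sh
chmod +x /tmp/init.sh && /tmp/init.sh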
# get private cloud IP
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# ensure binaries are in path
cat > /etc/profile.d/optlocal.sh <<"EOF"
PATH=$PATH:/opt/local/bin
EOF
# setup etcd2
# note: Use image monsantoco/etcd-aws-cluster to simplify figuring out new vs join
# download kubernetes binaries
# install into /opt/local/bin because /usr is read-only
# download CNI from https://github.com/containernetworking/cni/releases/download
# install into /opt/cni/bin
# create systemd entries for kube-apiserver, kube-controller-manager, kube-scheduler
# Note:
# - we removed --admission-control=SecurityContextDeny
# - certs and keys are either passed in or auto-generated
cat > /etc/systemd/system/kube-apiserver.service <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=$BINPATH/kube-apiserver \
--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota \
--advertise-address=$INTERNAL_IP \
--allow-privileged=true \
--apiserver-count=3 \
--authorization-mode=ABAC \
--authorization-policy-file=$AUTH_POLICY_FILE \
--bind-address=$INTERNAL_IP \
--secure-port=6443 \
--insecure-bind-address=127.0.0.1 \
--insecure-port=8080 \
--enable-swagger-ui=true \
--storage-backend=etcd2 \
--etcd-cafile=/etc/etcd/ca.pem \
--etcd-certfile=/etc/etcd/etcd.pem \
--etcd-keyfile=/etc/etcd/etcd-key.pem \
--kubelet-certificate-authority=$CA_FILE_FULL \
--etcd-servers=$ETCD_SERVERS \
--service-account-key-file=$SERVICEACCOUNT_KEY \
--service-cluster-ip-range=$SERVICE_CIDR \
--service-node-port-range=30000-32767 \
--tls-cert-file $APISERVER_CERT \
--tls-private-key-file $APISERVER_KEY \
--client-ca-file=$CA_FILE_FULL \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# leaving out kube-controller-manager, kube-schedule; ask if you need
# create kube-dns deployment and service, based largely on Kelsey Hightower's templates
# load up with timeout, in case API server not ready yet
# load up weave, same timeout logic
# also enable retries because this is run on every master, and https://github.com/kubernetes/kubernetes/issues/44165
kubectl apply -f https://git.io/weave-kube-1.6
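# a sketch of the timeout/retry wrapper mentioned above:
#   timeout 300 bash -c 'until kubectl apply -f https://git.io/weave-kube-1.6; do sleep 10; done'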
# start cfssl signing server
# because we run a CA signing for each node
And for worker nodes
# get private cloud IP
INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
# ensure binaries are in path
cat > /etc/profile.d/optlocal.sh <<"EOF"
PATH=$PATH:/opt/local/bin
EOF
# download kubernetes binaries: kubelet kube-proxy kubectl
# install into /opt/local/bin because /usr is read-only
# download CNI from https://github.com/containernetworking/cni/releases/download
# install into /opt/cni/bin
# create kubeconfig file using certs generated from master earlier in here
cat > $KUBELETROOT/kubeconfig <<EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: $CA_CERT
server: $KUBERNETES_API_SERVER_URL
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubelet
name: kubelet
current-context: kubelet
users:
- name: kubelet
user:
client-certificate: $WORKER_CERT
client-key: $WORKER_KEY
EOF
# REALLY IMPORTANT TO MAKE IT WORK... and I dislike it
# unfortunately disabling selinux is necessary
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/disable_selinux.conf <<EOF
[Service]
Environment=DOCKER_OPTS=--selinux-enabled=false
EOF
# kubelet systemd
cat > /etc/systemd/system/kubelet.service <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
ExecStart=$BINPATH/kubelet \
--allow-privileged=true \
--cloud-provider= \
--network-plugin=cni \
--cni-conf-dir=/etc/cni/net.d \
--cni-bin-dir=/opt/cni/bin \
--cluster-dns=$KUBERNETES_CLUSTER_DNS \
--cluster-domain=cluster.local \
--container-runtime=docker \
--docker=unix:///var/run/docker.sock \
--kubeconfig=$KUBELETROOT/kubeconfig \
--register-node=true \
--require-kubeconfig=true \
--serialize-image-pulls=false \
--tls-cert-file=$WORKER_CERT \
--tls-private-key-file=$WORKER_KEY \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# repeat for kube-proxy
Let me know what more I can do
So is this issue all about SELinux? I should change the title if so.
@bboreham I wish I could say yes. All I know for sure is that a combination of the following makes it work:
- removing --admission-control=SecurityContextDeny on the api server
- disabling SELinux for dockerd
Probably should run each separately and tease out the issue.
Wonder if #3000 would help here? (basically we change the default to blank SELinux options)
Wonder if #3000 would help here?
Not sure. When I can set a new cluster up, I can reenable everything and try it, but might be a while. Just going through testing after I removed the other workarounds (CRI/hairpin, yaml generator).
Were you able to recreate the issue?
I have a similar problem on my Windows Server 2016 node: the weave-net pod cannot be started successfully.
I used Kubernetes 1.9.3 alpha.
https://github.com/kubernetes/kubernetes/issues/56696
Any ideas?
Thank you all very much.
I have been banging my head on this (or something like it) for several hours now. I am trying to do something that should be really simple: start a kube cluster with just weavenet networking. As simple as:
And yet:
Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
I do not know if the two are related - or if it is connected to https://github.com/weaveworks/weave/issues/2826 - but I just cannot get "simple one-step install" to be, well, simple. :-)
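For the record, the first checks worth making with this symptom, since the kubelet keeps reporting "cni config uninitialized" until the network pod has written its config (pod name is a placeholder):
kubectl -n kube-system get pods -o wide | grep weave   # did the DaemonSet schedule a pod on this node?
ls /etc/cni/net.d                                      # weave writes its CNI config only once running
kubectl -n kube-system logs weave-net-xxxxx -c weave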