pipo02mix / why_k8s_can_make_our_life_easier

Explain kubernetes concepts through examples

Trying to start cluster: "network plugin is not ready: cni config uninitialized" #15

Open · heidemn opened this issue 6 years ago

heidemn commented 6 years ago

I followed your instructions to set up an Ubuntu cluster on my Windows 10 laptop. https://github.com/pipo02mix/why_k8s_can_make_our_life_easier/tree/master/cluster/ubuntu

After running all the commands listed there, the cluster does not seem to be ready in my case. Not all system pods are running, and I did not manage to complete the next step, starting the registry. Looking into the details, it seems the CNI plugin is not properly initialized.

Do you have any advice on what the problem could be here, and how to get the cluster working? Thanks...
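For reference, the "cni config uninitialized" condition usually means the kubelet finds no CNI config file under /etc/cni/net.d on the node. A quick way to confirm this, assuming SSH access to the VMs (the vagrant ssh machine name below is a guess based on the node names):

$ vagrant ssh node1
$ ls /etc/cni/net.d/                          # empty or missing: the CNI plugin never wrote its config
$ sudo journalctl -u kubelet | grep -i cni    # kubelet logs repeat the same "cni config uninitialized" message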

$ kubectl get node
NAME     STATUS     ROLES    AGE   VERSION
master   NotReady   master   63m   v1.12.2
node1    NotReady   <none>   60m   v1.12.2
node2    NotReady   <none>   57m   v1.12.2

$ kubectl describe node master
Name:               master
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=master
                    node-role.kubernetes.io/master=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 11 Nov 2018 18:10:12 +0100
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 11 Nov 2018 19:20:30 +0100   Sun, 11 Nov 2018 18:10:12 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 11 Nov 2018 19:20:30 +0100   Sun, 11 Nov 2018 18:10:12 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 11 Nov 2018 19:20:30 +0100   Sun, 11 Nov 2018 18:10:12 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 11 Nov 2018 19:20:30 +0100   Sun, 11 Nov 2018 18:10:12 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Sun, 11 Nov 2018 19:20:30 +0100   Sun, 11 Nov 2018 18:10:12 +0100   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  172.17.4.101
  Hostname:    master
Capacity:
 cpu:                2
 ephemeral-storage:  10098468Ki
 hugepages-2Mi:      0
 memory:             2048012Ki
 pods:               110
Allocatable:
 cpu:                2
 ephemeral-storage:  9306748094
 hugepages-2Mi:      0
 memory:             1945612Ki
 pods:               110
System Info:
 Machine ID:                 d896d248c7a44f87aeb55f972bdd05f3
 System UUID:                F9E43F60-E3F1-4B9D-9400-D854E6C99A46
 Boot ID:                    8804c8df-924e-465c-ac05-d162061bfe88
 Kernel Version:             4.4.0-138-generic
 OS Image:                   Ubuntu 16.04.5 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.12.2
 Kube-Proxy Version:         v1.12.2
PodCIDR:                     192.168.0.0/24
Non-terminated Pods:         (5 in total)
  Namespace                  Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                              ------------  ----------  ---------------  -------------
  kube-system                etcd-master                       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-apiserver-master             250m (12%)    0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-controller-manager-master    200m (10%)    0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-proxy-ld548                  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-scheduler-master             100m (5%)     0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests    Limits
  --------  --------    ------
  cpu       550m (27%)  0 (0%)
  memory    0 (0%)      0 (0%)
Events:
  Type    Reason                   Age                From                Message
  ----    ------                   ----               ----                -------
  Normal  Starting                 70m                kubelet, master     Starting kubelet.
  Normal  NodeAllocatableEnforced  70m                kubelet, master     Updated Node Allocatable limit across pods
  Normal  NodeHasSufficientDisk    70m (x6 over 70m)  kubelet, master     Node master status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  70m (x6 over 70m)  kubelet, master     Node master status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    70m (x5 over 70m)  kubelet, master     Node master status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     70m (x6 over 70m)  kubelet, master     Node master status is now: NodeHasSufficientPID
  Normal  Starting                 69m                kube-proxy, master  Starting kube-proxy.
  Normal  Starting                 49m                kubelet, master     Starting kubelet.
  Normal  NodeHasSufficientDisk    49m (x6 over 49m)  kubelet, master     Node master status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  49m (x6 over 49m)  kubelet, master     Node master status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    49m (x6 over 49m)  kubelet, master     Node master status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     49m (x5 over 49m)  kubelet, master     Node master status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  49m                kubelet, master     Updated Node Allocatable limit across pods
  Normal  Starting                 48m                kube-proxy, master  Starting kube-proxy.
$ kubectl get pod -n kube-system

NAME                                       READY   STATUS              RESTARTS   AGE
calico-etcd-rlqsz                          0/1     Pending             0          43m
calico-kube-controllers-57c8947c94-wwnk2   1/1     Running             1          43m
calico-node-6gcjs                          0/2     Pending             0          40m
calico-node-lfvcg                          0/2     Pending             0          37m
coredns-576cbf47c7-hkwvn                   0/1     ContainerCreating   0          43m
coredns-576cbf47c7-qbmnp                   0/1     ContainerCreating   0          43m
etcd-master                                1/1     Running             1          42m
kube-apiserver-master                      1/1     Running             1          42m
kube-controller-manager-master             1/1     Running             1          42m
kube-proxy-ld548                           1/1     Running             1          43m
kube-proxy-mlxjt                           1/1     Running             1          37m
kube-proxy-mm9m4                           1/1     Running             1          40m
kube-registry-v0-6d8ff577b8-69ng8          0/1     Pending             0          13m
kube-scheduler-master                      1/1     Running             1          42m

Describing one of the pods shows me the message: Warning NetworkNotReady 9s (x94 over 20m) kubelet, node1 network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

Full output:

$ kubectl describe pod -n kube-system coredns-576cbf47c7-hkwvn

Name:               coredns-576cbf47c7-hkwvn
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               node1/172.17.4.11
Start Time:         Sun, 11 Nov 2018 18:13:21 +0100
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-49zfh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-49zfh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-49zfh
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  39m (x18 over 42m)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         39m                 default-scheduler  Successfully assigned kube-system/coredns-576cbf47c7-hkwvn to node1
  Warning  NetworkNotReady   24m (x72 over 39m)  kubelet, node1     network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
  Warning  NetworkNotReady   9s (x94 over 20m)   kubelet, node1     network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]
$ kubectl describe pod -n kube-system calico-etcd-rlqsz
Name:               calico-etcd-rlqsz
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             controller-revision-hash=6f7978dc56
                    k8s-app=calico-etcd
                    pod-template-generation=1
Annotations:        scheduler.alpha.kubernetes.io/critical-pod:
Status:             Pending
IP:
Controlled By:      DaemonSet/calico-etcd
Containers:
  calico-etcd:
    Image:      quay.io/coreos/etcd:v3.1.10
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
    Args:
      /usr/local/bin/etcd --name=calico --data-dir=/var/etcd/calico-data --advertise-client-urls=http://$CALICO_ETCD_IP:6666 --listen-client-urls=http://0.0.0.0:6666 --listen-peer-urls=http://0.0.0.0:6667
    Environment:
      CALICO_ETCD_IP:   (v1:status.podIP)
    Mounts:
      /var/etcd from var-etcd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-k6rpt (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  var-etcd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/etcd
    HostPathType:
  default-token-k6rpt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-k6rpt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  65m (x18 over 68m)   default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  62m (x34 over 65m)   default-scheduler  0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  53m (x119 over 62m)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
  Warning  FailedScheduling  82s (x809 over 46m)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
Baykonur commented 6 years ago

Did you deploy the CNI pods? Flannel, Weave...?
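Deploying one usually just means applying its manifest on the master; a minimal sketch with a placeholder URL, since the exact manifest this guide uses is not shown here:

$ kubectl apply -f <cni-manifest.yaml>    # e.g. the Calico or Flannel manifest the guide refers to
$ kubectl -n kube-system get pods -w      # the CNI pods should reach Running
$ kubectl get nodes                       # ...and the nodes should then flip to Ready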

heidemn commented 6 years ago

Calico seems to be deployed by default when following the instructions, but only 1 of the 4 Calico-related pods is running. Further up you can find the output of $ kubectl describe pod -n kube-system calico-etcd-rlqsz.

Should I run any CNI-related commands that are not yet included in the step-by-step guide?

NAME                                       READY   STATUS              RESTARTS   AGE
calico-etcd-rlqsz                          0/1     Pending             0          43m
calico-kube-controllers-57c8947c94-wwnk2   1/1     Running             1          43m
calico-node-6gcjs                          0/2     Pending             0          40m
calico-node-lfvcg                          0/2     Pending             0          37m
Baykonur commented 6 years ago

I am also a participant in the Istio workshop, and frankly I did not need to deploy k8s following the first step, as I already had my virtual cluster running on my MBP. Only the Calico controller manager seems to be running successfully and the CNI pods are Pending (I am guessing the calico-node pods). Can you check their logs or describe them?

pipo02mix commented 6 years ago

It looks like the nodes are not healthy, which can prevent the CNI plugin from working. I saw the master describe output and it looks fine; can you send the kubectl describe node node1 output? Also make sure you have the latest Vagrant and VirtualBox versions.

If you still have problems, the workshop can be done using Minikube or a cloud provider (GKE gives you 300€ of free credit), if the internet connection allows it.
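A sketch of the Minikube alternative, assuming a recent Minikube release (the pinned version and flags are illustrative, not taken from the workshop instructions):

$ minikube start --kubernetes-version v1.12.2
$ kubectl get nodes    # a single node that should report Ready; networking is handled by Minikube itself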

heidemn commented 5 years ago

Vagrant and VirtualBox are the latest versions.

OK thanks, good to know that Minikube works as well. However, here is kubectl describe node node1:

Name:               node1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=node1
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 11 Nov 2018 18:13:21 +0100
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Wed, 14 Nov 2018 09:11:50 +0100   Sun, 11 Nov 2018 18:13:20 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 14 Nov 2018 09:11:50 +0100   Sun, 11 Nov 2018 18:13:20 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 14 Nov 2018 09:11:50 +0100   Sun, 11 Nov 2018 18:13:20 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 14 Nov 2018 09:11:50 +0100   Sun, 11 Nov 2018 18:13:20 +0100   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 14 Nov 2018 09:11:50 +0100   Sun, 11 Nov 2018 18:13:20 +0100   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
  InternalIP:  172.17.4.11
  Hostname:    node1
Capacity:
 cpu:                1
 ephemeral-storage:  10098468Ki
 hugepages-2Mi:      0
 memory:             2048148Ki
 pods:               110
Allocatable:
 cpu:                1
 ephemeral-storage:  9306748094
 hugepages-2Mi:      0
 memory:             1945748Ki
 pods:               110
System Info:
 Machine ID:                 b5eb18503dc44d058e247b1dba973a39
 System UUID:                C9AB453C-FCBF-4361-9BCD-5B6B2DFDDBCE
 Boot ID:                    88686bae-3dcc-481d-af25-4911f0ab7a7c
 Kernel Version:             4.4.0-138-generic
 OS Image:                   Ubuntu 16.04.5 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.3.2
 Kubelet Version:            v1.12.2
 Kube-Proxy Version:         v1.12.2
PodCIDR:                     192.168.2.0/24
Non-terminated Pods:         (4 in total)
  Namespace                  Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                        ------------  ----------  ---------------  -------------
  kube-system                calico-kube-controllers-57c8947c94-wwnk2    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                coredns-576cbf47c7-hkwvn                    100m (10%)    0 (0%)      70Mi (3%)        170Mi (8%)
  kube-system                coredns-576cbf47c7-qbmnp                    100m (10%)    0 (0%)      70Mi (3%)        170Mi (8%)
  kube-system                kube-proxy-mm9m4                            0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests    Limits
  --------  --------    ------
  cpu       200m (20%)  0 (0%)
  memory    140Mi (7%)  340Mi (17%)
Events:
  Type     Reason                   Age    From               Message
  ----     ------                   ----   ----               -------
  Normal   Starting                 4m39s  kubelet, node1     Starting kubelet.
  Normal   NodeHasSufficientDisk    4m39s  kubelet, node1     Node node1 status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  4m39s  kubelet, node1     Node node1 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    4m39s  kubelet, node1     Node node1 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     4m39s  kubelet, node1     Node node1 status is now: NodeHasSufficientPID
  Warning  Rebooted                 4m39s  kubelet, node1     Node node1 has been rebooted, boot id: 88686bae-3dcc-481d-af25-4911f0ab7a7c
  Normal   NodeAllocatableEnforced  4m38s  kubelet, node1     Updated Node Allocatable limit across pods
  Normal   Starting                 4m31s  kube-proxy, node1  Starting kube-proxy.
cy4n commented 5 years ago

I think the tolerations do not match the taints. The taints are

node.kubernetes.io/not-ready:NoSchedule

but the tolerations in the etcd pods are

  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists

You can duplicate this toleration (e.g. kubectl edit pod -n kube-system calico-node-) and change the effect like this:

  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
    operator: Exists

This will work for the calico-node and calico-etcd pods.

(The master will still be NotReady.)
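For illustration, after such an edit the tolerations list of a calico pod would carry both entries (a sketch of the intended end state, not a manifest taken from this repo):

  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
    operator: Exists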

pipo02mix commented 5 years ago

@cy4n good catch! For some reason it does not happen in my case; I will add it to a troubleshooting section.

cy4n commented 5 years ago

I'll add this to my GiantSwarm application. cough