kubernetes / kubeadm

Aggregator for issues filed against kubeadm

kubeadm init stuck on "First node has registered, but is not ready yet" #212

Closed jimmycuadra closed 7 years ago

jimmycuadra commented 7 years ago

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kubeadm

Is this a BUG REPORT or FEATURE REQUEST? (choose one): bug report

Kubernetes version (use kubectl version): 1.6.0

Environment:

What happened:

Following the kubeadm getting started guide exactly:

# kubeadm init --apiserver-cert-extra-sans redacted --pod-network-cidr 10.244.0.0/16
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.0
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [kube-01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local redacted] and IPs [10.96.0.1 10.0.1.101]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 206.956919 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
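
(While kubeadm sits in this loop, the kubelet's own logs can be tailed in another session. This is just a sketch, and it assumes the kubelet is running as a systemd unit the way the kubeadm packages set it up:)

# journalctl -u kubelet -f | grep -i -E 'cni|network'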

That last message, "First node has registered, but is not ready yet", repeats indefinitely and kubeadm never finishes. I connected to the master server in another session to check whether all the expected Docker containers were running, and they were:

$ docker ps
CONTAINER ID        IMAGE                                                                                                                          COMMAND                  CREATED             STATUS              PORTS               NAMES
54733aa1aae3        gcr.io/google_containers/kube-controller-manager-arm@sha256:22f30303212b276b6868b89c8e92c5fb2cb93641e59c312b254c6cb0fa111b2a   "kube-controller-mana"   10 minutes ago      Up 10 minutes                           k8s_kube-controller-manager_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
55b6bf2cc09e        gcr.io/google_containers/etcd-arm@sha256:0ce1dcd85968a3242995dfc168abba2c3bc03d0e3955f52a0b1e79f90039dcf2                      "etcd --listen-client"   11 minutes ago      Up 11 minutes                           k8s_etcd_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
bd0dc34d5e77        gcr.io/google_containers/kube-apiserver-arm@sha256:c54b8c609a6633b5397173c763aba0656c6cb2601926cce5a5b4870d58ba67bd            "kube-apiserver --ins"   12 minutes ago      Up 12 minutes                           k8s_kube-apiserver_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_1
1c4c7b69a3eb        gcr.io/google_containers/kube-scheduler-arm@sha256:827449ef1f3d8c0a54d842af9d6528217ccd2d36cc2b49815d746d41c7302050            "kube-scheduler --kub"   13 minutes ago      Up 13 minutes                           k8s_kube-scheduler_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
4fd0635f9439        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
cfb4a758ad96        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
a631d8b6c11c        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
309b62fff122        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_0

I copied the admin kubeconfig to my local machine and used kubectl (1.6.0) to see what was going on with the node kubeadm claimed had registered:

$ kubectl describe node kube-01
Name:           kube-01
Role:
Labels:         beta.kubernetes.io/arch=arm
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=kube-01
Annotations:        node.alpha.kubernetes.io/ttl=0
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:         <none>
CreationTimestamp:  Tue, 28 Mar 2017 22:06:40 -0700
Phase:
Conditions:
  Type          Status  LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------  -----------------           ------------------          ------              -------
  OutOfDisk         False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure    False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready         False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletNotReady         runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:      10.0.1.101,10.0.1.101,kube-01
Capacity:
 cpu:       4
 memory:    882632Ki
 pods:      110
Allocatable:
 cpu:       4
 memory:    780232Ki
 pods:      110
System Info:
 Machine ID:            9989a26f06984d6dbadc01770f018e3b
 System UUID:           9989a26f06984d6dbadc01770f018e3b
 Boot ID:           7a77e2e8-dd62-4989-b9e7-0fb52747162a
 Kernel Version:        4.4.50-hypriotos-v7+
 OS Image:          Raspbian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          arm
 Container Runtime Version: docker://1.12.6
 Kubelet Version:       v1.6.0
 Kube-Proxy Version:        v1.6.0
PodCIDR:            10.244.0.0/24
ExternalID:         kube-01
Non-terminated Pods:        (4 in total)
  Namespace         Name                        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                        ------------    ----------  --------------- -------------
  kube-system           etcd-kube-01                0 (0%)      0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-apiserver-kube-01          250m (6%)   0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-controller-manager-kube-01     200m (5%)   0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-scheduler-kube-01          100m (2%)   0 (0%)      0 (0%)      0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  550m (13%)    0 (0%)      0 (0%)      0 (0%)
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  14m       14m     1   kubelet, kube-01            Normal      Starting        Starting kubelet.
  14m       10m     55  kubelet, kube-01            Normal      NodeHasSufficientDisk   Node kube-01 status is now: NodeHasSufficientDisk
  14m       10m     55  kubelet, kube-01            Normal      NodeHasSufficientMemory Node kube-01 status is now: NodeHasSufficientMemory
  14m       10m     55  kubelet, kube-01            Normal      NodeHasNoDiskPressure   Node kube-01 status is now: NodeHasNoDiskPressure

This uncovered the reason the kubelet was not ready:

"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

In my experiments with kubeadm 1.5, CNI was not needed to bring up the master node, so this is surprising. Even the getting started guide suggests that kubeadm init should finish successfully before you move on to deploying a CNI plugin.
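
(A quick way to surface just that Ready condition, rather than reading the whole describe output; this is only a convenience, not something kubeadm itself runs:)

$ kubectl get node kube-01 -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'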

Anyway, I deployed flannel using kubectl from my local machine:

$ kubectl apply -f kube-flannel.yml

Where the contents of the file were:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

But it never scheduled:

$ kubectl describe ds kube-flannel-ds -n kube-system
Name:       kube-flannel-ds
Selector:   app=flannel,tier=node
Node-Selector:  beta.kubernetes.io/arch=amd64
Labels:     app=flannel
        tier=node
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-ds","n...
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       app=flannel
            tier=node
  Service Account:  flannel
  Containers:
   kube-flannel:
    Image:  quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /opt/bin/flanneld
      --ip-masq
      --kube-subnet-mgr
    Environment:
      POD_NAME:      (v1:metadata.name)
      POD_NAMESPACE:     (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
   install-cni:
    Image:  quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /bin/sh
      -c
      set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
  Volumes:
   run:
    Type:   HostPath (bare host directory volume)
    Path:   /run
   cni:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/cni/net.d
   flannel-cfg:
    Type:   ConfigMap (a volume populated by a ConfigMap)
    Name:   kube-flannel-cfg
    Optional:   false
Events:     <none>
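
(Worth noting: the Node-Selector above is beta.kubernetes.io/arch=amd64, while the earlier describe output shows the node labeled beta.kubernetes.io/arch=arm, so the DaemonSet has no node it is eligible to run on. A quick cross-check, just as a sketch:)

$ kubectl get nodes -L beta.kubernetes.io/arch
$ kubectl -n kube-system get ds kube-flannel-ds -o jsonpath='{.spec.template.spec.nodeSelector}'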

I tried to join one of the other servers anyway, just to see what would happen. I used kubeadm token create to manually create a token that I could use from another machine. On the other machine:

kubeadm join --token $TOKEN 10.0.1.101:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.1.101:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.1.101:6443"
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]

And the final message repeated forever.
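
(For reference, the 1.6 join flow reads a cluster-info ConfigMap in the kube-public namespace, and the error above is an RBAC denial for the anonymous user. From the master, the ConfigMap and whatever role/rolebinding is supposed to expose it can be inspected like this, as a sketch:)

$ kubectl -n kube-public get configmap cluster-info -o yaml
$ kubectl -n kube-public get role,rolebinding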

What you expected to happen:

kubeadm init should complete and produce a bootstrap token.

ReSearchITEng commented 7 years ago

Thanks all for the help. I finally have a fully working k8s 1.6.1 with flannel, and everything is now in Ansible playbooks.
Tested on CentOS/RHEL. Work on Debian-based distributions (e.g. Ubuntu) has also started, but it might still need some refining.

https://github.com/ReSearchITEng/kubeadm-playbook/blob/master/README.md

PS: work based on sjenning/kubeadm-playbook - Many thanks @sjenning

joaquin386 commented 6 years ago

Getting this when joining a node to a cluster:

[discovery] Created cluster-info discovery client, requesting info from "https://10.100.2.158:6443"
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get configmaps in the namespace "kube-public"]
[discovery] Failed to request cluster info, will try again: [configmaps "cluster-info" is forbidden: User "system:anonymous" cannot get configmaps in the namespace "kube-public"]

I started the node with self-hosting (SelfHosting) enabled.