kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

kubeadm init stuck on "First node has registered, but is not ready yet" #212

Closed jimmycuadra closed 7 years ago

jimmycuadra commented 7 years ago

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kubeadm

Is this a BUG REPORT or FEATURE REQUEST? (choose one): bug report

Kubernetes version (use kubectl version): 1.6.0

Environment:

What happened:

Following the kubeadm getting started guide exactly:

# kubeadm init --apiserver-cert-extra-sans redacted --pod-network-cidr 10.244.0.0/16
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.0
[init] Using Authorization mode: RBAC
[preflight] Running pre-flight checks
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [kube-01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local redacted] and IPs [10.96.0.1 10.0.1.101]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 206.956919 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet

That last message, "First node has registered, but is not ready yet" repeats infinitely, and kubeadm never finishes. I connected to the master server in another session to see if all the Docker containers were running as expected and they are:

$ docker ps
CONTAINER ID        IMAGE                                                                                                                          COMMAND                  CREATED             STATUS              PORTS               NAMES
54733aa1aae3        gcr.io/google_containers/kube-controller-manager-arm@sha256:22f30303212b276b6868b89c8e92c5fb2cb93641e59c312b254c6cb0fa111b2a   "kube-controller-mana"   10 minutes ago      Up 10 minutes                           k8s_kube-controller-manager_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
55b6bf2cc09e        gcr.io/google_containers/etcd-arm@sha256:0ce1dcd85968a3242995dfc168abba2c3bc03d0e3955f52a0b1e79f90039dcf2                      "etcd --listen-client"   11 minutes ago      Up 11 minutes                           k8s_etcd_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
bd0dc34d5e77        gcr.io/google_containers/kube-apiserver-arm@sha256:c54b8c609a6633b5397173c763aba0656c6cb2601926cce5a5b4870d58ba67bd            "kube-apiserver --ins"   12 minutes ago      Up 12 minutes                           k8s_kube-apiserver_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_1
1c4c7b69a3eb        gcr.io/google_containers/kube-scheduler-arm@sha256:827449ef1f3d8c0a54d842af9d6528217ccd2d36cc2b49815d746d41c7302050            "kube-scheduler --kub"   13 minutes ago      Up 13 minutes                           k8s_kube-scheduler_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
4fd0635f9439        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-controller-manager-kube-01_kube-system_d44abf63e3ab24853ab86643e0b96d81_0
cfb4a758ad96        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_etcd-kube-01_kube-system_90ab26991bf9ad676a430c7592d08bee_0
a631d8b6c11c        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-scheduler-kube-01_kube-system_3ef1979df7569495bb727d12ac1a7a6f_0
309b62fff122        gcr.io/google_containers/pause-arm:3.0                                                                                         "/pause"                 14 minutes ago      Up 14 minutes                           k8s_POD_kube-apiserver-kube-01_kube-system_4d99c225ec157dc715c26b59313aeac8_0

I copied the admin kubeconfig to my local machine and used kubectl (1.6.0) to see what was going on with the node kubeadm was claiming was registered:

$ kubectl describe node kube-01
Name:           kube-01
Role:
Labels:         beta.kubernetes.io/arch=arm
            beta.kubernetes.io/os=linux
            kubernetes.io/hostname=kube-01
Annotations:        node.alpha.kubernetes.io/ttl=0
            volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:         <none>
CreationTimestamp:  Tue, 28 Mar 2017 22:06:40 -0700
Phase:
Conditions:
  Type          Status  LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------  -----------------           ------------------          ------              -------
  OutOfDisk         False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure    False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure      False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready         False   Tue, 28 Mar 2017 22:17:24 -0700     Tue, 28 Mar 2017 22:06:40 -0700     KubeletNotReady         runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:      10.0.1.101,10.0.1.101,kube-01
Capacity:
 cpu:       4
 memory:    882632Ki
 pods:      110
Allocatable:
 cpu:       4
 memory:    780232Ki
 pods:      110
System Info:
 Machine ID:            9989a26f06984d6dbadc01770f018e3b
 System UUID:           9989a26f06984d6dbadc01770f018e3b
 Boot ID:           7a77e2e8-dd62-4989-b9e7-0fb52747162a
 Kernel Version:        4.4.50-hypriotos-v7+
 OS Image:          Raspbian GNU/Linux 8 (jessie)
 Operating System:      linux
 Architecture:          arm
 Container Runtime Version: docker://1.12.6
 Kubelet Version:       v1.6.0
 Kube-Proxy Version:        v1.6.0
PodCIDR:            10.244.0.0/24
ExternalID:         kube-01
Non-terminated Pods:        (4 in total)
  Namespace         Name                        CPU Requests    CPU Limits  Memory Requests Memory Limits
  ---------         ----                        ------------    ----------  --------------- -------------
  kube-system           etcd-kube-01                0 (0%)      0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-apiserver-kube-01          250m (6%)   0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-controller-manager-kube-01     200m (5%)   0 (0%)      0 (0%)      0 (0%)
  kube-system           kube-scheduler-kube-01          100m (2%)   0 (0%)      0 (0%)      0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests Memory Limits
  ------------  ----------  --------------- -------------
  550m (13%)    0 (0%)      0 (0%)      0 (0%)
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  14m       14m     1   kubelet, kube-01            Normal      Starting        Starting kubelet.
  14m       10m     55  kubelet, kube-01            Normal      NodeHasSufficientDisk   Node kube-01 status is now: NodeHasSufficientDisk
  14m       10m     55  kubelet, kube-01            Normal      NodeHasSufficientMemory Node kube-01 status is now: NodeHasSufficientMemory
  14m       10m     55  kubelet, kube-01            Normal      NodeHasNoDiskPressure   Node kube-01 status is now: NodeHasNoDiskPressure

This uncovered the reason the kubelet was not ready:

"runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

In my experiments with kubeadm 1.5, CNI was not needed to bring up the master node, so this is surprising. Even the getting started guide suggests that kubeadm init should finish successfully before you move on to deploying a CNI plugin.

Anyway, I deployed flannel using kubectl from my local machine:

$ kubectl apply -f kube-flannel.yml

Where the contents of the file was:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: flannel
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
        securityContext:
          privileged: true
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      - name: install-cni
        image: quay.io/coreos/flannel:v0.7.0-amd64
        command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

But it never scheduled:

$ kubectl describe ds kube-flannel-ds -n kube-system
Name:       kube-flannel-ds
Selector:   app=flannel,tier=node
Node-Selector:  beta.kubernetes.io/arch=amd64
Labels:     app=flannel
        tier=node
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-ds","n...
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       app=flannel
            tier=node
  Service Account:  flannel
  Containers:
   kube-flannel:
    Image:  quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /opt/bin/flanneld
      --ip-masq
      --kube-subnet-mgr
    Environment:
      POD_NAME:      (v1:metadata.name)
      POD_NAMESPACE:     (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run from run (rw)
   install-cni:
    Image:  quay.io/coreos/flannel:v0.7.0-amd64
    Port:
    Command:
      /bin/sh
      -c
      set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
  Volumes:
   run:
    Type:   HostPath (bare host directory volume)
    Path:   /run
   cni:
    Type:   HostPath (bare host directory volume)
    Path:   /etc/cni/net.d
   flannel-cfg:
    Type:   ConfigMap (a volume populated by a ConfigMap)
    Name:   kube-flannel-cfg
    Optional:   false
Events:     <none>

I tried to join one of the other servers anyway, just to see what would happen. I used kubeadm token create to manually create a token that I could use from another machine. On the other machine:

kubeadm join --token $TOKEN 10.0.1.101:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.1.101:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.1.101:6443"
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]
[discovery] Failed to request cluster info, will try again: [User "system:anonymous" cannot get configmaps in the namespace "kube-public". (get configmaps cluster-info)]

And the final message repeated forever.

What you expected to happen:

kubeadm init should complete and produce a bootstrap token.

racingmars commented 7 years ago

Exact same thing happening to me on Ubuntu 16.04.02, both GCE and local VMWare installations, Docker version 1.12.6, kernel 4.8.0-44-generic 47~16.04.1-Ubuntu SMP.

The kubelet log shows a warning about missing /etc/cni/net.d before the error that we see in jimmycuadra's report:

Mar 29 04:43:25 instance-1 kubelet[6800]: W0329 04:43:25.763117    6800 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 29 04:43:25 instance-1 kubelet[6800]: E0329 04:43:25.763515    6800 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
csarora commented 7 years ago

Same issue on Ubuntu AWS VM. Docker 1.12.5

root@ip-10-43-0-20:~# kubeadm version
kubeadm version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5"}

root@ip-10-43-0-20:~# uname -a
Linux ip-10-43-0-20 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

root@ip-10-43-0-20:~# kubeadm init --config cfg.yaml
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.6.0
[init] Using Authorization mode: RBAC
[init] WARNING: For cloudprovider integrations to work --cloud-provider must be set for all kubelets in the cluster. (/etc/systemd/system/kubelet.service.d/10-kubeadm.conf should be edited for this purpose)
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [ip-10-43-0-20 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.43.0.20]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 16.531681 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet

omazilov commented 7 years ago

++ the same issue (Ubuntu 16.04.1)

antoinefink commented 7 years ago

Same thing here on Ubuntu 16.04

rmohr commented 7 years ago

On CentOS 7, I downgraded the kubelet to 1.5.4. That solved it for me. It seems like the ready check works differently in the 1.6.0 kubelet.

vascofg commented 7 years ago

Same issue on CentOS 7 on bare metal x64 machine, since upgrading to k8s 1.6.0

lowstz commented 7 years ago

Same issue on Ubuntu 16.04

ctrlaltdel commented 7 years ago

Same issue on Ubuntu 16.04, manually downgrading the kubelet package solved the issue.

# apt install kubelet=1.5.6-00
Scukerman commented 7 years ago

@ctrlaltdel it didn't work for me.

jbeda commented 7 years ago

I suspect this is a kubelet issue. It shouldn't mark the node as not ready when CNI is unconfigured; only pods that require CNI should be marked as not ready.

kristiandrucker commented 7 years ago

@jbeda Do you know when this issue will be resolved?

jbeda commented 7 years ago

@kristiandrucker -- no -- still figuring out what is going on. Need to root cause it first.

kristiandrucker commented 7 years ago

@jbeda OK, but after the issue is resolved, then what? Rebuild the kubelet from source?

jbeda commented 7 years ago

@kristiandrucker This'll have to go out in a point release of k8s if it is a kubelet issue.

I suspect that https://github.com/kubernetes/kubernetes/pull/43474 is the root cause. Going to file a bug and follow up with the network people.

@dcbw You around?

dcbw commented 7 years ago

Looks like the issue is that a DaemonSet is not scheduled to nodes that have the NetworkReady:false condition, because the checks for scheduling pods are not fine-grained enough. We need to fix that; a pod that is hostNetwork:true should be scheduled on a node that is NetworkReady:false, but a hostNetwork:false pod should not.

As a workaround, does adding the scheduler.alpha.kubernetes.io/critical-pod annotation on your DaemonSet make things work again?
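For anyone who wants to try that workaround, here is a sketch (untested against 1.6.0) of where the annotation would go in the flannel DaemonSet posted above — on the pod template's metadata, not on the DaemonSet's own metadata:

```yaml
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        tier: node
        app: flannel
```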

0xmichalis commented 7 years ago

@janetkuo @lukaszo can you triage the DS behavior?

dewet22 commented 7 years ago

There is also an ongoing discussion in #sig-network on slack, btw.

prapdm commented 7 years ago

Same issue CentOS 7 x64

errordeveloper commented 7 years ago

@prapdm this appears to be independent of which distro you are running.

prapdm commented 7 years ago

CentOS Linux release 7.3.1611 (Core)

lukaszo commented 7 years ago

I've tried it on one node with Ubuntu 16.04. It hangs with the "not ready yet" message. I also manually created the flannel DaemonSet, and in my case it scheduled one pod without any problem. The daemon pod itself went into CrashLoopBackOff with the error:

E0329 22:57:03.065651 1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-z3xgn': the server does not allow access to the requested resource (get pods kube-flannel-ds-z3xgn)

I will try on CentOS also, but I don't think the DaemonSet is to blame here; kubeadm hangs regardless.

mikedanese commented 7 years ago

that is an rbac permission error.

lukaszo commented 7 years ago

@jimmycuadra I've just noticed that you are running it on raspberry pi which has an arm processor.

For flannel daemon set you have:

        beta.kubernetes.io/arch: amd64

but your node is labeled with:

beta.kubernetes.io/arch=arm

So the DaemonSet cannot launch a pod on this node; just change the node selector and it will work. You will still get the RBAC permission error, but maybe @mikedanese can tell you how to fix it, because I don't know.
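A quick sketch of that fix: swap every amd64 for arm in the manifest before applying it. Demonstrated here on an inline sample written to a temp file; in practice you would run the same sed against your kube-flannel.yml (this assumes every occurrence of "amd64" in the file should become "arm", which holds for the manifest quoted above).

```shell
# Write a sample of the relevant manifest lines to a temp file.
cat > /tmp/flannel-arch-sample.yml <<'EOF'
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.7.0-amd64
EOF
# Swap the architecture everywhere (node selector and image tag).
sed -i 's/amd64/arm/g' /tmp/flannel-arch-sample.yml
cat /tmp/flannel-arch-sample.yml
```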

jimmycuadra commented 7 years ago

Ah, thanks @lukaszo! I wasn't following the RPi-specific guide this time (which I used for k8s 1.5) and forgot that step. I would've discovered it when the daemon set errored, but as it turns out I didn't get that far. :}

geoffmunn commented 7 years ago

I'm also seeing this problem when I follow the instructions as described here: https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/

jcleira commented 7 years ago

Managed to get it working after installing the right flannel network pod.

I think that @jimmycuadra might get it working with @lukaszo's comment.

By the time the message [apiclient] First node has registered, but is not ready yet starts flooding the output, the Kubernetes API server is already running, so you can:

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | kubectl create -f -

For the raspberry pi install:

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | sed "s/amd64/arm/g" | kubectl create -f -

Then it will finish:

[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node has registered, but is not ready yet
[apiclient] First node is ready after 245.050597 seconds
[apiclient] Test deployment succeeded
[token] Using token: 4dc99e............
[apiconfig] Created RBAC rules
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 4dc99e........... 192.168.1.200:6443
thelastworm commented 7 years ago

I had the same issue and fixed it this way (you must be root):

In kubeadm 1.6.0, remove the environment variable $KUBELET_NETWORK_ARGS from the systemd file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

then restart the daemons:

systemctl daemon-reload

kubeadm init

This takes a little while... After it succeeds,

deploy the network add-on you want to use: http://kubernetes.io/docs/admin/addons/

Calico seems to be the best one; I'm not sure yet, still testing it.
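A sketch of that edit, demonstrated on a sample copy of the drop-in file rather than the real /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (the exact contents of the real ExecStart line are an assumption here; do the real edit as root, then run systemctl daemon-reload):

```shell
# Sample stand-in for 10-kubeadm.conf; the real file's ExecStart line
# may list the variables in a different order.
CONF=/tmp/10-kubeadm-sample.conf
cat > "$CONF" <<'EOF'
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS
EOF
# Drop the $KUBELET_NETWORK_ARGS reference from the ExecStart line.
sed -i 's/ \$KUBELET_NETWORK_ARGS//' "$CONF"
cat "$CONF"
# On the real file you would then run:
#   systemctl daemon-reload && kubeadm init
```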

MaximF commented 7 years ago

@thelastworm I just tried to do it, and it didn't work. Ubuntu 16.04.2 LTS, kubeadm 1.6.0. I did the following steps:

  1. edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and remove $KUBELET_NETWORK_ARGS
  2. kubeadm reset to clean up previous attempt to start it
  3. kubeadm init --token=<VALUE> --apiserver-advertise-address=<IP>

[EDITED] It worked after @srinat999 pointed out the necessity of running systemctl daemon-reload before kubeadm init

Noddy76 commented 7 years ago

@jcorral's solution worked for me with one change to the flannel deployment since the insecure API port is no longer created by kubeadm.

curl -sSL https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml | \
kubectl --kubeconfig /etc/kubernetes/admin.conf create -f -
srinat999 commented 7 years ago

@MaximF You have to do systemctl daemon-reload after changing the conf file. Worked for me.

silentred commented 7 years ago

@jcorral Your solution works for me. Thanks.

thelastworm commented 7 years ago

@MaximF I've just added the daemon restart command line.

haribole commented 7 years ago

kubeadm init completes successfully, but when I check the version, I get the following error:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

thelastworm commented 7 years ago

@haribole You should set the KUBECONFIG env var

ghost commented 7 years ago

Has anyone got Flannel to run after the workarounds related to CNI? I can get past the "not ready" issue, but when I run Flannel, I get an error that looks like this:

Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-g5cbj': the server does not allow access to the requested resource (get pods kube-flannel-ds-g5cbj)

Pods status shows "CrashLoopBackOff"

mikedanese commented 7 years ago

You need to add rbac roles to authorize flannel to read from the API.

amacneil commented 7 years ago

You need to add rbac roles to authorize flannel to read from the API.

In case anyone else is wondering what this means, it looks like you need to create kube-flannel-rbac.yml before you create flannel:

kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
MaximF commented 7 years ago

I think because the root issue is solved and the related ticket is closed, we should close this one as well :)

m4r10k commented 7 years ago

Just for information: It is working for me now with the updated packages under Ubuntu 16.04.

jimmycuadra commented 7 years ago

1.6.1 works for me! Thanks to everyone that helped get this fix out!

eastcirclek commented 7 years ago

I successfully set up my Kubernetes cluster on centos-release-7-3.1611.el7.centos.x86_64 by taking the following steps (I assume Docker is already installed):

1) (in /etc/yum.repos.d/kubernetes.repo) baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64-unstable
   => Use the unstable repository to get the latest Kubernetes 1.6.1
2) yum install -y kubelet kubeadm kubectl kubernetes-cni
3) (in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf) add "--cgroup-driver=systemd" at the end of the last line.
   => This is because Docker uses systemd for its cgroup driver while the kubelet uses cgroupfs.
4) systemctl enable kubelet && systemctl start kubelet
5) kubeadm init --pod-network-cidr 10.244.0.0/16
   => If you used to pass --api-advertise-addresses, you need to use --apiserver-advertise-address instead.
6) cp /etc/kubernetes/admin.conf $HOME/
   sudo chown $(id -u):$(id -g) $HOME/admin.conf
   export KUBECONFIG=$HOME/admin.conf
   => Without this step, you might get an error with kubectl get
   => I didn't need it with 1.5.2
7) kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
   => 1.6.0 introduces role-based access control, so you should add a ClusterRole and a ClusterRoleBinding before creating the flannel DaemonSet
8) kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
   => Create the flannel DaemonSet
9) (on every slave node) kubeadm join --token (your token) (ip):(port)
   => as shown in the output of kubeadm init
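A sketch of step 3 (the --cgroup-driver=systemd edit), demonstrated on a sample copy of the drop-in file rather than the real /etc/systemd/system/kubelet.service.d/10-kubeadm.conf; the sample ExecStart contents are an assumption. On the real file, edit as root and then run systemctl daemon-reload && systemctl restart kubelet.

```shell
# Sample stand-in for 10-kubeadm.conf. Systemd drop-ins often clear
# ExecStart first, so only the non-empty kubelet line gets the flag.
CONF=/tmp/10-kubeadm-cgroup-sample.conf
cat > "$CONF" <<'EOF'
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS
EOF
# Append the flag to the end of the real ExecStart line.
sed -i '/^ExecStart=\/usr\/bin\/kubelet/ s/$/ --cgroup-driver=systemd/' "$CONF"
cat "$CONF"
```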

All the above steps are a result of combining suggestions from various issues around Kubernetes-1.6.0, especially kubeadm.

Hope this will save your time.

xilu0 commented 7 years ago

@eastcirclek @Sliim You are great

jralmaraz commented 7 years ago

@eastcirclek these were the exact steps that I had just worked out by querying several forums too. A timezone difference, maybe? Thanks everyone, this topic was really helpful.

overip commented 7 years ago

I have Ubuntu 16.04 server on AWS and followed the steps

  1. edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and remove $KUBELET_NETWORK_ARGS
  2. kubeadm reset to clean up previous attempt to start it
  3. kubeadm init --token= --apiserver-advertise-address=

which apparently worked correctly, but then when I try to install Calico as the network plugin, I get the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

Is the k8s team working on a patch?

Thanks

jimmycuadra commented 7 years ago

@overip I don't think any patch is required for that... You just need to specify the right kubeconfig file when using kubectl. kubeadm should have written it to /etc/kubernetes/admin.conf.

overip commented 7 years ago

@jimmycuadra could you please explain the steps to do that?

jimmycuadra commented 7 years ago

@overip The output of kubeadm init has the instructions:

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

Personally, I prefer to copy the file to $HOME/.kube/config, which is where kubectl will look for it by default. Then you don't need to set the KUBECONFIG environment variable.

If you are planning to use kubectl from your local machine, you can use scp (or even just copy paste the contents) to write it to ~/.kube/config on your own computer.

Search for "admin.conf" in this GitHub issue for more details. It's been mentioned a few times.
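A sketch of the ~/.kube/config approach; the paths below are illustrative stand-ins (on a real master the source is /etc/kubernetes/admin.conf, readable only by root, and the target is $HOME/.kube/config):

```shell
# Stand-in for /etc/kubernetes/admin.conf.
SRC=/tmp/sample-admin.conf
printf 'apiVersion: v1\nkind: Config\n' > "$SRC"
# Stand-in for $HOME; in practice use your real home directory.
DEMO_HOME=/tmp/demo-home
mkdir -p "$DEMO_HOME/.kube"
cp "$SRC" "$DEMO_HOME/.kube/config"
# kubectl looks in ~/.kube/config by default, so no KUBECONFIG export is needed.
cat "$DEMO_HOME/.kube/config"
```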

ReSearchITEng commented 7 years ago

@eastcirclek - followed the steps, but for some reason the nodes are not able to install flannel properly. (Note: on master everything is smooth.)

Apr 13 22:31:11 node2 kubelet[22893]: I0413 22:31:11.666206   22893 kuberuntime_manager.go:458] Container {Name:install-cni Image:quay.io/coreos/flannel:v0.7.0-amd64 Command:[/bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:cni ReadOnly:false MountPath:/etc/cni/net.d SubPath:} {Name:flannel-cfg ReadOnly:false MountPath:/etc/kube-flannel/ SubPath:} {Name:flannel-token-g65nf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 13 22:31:11 node2 kubelet[22893]: I0413 22:31:11.666280   22893 kuberuntime_manager.go:742] checking backoff for container "install-cni" in pod "kube-flannel-ds-3smf7_kube-system(2e6ad0f9-207f-11e7-8f34-0050569120ff)"
Apr 13 22:31:12 node2 kubelet[22893]: I0413 22:31:12.846325   22893 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/2e6ad0f9-207f-11e7-8f34-0050569120ff-flannel-cfg" (spec.Name: "flannel-cfg") pod "2e6ad0f9-207f-11e7-8f34-0050569120ff" (UID: "2e6ad0f9-207f-11e7-8f34-0050569120ff").
Apr 13 22:31:12 node2 kubelet[22893]: I0413 22:31:12.846373   22893 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/2e6ad0f9-207f-11e7-8f34-0050569120ff-flannel-token-g65nf" (spec.Name: "flannel-token-g65nf") pod "2e6ad0f9-207f-11e7-8f34-0050569120ff" (UID: "2e6ad0f9-207f-11e7-8f34-0050569120ff").
luckyfengyong commented 7 years ago

Just sharing my workaround. First, $KUBELET_NETWORK_ARGS is required; otherwise CNI is not enabled/configured. Removing and then restoring $KUBELET_NETWORK_ARGS seems too complicated. When kubeadm init shows "[apiclient] First node has registered, but is not ready yet", the k8s cluster is actually already ready to serve requests. At that point, you can simply move to steps 3/4 of https://kubernetes.io/docs/getting-started-guides/kubeadm/ as follows.

To start using your cluster, you need to run (as a regular user):

  sudo cp /etc/kubernetes/admin.conf $HOME/
  sudo chown $(id -u):$(id -g) $HOME/admin.conf
  export KUBECONFIG=$HOME/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:  http://kubernetes.io/docs/admin/addons/

When installing the pod network, please make sure its serviceaccount is granted enough permission. Taking flannel as an example, I just bind the cluster-admin role to flannel's service account as follows. It may not be ideal, and you could define a specific role for the flannel serviceaccount. BTW, when deploying other addon services like the dashboard, you also need to grant enough permission to the related serviceaccount.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: flannel:daemonset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: flannel
  namespace:  kube-system

After the pod network is ready, kubeadm init will show that the node is ready, and the user can continue with the instructions.

0xmichalis commented 7 years ago

Taking flannel as an example. I just bind cluster-admin role to service account of flannel as follows. It may not be ideal, and you could define a specific role for flannel serviceacount.

There is https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml already