kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Create kubeadm token for joining nodes with 24h expiration fails #5997

Closed · arcenik closed this issue 3 years ago

arcenik commented 4 years ago

I'm trying to install a Kubernetes cluster on my Raspberry Pi 3B+ cluster, but it fails because it tries to create a token before the API server container is started.

The result of the command that fails:

time /usr/local/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf --v=5 token create
I0421 18:55:04.612079    7348 token.go:115] [token] validating mixed arguments
I0421 18:55:04.612332    7348 token.go:122] [token] getting Clientsets from kubeconfig file
I0421 18:55:04.621792    7348 token.go:221] [token] loading configurations
I0421 18:55:04.622784    7348 interface.go:384] Looking for default routes with IPv4 addresses
I0421 18:55:04.622852    7348 interface.go:389] Default route transits interface "eth0"
I0421 18:55:04.623387    7348 interface.go:196] Interface eth0 is up
I0421 18:55:04.623865    7348 interface.go:244] Interface "eth0" has 2 addresses :[192.168.1.121/24 fe80::ba27:ebff:fe80:251/64].
I0421 18:55:04.623991    7348 interface.go:211] Checking addr  192.168.1.121/24.
I0421 18:55:04.624038    7348 interface.go:218] IP found 192.168.1.121
I0421 18:55:04.624080    7348 interface.go:250] Found valid IPv4 address 192.168.1.121 for interface "eth0".
I0421 18:55:04.624116    7348 interface.go:395] Found active IP 192.168.1.121 
I0421 18:55:04.625312    7348 feature_gate.go:216] feature gates: &{map[]}
I0421 18:55:04.625459    7348 token.go:233] [token] creating token
timed out waiting for the condition

real    1m15.462s
user    0m0.494s
sys 0m0.068s
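
As a side note, the API server can be probed directly to confirm it is not answering yet; a minimal check, assuming the default secure port 6443 on this node:

# should eventually return "ok" once the apiserver is up; here it fails
# because no apiserver container is running yet
curl -k https://localhost:6443/healthz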

The running containers:

root@node1:~# docker ps -a
CONTAINER ID        IMAGE                               COMMAND                 CREATED             STATUS              PORTS               NAMES
63d107be1ffb        quay.io/coreos/etcd:v3.3.12-arm64   "/usr/local/bin/etcd"   3 days ago          Up 3 days                               etcd1
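
Only etcd is running; there is no kube-apiserver container for kubeadm to talk to. Filtering for it directly makes that easy to see:

# prints only the column header when no apiserver container exists
docker ps --filter name=kube-apiserver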

The docker images:

root@node1:~# docker image ls
REPOSITORY                                                       TAG                 IMAGE ID            CREATED             SIZE
nginx                                                            1.17                738663692f3d        4 days ago          120MB
calico/cni                                                       v3.13.2             fd06375bf3e2        3 weeks ago         149MB
calico/kube-controllers                                          v3.13.2             bc2c5bdce917        3 weeks ago         54.6MB
calico/node                                                      v3.13.2             cad30e542430        3 weeks ago         87.2MB
coredns/coredns                                                  1.6.9               3dc5aff08f75        4 weeks ago         40.8MB
gcr.io/google-containers/k8s-dns-node-cache                      1.15.10             43b7f6c94fb8        6 weeks ago         92.1MB
gcr.io/google-containers/kube-apiserver                          v1.16.6             cf3e80cd624c        3 months ago        155MB
gcr.io/google-containers/kube-proxy                              v1.16.6             f9ea384ddb34        3 months ago        82.8MB
gcr.io/google-containers/kube-controller-manager                 v1.16.6             dc87acd5b7ac        3 months ago        147MB
gcr.io/google-containers/kube-scheduler                          v1.16.6             acdfb728332f        3 months ago        83.5MB
calico/node                                                      v3.11.1             59916ce499ee        4 months ago        79.6MB
calico/cni                                                       v3.11.1             885a8922605a        4 months ago        139MB
calico/kube-controllers                                          v3.11.1             6ee4f8c13ad2        4 months ago        50.2MB
gcr.io/google-containers/k8s-dns-node-cache                      1.15.8              9f2842ea3a40        5 months ago        92.1MB
gcr.io/google-containers/cluster-proportional-autoscaler-arm64   1.7.1               c6df8d5b14b3        7 months ago        38.8MB
coredns/coredns                                                  1.6.0               ef5d4c725db6        8 months ago        40.4MB
gcr.io/google-containers/cluster-proportional-autoscaler-arm64   1.6.0               6f4b39d1e0bc        11 months ago       49.8MB
quay.io/coreos/etcd                                              v3.3.12-arm64       607c9fca7d12        14 months ago       143MB
gcr.io/google_containers/kubernetes-dashboard-arm64              v1.10.1             80580b451ad1        16 months ago       120MB
quay.io/coreos/etcd                                              v3.3.10-arm64       aea17a2af1ee        18 months ago       143MB
gcr.io/google-containers/pause                                   3.1                 6cf7c80fe444        2 years ago         525kB
gcr.io/google_containers/pause-arm64                             3.1                 6cf7c80fe444        2 years ago         525kB

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): 08a97eec

Network plugin used:

Output and inventory: https://gist.github.com/arcenik/389343799624498addb1edc7cbfa976c

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"): see gist above

Command used to invoke ansible:

Output of ansible run:

Anything else we need to know:

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

includerandom commented 4 years ago

Have the same issue

ledroide commented 4 years ago

@arcenik and @includerandom: I have run into the same issue from time to time and solved it in the past. This time, my usual fix no longer works.

You can:

- rename the kubelet cpu_manager_state file on every node,
- restart kubelet,
- re-run the cluster.yml playbook.

I used Ansible to run these commands on all hosts:

currentinventory=${ANSIBLE_INVENTORY_PATH}/kube-dev/hosts.yaml
# sanity check: list the hosts the inventory resolves to
ansible -i $currentinventory kubernetes --list-hosts
# move the stale cpu_manager_state out of kubelet's way
ansible -i $currentinventory kubernetes --become -m shell -a "mv /var/lib/kubelet/cpu_manager_state /var/lib/kubelet/cpu_manager_state-OLD"
# reload systemd units and restart kubelet on every node
ansible -i $currentinventory kubernetes --become -m systemd -a "name=kubelet daemon_reload=true state=restarted"
# then re-run the kubespray playbook
ansible-playbook -i $currentinventory --become cluster.yml
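
After the restart it is worth confirming that kubelet actually came back up before re-running the playbook; a quick check, assuming systemd hosts:

# expect "active" on every node before launching cluster.yml again
ansible -i $currentinventory kubernetes --become -m shell -a "systemctl is-active kubelet"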

This error on "Create kubeadm token for joining nodes with 24h expiration" has been reported many times over the last months, and I have not found a clear explanation of the fixes. I'm still searching. If anyone finds another solution, please tell us.

My kubeadm output looks like yours:

$ /usr/local/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf --v=5 token create
I0904 09:15:37.877589   88884 token.go:121] [token] validating mixed arguments
I0904 09:15:37.877920   88884 token.go:130] [token] getting Clientsets from kubeconfig file
I0904 09:15:37.885142   88884 token.go:243] [token] loading configurations
I0904 09:15:37.885649   88884 interface.go:400] Looking for default routes with IPv4 addresses
I0904 09:15:37.885663   88884 interface.go:405] Default route transits interface "eth0"
I0904 09:15:37.886497   88884 interface.go:208] Interface eth0 is up
I0904 09:15:37.886703   88884 interface.go:256] Interface "eth0" has 2 addresses :[10.150.233.41/24 fe80::250:56ff:fe87:2252/64].
I0904 09:15:37.886742   88884 interface.go:223] Checking addr  10.150.233.41/24.
I0904 09:15:37.886752   88884 interface.go:230] IP found 10.150.233.41
I0904 09:15:37.886765   88884 interface.go:262] Found valid IPv4 address 10.150.233.41 for interface "eth0".
I0904 09:15:37.886778   88884 interface.go:411] Found active IP 10.150.233.41 
W0904 09:15:37.886913   88884 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
I0904 09:15:37.886929   88884 token.go:255] [token] creating token
timed out waiting for the condition

This happens even with the latest HEAD, f1566cb8.

Serge

fejta-bot commented 3 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 3 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/5997#issuecomment-703221429):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ledroide commented 2 years ago

Found another cause for this error: check whether the etcd service is running on the host as a daemon OR as a container, but not both at once. Check which process binds port 2380. If etcd is configured twice, keep only one and disable the other.
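
A quick way to run that check, assuming ss, systemd, and docker are available on the node:

# which process listens on the etcd peer port?
ss -tlnp | grep 2380
# is etcd running as a host daemon?
systemctl is-active etcd
# is etcd also running as a container?
docker ps --filter name=etcd
# if both show up, disable one of them, e.g. the host daemon:
# systemctl disable --now etcd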