kubeadm join shows "Failed to request cluster info, will try again"

lzw5399 commented 5 years ago

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:41:54Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider or hardware configuration:

master: Alibaba Cloud
node01: Tencent Cloud

OS (e.g. from /etc/os-release): identical on both machines:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Kernel (e.g. uname -a):

Linux master 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Others:

What happened?

I want to create a cluster with one master and one node. I initialized the master with the following command:

kubeadm init \
  --kubernetes-version="v1.14.3" \
  --pod-network-cidr="10.244.0.0/16" \
  --ignore-preflight-errors="NumCPU"

Then, as root:

cd ~
mkdir .kube
cp /etc/kubernetes/admin.conf .kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Following the docs, I ran this on node01 (with --v=4):

kubeadm join 172.19.138.68:6443 --token khng7b.3gg4013ijmfho5lm \
  --discovery-token-ca-cert-hash sha256:19d33d8f0961f34ac2f6c1e6854676bcda231bd780d25627e40f241d40800ad5 --v=4

It printed the following error output:

I0617 23:28:46.503478 10405 join.go:427] [preflight] Discovering cluster-info
I0617 23:28:46.503558 10405 token.go:200] [discovery] Trying to connect to API Server "172.19.138.68:6443"
I0617 23:28:46.504160 10405 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://172.19.138.68:6443"
I0617 23:29:16.504617 10405 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://172.19.138.68:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 172.19.138.68:6443: i/o timeout]
I0617 23:29:51.505186 10405 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://172.19.138.68:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 172.19.138.68:6443: i/o timeout]
I0617 23:30:26.505281 10405 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://172.19.138.68:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 172.19.138.68:6443: i/o timeout]

I have seen issue #1613, but I could not find out how to resolve this error correctly.

What you expected to happen?

The node should join the cluster.

Others:

The pods run fine on the master:

kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-fb8b8dccf-5v4b2          1/1     Running   0          40m
coredns-fb8b8dccf-64ddc          1/1     Running   0          40m
etcd-master                      1/1     Running   0          39m
kube-apiserver-master            1/1     Running   0          39m
kube-controller-manager-master   1/1     Running   0          39m
kube-flannel-ds-amd64-ptd8f      1/1     Running   0          39m
kube-proxy-2jrzg                 1/1     Running   0          40m
kube-scheduler-master            1/1     Running   0          39m

How to reproduce it (as minimally and precisely as possible)?

As described above. Thanks for any help!

neolit123 commented 5 years ago

this seems like a networking issue and not kubeadm's fault. my explanation is that this worker does not have connectivity to the api server (172.19.138.68:6443)
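
One way to confirm the connectivity diagnosis from the worker is to probe the API server port directly before retrying the join. The following is a minimal sketch, not part of kubeadm: the function name check_tcp and its arguments are hypothetical, and it assumes bash and the coreutils timeout command are available on the node.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: tries to open a TCP connection and reports the result.
# Returns 0 if the port accepts a connection, 1 if not, 2 on bad usage.
check_tcp() {
  local host="$1" port="$2" wait_secs="${3:-5}"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: check_tcp HOST PORT [TIMEOUT_SECONDS]" >&2
    return 2
  fi
  # bash's /dev/tcp pseudo-device opens a TCP connection;
  # timeout bounds the wait so a dropped SYN does not hang the shell.
  if timeout "$wait_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}
```

Running check_tcp 172.19.138.68 6443 on node01 would be expected to report the address as unreachable if the "i/o timeout" in the log is indeed a network problem rather than a kubeadm one.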

cailiangliang765 commented 5 years ago

I have a similar issue as you in #1611

neolit123 commented 5 years ago

@zhiwen-kooboo did you solve it? please explain what the problem was.

lzw5399 commented 5 years ago

@neolit123 Yes, it is solved. My two cloud servers are not on the same intranet, and their public IPs are not assigned to their network interfaces, so nothing can bind directly to the public addresses. I found a post that suggests the following fix: use iptables to redirect the master's intranet IP to the master's external IP:

iptables -t nat -A OUTPUT -d 192.168.0.111 -j DNAT --to-destination 123.123.123.123
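
To make the roles of the two addresses explicit, here is a small sketch that only prints the rule for a given pair of IPs instead of applying it. The function name dnat_rule is hypothetical, and it assumes the rule is meant to be run on the worker node, since the nat table's OUTPUT chain rewrites locally generated packets.

```shell
#!/usr/bin/env bash
# dnat_rule PRIVATE_IP PUBLIC_IP
# Hypothetical helper: prints (does not execute) the DNAT rule that rewrites
# outgoing traffic addressed to the master's private IP so that it goes to
# the master's public IP instead.
dnat_rule() {
  local private_ip="$1" public_ip="$2"
  if [ -z "$private_ip" ] || [ -z "$public_ip" ]; then
    echo "usage: dnat_rule PRIVATE_IP PUBLIC_IP" >&2
    return 2
  fi
  echo "iptables -t nat -A OUTPUT -d $private_ip -j DNAT --to-destination $public_ip"
}
```

In this thread the private address would presumably be 172.19.138.68, the one used in the join command; the master's public IP is not given here and has to be filled in. After applying the printed command as root, iptables -t nat -L OUTPUT -n lists the rule. A rule added this way does not persist across reboots unless it is saved separately.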

nitinsharma985 commented 9 months ago

This fixed the issue for me as well:

iptables -t nat -A OUTPUT -d 192.168.0.111 -j DNAT --to-destination 123.123.123.123

My traffic was pointing to the private network, so I changed it to go over the public network. The first IP is the private address; the second is the public NAT IP.

thank you @neolit123
