kubernetes / kubeadm

Aggregator for issues filed against kubeadm

kubeadm join failed: unable to fetch the kubeadm-config ConfigMap #1596

Closed omegazeng closed 5 years ago

omegazeng commented 5 years ago

Is this a request for help?

yes

What keywords did you search in kubeadm issues before filing this one?

kubeadm join unable to fetch the kubeadm-config ConfigMap controlPlaneEndpoint

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:20:34Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

- **Cloud provider or hardware configuration**:
vmware VMs: 32 vcpu, 32g memory
- **OS** (e.g. from /etc/os-release):
```bash
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
```

What happened?

create join token on master node:

kubeadm token create --print-join-command
kubeadm join k8s-master.HIDE.xyz:6443 --token xxx.yyyyyy     --discovery-token-ca-cert-hash sha256:zzzzzzzzzzzzzzzzzzzzzz

add new worker node

kubeadm join k8s-master.HIDE.xyz:6443 --token xxx.yyyyyy   -v 3  --discovery-token-ca-cert-hash sha256:zzzzzzzzzzzzzzzzzzzzzz
I0605 10:27:54.668924   24149 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I0605 10:27:54.669026   24149 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0605 10:27:54.669137   24149 preflight.go:90] [preflight] Running general checks
I0605 10:27:54.669201   24149 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0605 10:27:54.669248   24149 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I0605 10:27:54.669257   24149 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0605 10:27:54.669268   24149 checks.go:105] validating the container runtime
I0605 10:27:54.745250   24149 checks.go:131] validating if the service is enabled and active
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
I0605 10:27:54.839995   24149 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0605 10:27:54.840049   24149 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0605 10:27:54.840077   24149 checks.go:653] validating whether swap is enabled or not
I0605 10:27:54.840105   24149 checks.go:382] validating the presence of executable ip
I0605 10:27:54.840129   24149 checks.go:382] validating the presence of executable iptables
I0605 10:27:54.840156   24149 checks.go:382] validating the presence of executable mount
I0605 10:27:54.840171   24149 checks.go:382] validating the presence of executable nsenter
I0605 10:27:54.840188   24149 checks.go:382] validating the presence of executable ebtables
I0605 10:27:54.840206   24149 checks.go:382] validating the presence of executable ethtool
I0605 10:27:54.840238   24149 checks.go:382] validating the presence of executable socat
I0605 10:27:54.840260   24149 checks.go:382] validating the presence of executable tc
I0605 10:27:54.840274   24149 checks.go:382] validating the presence of executable touch
I0605 10:27:54.840299   24149 checks.go:524] running all checks
I0605 10:27:54.873438   24149 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0605 10:27:54.874069   24149 checks.go:622] validating kubelet version
I0605 10:27:54.936806   24149 checks.go:131] validating if the service is enabled and active
I0605 10:27:54.946640   24149 checks.go:209] validating availability of port 10250
I0605 10:27:54.946833   24149 checks.go:292] validating the existence of file /etc/kubernetes/pki/ca.crt
I0605 10:27:54.946849   24149 checks.go:439] validating if the connectivity type is via proxy or direct
I0605 10:27:54.946885   24149 join.go:427] [preflight] Discovering cluster-info
I0605 10:27:54.947024   24149 token.go:200] [discovery] Trying to connect to API Server "k8s-master.HIDE.xyz:6443"
I0605 10:27:54.947792   24149 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://k8s-master.HIDE.xyz:6443"
I0605 10:28:04.982049   24149 token.go:141] [discovery] Requesting info from "https://k8s-master.HIDE.xyz:6443" again to validate TLS against the pinned public key
I0605 10:28:15.008018   24149 token.go:164] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "k8s-master.HIDE.xyz:6443"
I0605 10:28:15.008056   24149 token.go:206] [discovery] Successfully established connection with API Server "k8s-master.HIDE.xyz:6443"
I0605 10:28:15.008090   24149 join.go:441] [preflight] Fetching init configuration
I0605 10:28:15.008104   24149 join.go:474] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://master.HIDE.xyz:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp: lookup master.HIDE.xyz on 114.114.114.114:53: no such host

I know "master.HIDE.xyz" cannot be resolved: I changed controlPlaneEndpoint from "master.HIDE.xyz" to "k8s-master.HIDE.xyz" and deleted the A record for "master.HIDE.xyz" (see https://github.com/kubernetes/kubeadm/issues/1447#issuecomment-490434779). But why does kubeadm fetch the kubeadm-config ConfigMap from https://master.HIDE.xyz:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config and not from https://k8s-master.HIDE.xyz:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?

kubeadm-config

kubectl -n kube-system get cm kubeadm-config -oyaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - k8s-master.HIDE.xyz
      - master.HIDE.xyz
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: k8s-master.HIDE.xyz:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      external:
        caFile: /etc/ssl/etcd/etcd-root-ca.pem
        certFile: /etc/ssl/etcd/etcd.pem
        endpoints:
        - https://172.16.10.136:2379
        - https://172.16.10.137:2379
        - https://172.16.10.138:2379
        keyFile: /etc/ssl/etcd/etcd-key.pem
    imageRepository: gcr.azk8s.cn/google_containers
    kind: ClusterConfiguration
    kubernetesVersion: v1.14.2
    networking:
      dnsDomain: cluster.local
      podSubnet: 192.168.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      master-172-16-10-136:
        advertiseAddress: 172.16.10.136
        bindPort: 6443
      master-172-16-10-137:
        advertiseAddress: 172.16.10.137
        bindPort: 6443
      master-172-16-10-138:
        advertiseAddress: 172.16.10.138
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2019-01-17T12:30:52Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "32801430"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: ba5c5c09-1a53-11e9-b46c-000c296fd64f

What you expected to happen?

add worker node

How to reproduce it (as minimally and precisely as possible)?

rerun kubeadm join

Anything else we need to know?

kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://k8s-master.HIDE.xyz:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

kubeadm config view
apiServer:
  certSANs:
  - k8s-master.HIDE.xyz
  - master.HIDE.xyz
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: k8s-master.HIDE.xyz:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  external:
    caFile: /etc/ssl/etcd/etcd-root-ca.pem
    certFile: /etc/ssl/etcd/etcd.pem
    endpoints:
    - https://172.16.10.136:2379
    - https://172.16.10.137:2379
    - https://172.16.10.138:2379
    keyFile: /etc/ssl/etcd/etcd-key.pem
imageRepository: gcr.azk8s.cn/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.14.2
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}

$ kubectl get no -owide
NAME                   STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
master-172-16-10-136   Ready    master   138d    v1.14.2   172.16.10.136   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
master-172-16-10-137   Ready    master   138d    v1.14.2   172.16.10.137   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
master-172-16-10-138   Ready    master   138d    v1.14.2   172.16.10.138   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-139   Ready    <none>   138d    v1.14.2   172.16.10.139   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-140   Ready    <none>   138d    v1.14.2   172.16.10.140   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-141   Ready    <none>   138d    v1.14.2   172.16.10.141   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-142   Ready    <none>   63d     v1.14.2   172.16.10.142   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-143   Ready    <none>   63d     v1.14.2   172.16.10.143   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-145   Ready    <none>   63d     v1.14.2   172.16.10.145   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-169   Ready    <none>   4d11h   v1.14.2   172.16.10.169   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-170   Ready    <none>   4d11h   v1.14.2   172.16.10.170   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6
worker-172-16-10-171   Ready    <none>   4d11h   v1.14.2   172.16.10.171   <none>        CentOS Linux 7 (Core)   3.10.0-957.12.2.el7.x86_64   docker://18.9.6

$ kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok                  
controller-manager   Healthy   ok                  
etcd-2               Healthy   {"health":"true"}   
etcd-1               Healthy   {"health":"true"}   
etcd-0               Healthy   {"health":"true"}

$ kubectl cluster-info
Kubernetes master is running at https://k8s-master.HIDE.xyz:6443
affable-ibex-kubernetes-dashboard is running at https://k8s-master.HIDE.xyz:6443/api/v1/namespaces/kube-system/services/https:affable-ibex-kubernetes-dashboard:https/proxy
KubeDNS is running at https://k8s-master.HIDE.xyz:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Thanks!

SataQiu commented 5 years ago

Thanks for reporting this, @omegazeng. Could you provide more information about the issue? What is the output of kubectl get cm cluster-info -oyaml -n kube-public? And how did you change controlPlaneEndpoint "master.HIDE.xyz" to "k8s-master.HIDE.xyz"?

omegazeng commented 5 years ago

@SataQiu Thank you!

kubectl get cm cluster-info -oyaml -n kube-public
apiVersion: v1
data:
  jws-kubeconfig-38t43b: eyJhbGciOiJIUzIXXXXXXXXXXX..1JkUOYiNFu5wkxkXXXXXX
  jws-kubeconfig-3dn914: eyJhbGciOiJIUzIXXXXXXXXXXX..eAiFwHUpd2RPJoBXXXXXX
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: XXXXXXXXXXXXXXXXXXXXXXX
        server: https://master.HIDE.xyz:6443
      name: ""
    contexts: []
    current-context: ""
    kind: Config
    preferences: {}
    users: []
kind: ConfigMap
metadata:
  creationTimestamp: "2019-01-17T12:30:54Z"
  name: cluster-info
  namespace: kube-public
  resourceVersion: "34316788"
  selfLink: /api/v1/namespaces/kube-public/configmaps/cluster-info
  uid: bb9ae460-1a53-11e9-b46c-000c296fd64f

I found server: https://master.HIDE.xyz:6443 in it. Is it safe to edit the cluster-info ConfigMap directly?
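One way to make this comparison repeatable is to pull the server value out of the kubeconfig embedded in cluster-info and set it next to the endpoint the admin kubeconfig uses. The sketch below is illustrative only: server_of is a hypothetical helper, not part of kubeadm or kubectl, and the kubectl lines are left as comments because they need access to the cluster.

```bash
# server_of (hypothetical helper): prints the value of the first "server:"
# line found in the YAML given on stdin.
server_of() {
  sed -n 's/^[[:space:]]*server:[[:space:]]*//p' | head -n 1
}

# Address advertised to joining nodes (not run here, needs cluster access):
# kubectl -n kube-public get configmap cluster-info -o jsonpath='{.data.kubeconfig}' | server_of

# Address used by the current admin kubeconfig (not run here):
# kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
```

If the two addresses differ, as they do in this thread, the cluster-info ConfigMap still carries the old endpoint.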


> And how did you change controlPlaneEndpoint "master.HIDE.xyz" to "k8s-master.HIDE.xyz"?

I updated kubeadm-config.yaml and then ran the commands below it:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "k8s-master.HIDE.xyz"  ### ADD
  - "master.HIDE.xyz"
controlPlaneEndpoint: "k8s-master.HIDE.xyz:6443" ### Modify
etcd:
    external:
        endpoints:
        - https://172.16.10.136:2379
        - https://172.16.10.137:2379
        - https://172.16.10.138:2379
        caFile: /etc/ssl/etcd/etcd-root-ca.pem
        certFile: /etc/ssl/etcd/etcd.pem
        keyFile: /etc/ssl/etcd/etcd-key.pem
networking:
    # This CIDR is a calico default. Substitute or remove for your CNI provider.
    podSubnet: "192.168.0.0/16"
imageRepository: gcr.azk8s.cn/google_containers

# renew certs
kubeadm init phase certs apiserver --config kubeadm-config.yaml
# upgrade
kubeadm upgrade apply --config kubeadm-config.yaml
# restart kubelet
systemctl restart kubelet.service
# check config
kubeadm config view
# check certSANs
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text

detail: https://github.com/kubernetes/kubeadm/issues/1447#issuecomment-489513430 https://github.com/kubernetes/kubeadm/issues/1447#issuecomment-490434779

https://github.com/kubernetes/kubeadm/issues/1540
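
The last step of the procedure above inspects the certificate by eye. A small sketch of the same check in script form follows. has_san is a hypothetical helper that reads the text output of openssl x509 on stdin and succeeds only if the given DNS name is listed as a whole SAN entry. It assumes the usual "DNS:name, DNS:name" layout of that output.

```bash
# has_san NAME (hypothetical helper): reads `openssl x509 -noout -text` output
# on stdin; exit status is 0 only if "DNS:NAME" appears as a complete entry.
has_san() {
  # Split comma-separated entries onto their own lines, trim whitespace,
  # then require an exact fixed-string match of the whole line.
  tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -Fxq "DNS:$1"
}

# Canned sample line (assumed layout), so the sketch runs without a certificate:
sample='                DNS:k8s-master.HIDE.xyz, DNS:master.HIDE.xyz, DNS:kubernetes'
if printf '%s\n' "$sample" | has_san "k8s-master.HIDE.xyz"; then
  echo "SAN present"
fi

# Against the real certificate (not run here):
# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | has_san "k8s-master.HIDE.xyz"
```

The exact-line match is deliberate: a plain substring search for master.HIDE.xyz would also match k8s-master.HIDE.xyz.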

SataQiu commented 5 years ago

As far as I know, kubeadm will use the cluster-info ConfigMap as the cluster configuration. This problem may be due to an incomplete modification. I think you can try editing the cluster-info ConfigMap directly.
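
A minimal sketch of such an edit, assuming the only stale value is the server line inside the kubeconfig embedded in cluster-info. rewrite_server and patch_cluster_info are hypothetical helper names, and the host names are the redacted ones from this thread. patch_cluster_info is defined but not called, since it needs admin access to the cluster; kubectl -n kube-public edit cm cluster-info is the interactive alternative.

```bash
# Old and new endpoint hosts (redacted names from this issue).
OLD_RE='master\.HIDE\.xyz'     # old host, written as a regex with escaped dots
NEW='k8s-master.HIDE.xyz'      # new host, inserted literally

# rewrite_server (hypothetical helper): reads YAML on stdin and replaces the
# old host only on lines ending in "server: https://<old host>:<port>".
# Every other line, including certSANs entries, passes through unchanged.
rewrite_server() {
  sed "s#\(server: https://\)${OLD_RE}\(:[0-9][0-9]*\)\$#\1${NEW}\2#"
}

# patch_cluster_info (hypothetical helper, NOT called here): fetches the
# ConfigMap, rewrites the server line, and writes the object back.
patch_cluster_info() {
  kubectl -n kube-public get configmap cluster-info -o yaml \
    | rewrite_server \
    | kubectl replace -f -
}
```

Running the get and rewrite_server half of the pipeline on its own first, and reading the result before replacing anything, keeps the change reviewable.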

omegazeng commented 5 years ago

> As far as I know, kubeadm will use the cluster-info ConfigMap as the cluster configuration. This problem may be due to an incomplete modification. I think you can try editing the cluster-info ConfigMap directly.

Rerun kubeadm join

I0605 17:20:12.457742   19165 join.go:427] [preflight] Discovering cluster-info
I0605 17:20:12.457886   19165 token.go:200] [discovery] Trying to connect to API Server "k8s-master.HIDE.xyz:6443"
I0605 17:20:12.458624   19165 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://k8s-master.HIDE.xyz:6443"
I0605 17:20:22.952489   19165 token.go:141] [discovery] Requesting info from "https://k8s-master.HIDE.xyz:6443" again to validate TLS against the pinned public key
I0605 17:20:32.987015   19165 token.go:164] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "k8s-master.HIDE.xyz:6443"
I0605 17:20:32.987057   19165 token.go:206] [discovery] Successfully established connection with API Server "k8s-master.HIDE.xyz:6443"
I0605 17:20:32.987089   19165 join.go:441] [preflight] Fetching init configuration
I0605 17:20:32.987107   19165 join.go:474] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0605 17:20:33.002346   19165 preflight.go:101] [preflight] Running configuration dependant checks
I0605 17:20:33.002378   19165 controlplaneprepare.go:207] [download-certs] Skipping certs download
I0605 17:20:33.002392   19165 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0605 17:20:33.072207   19165 kubelet.go:130] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
I0605 17:20:33.165665   19165 kubelet.go:147] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I0605 17:20:44.259624   19165 kubelet.go:165] [kubelet-start] preserving the crisocket information for the node
I0605 17:20:44.259653   19165 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "worker-172-16-7-51" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

kubectl get no
NAME                   STATUS   ROLES    AGE     VERSION
master-172-16-10-136   Ready    master   138d    v1.14.2
master-172-16-10-137   Ready    master   138d    v1.14.2
master-172-16-10-138   Ready    master   138d    v1.14.2
worker-172-16-10-139   Ready    <none>   138d    v1.14.2
worker-172-16-10-140   Ready    <none>   138d    v1.14.2
worker-172-16-10-141   Ready    <none>   138d    v1.14.2
worker-172-16-10-142   Ready    <none>   63d     v1.14.2
worker-172-16-10-143   Ready    <none>   63d     v1.14.2
worker-172-16-10-145   Ready    <none>   63d     v1.14.2
worker-172-16-10-169   Ready    <none>   4d17h   v1.14.2
worker-172-16-10-170   Ready    <none>   4d17h   v1.14.2
worker-172-16-10-171   Ready    <none>   4d17h   v1.14.2
worker-172-16-7-51     Ready    <none>   2m14s   v1.14.2

Thank you again!

I think "kubeadm upgrade" should also update the cluster-info ConfigMap.

SataQiu commented 5 years ago

You are welcome! :blush:

tbernacchi commented 1 year ago

In my case I've recreated the token: kubeadm token create --print-join-command and everything was fine.

omegazeng commented 10 months ago

> In my case I've recreated the token: kubeadm token create --print-join-command and everything was fine.

My case was a little different: kubeadm join failed because I had changed the Kubernetes endpoint address.