kubesphere / kubekey

Install Kubernetes/K3s only, or both Kubernetes/K3s and KubeSphere, plus related cloud-native add-ons. It supports all-in-one, multi-node, and HA deployments 🔥 ⎈ 🐳
https://kubesphere.io
Apache License 2.0

HA cluster deployment fails #450

Closed wangkuan closed 3 years ago

wangkuan commented 3 years ago

I am deploying an HA cluster on Tencent Cloud CVM instances. The configuration file is as follows:

apiVersion: kubekey.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: master-1, address: 192.168.100.11, internalAddress: 192.168.100.11, user: root, password: P@ssw0rd}
  - {name: master-2, address: 192.168.100.12, internalAddress: 192.168.100.12, user: root, password: P@ssw0rd}
  - {name: master-3, address: 192.168.100.13, internalAddress: 192.168.100.13, user: root, password: P@ssw0rd}
  - {name: worker-1, address: 192.168.100.21, internalAddress: 192.168.100.21, user: root, password: P@ssw0rd}
  - {name: worker-2, address: 192.168.100.22, internalAddress: 192.168.100.22, user: root, password: P@ssw0rd}
  - {name: worker-3, address: 192.168.100.23, internalAddress: 192.168.100.23, user: root, password: P@ssw0rd}
  roleGroups:
    etcd:
    - master-1
    - master-2
    - master-3
    master:
    - master-1
    - master-2
    - master-3
    worker:
    - worker-1
    - worker-2
    - worker-3
  controlPlaneEndpoint:
    domain: lb.kubesphere.local
    address: "192.168.100.7"   # Tencent Cloud CLB (internal)
    port: "6443"
  kubernetes:
    version: v1.18.6
    imageRepo: kubesphere
    clusterName: cluster.local
    proxyMode: iptables
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    registryMirrors: []
    insecureRegistries: []
    privateRegistry: dockerhub.kubekey.local
  addons: []

The error log is as follows:

time="11:37:49 CST" level=info msg="Initializing kubernetes cluster"
[master-1 192.168.100.11] MSG:
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0205 11:42:49.192308   19352 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[preflight] Running pre-flight checks
W0205 11:42:49.192422   19352 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[master-1 192.168.100.11] MSG:
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0205 11:47:54.479579   22260 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: context deadline exceeded
[preflight] Running pre-flight checks
W0205 11:47:54.479712   22260 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
time="11:52:19 CST" level=error msg="Failed to init kubernetes cluster: Failed to exec command: sudo -E /bin/sh -c "/usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml"
W0205 11:47:55.643695   22695 utils.go:26] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
W0205 11:47:55.643865   22695 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.6
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lb.kubesphere.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost lb.kubesphere.local master-1 master-1.cluster.local master-2 master-2.cluster.local master-3 master-3.cluster.local worker-1 worker-1.cluster.local worker-2 worker-2.cluster.local worker-3 worker-3.cluster.local] and IPs [10.233.0.1 192.168.100.11 127.0.0.1 192.168.100.7 192.168.100.11 192.168.100.12 192.168.100.13 192.168.100.21 192.168.100.22 192.168.100.23 10.233.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate generation
[certs] External etcd mode: Skipping etcd/peer certificate generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
W0205 11:47:57.898313   22695 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0205 11:47:57.905505   22695 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0205 11:47:57.906520   22695 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher: Process exited with status 1" node=192.168.100.11
time="11:52:19 CST" level=warning msg="Task failed ..."
time="11:52:19 CST" level=warning msg="error: interrupted by error"
Error: Failed to init kubernetes cluster: interrupted by error
Failed to init kubernetes cluster: interrupted by error
Usage:
  kk create cluster [flags]

Flags:
  -f, --filename string          Path to a configuration file
  -h, --help                     help for cluster
      --skip-pull-images         Skip pre pull images
      --with-kubernetes string   Specify a supported version of kubernetes
      --with-kubesphere          Deploy a specific version of kubesphere (default v3.0.0)
  -y, --yes                      Skip pre-check of the installation

Global Flags:
      --debug   Print detailed information (default true)
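
The repeated timeouts against https://lb.kubesphere.local:6443 in the log above suggest that master-1 cannot reach the kube-apiserver through the CLB, even when a local instance is reachable. A minimal way to check this (illustrative commands only, not taken from this report; the hostnames and addresses come from the configuration file above, and /healthz may answer with an auth error, the point is whether the connection completes instead of timing out):

# Run on master-1 while a kube-apiserver is running locally.
# Directly against the node's own kube-apiserver: the connection should complete.
curl -k https://192.168.100.11:6443/healthz

# Through the controlPlaneEndpoint (the Tencent Cloud CLB): with the loopback
# limitation this times out, because the request originates from a backend
# instance of the same CLB.
curl -k https://lb.kubesphere.local:6443/healthz
curl -k https://192.168.100.7:6443/healthz
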
wangkuan commented 3 years ago

If I do not set address: "192.168.100.7", e.g. I leave it empty as address: "", the deployment succeeds, but then only the master-1 node serves the Kubernetes API.
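
For reference, a minimal sketch of the variant described above, using the same v1alpha1 schema as the configuration at the top of this issue:

  controlPlaneEndpoint:
    domain: lb.kubesphere.local
    address: ""        # left empty: deployment succeeds, but only master-1 serves the API
    port: "6443"

With no load-balancer address configured, the other nodes effectively go through master-1 only, so the control plane has no real HA.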

pixiake commented 3 years ago

Tencent Cloud's internal LB does not seem to support this usage; you need to use a public-facing LB. There should be a corresponding post in the official KubeSphere Chinese forum.

PS: Questions in Chinese can be posted directly in the official Chinese forum.

wangkuan commented 3 years ago

Tencent Cloud's internal LB does not seem to support this usage; you need to use a public-facing LB. There should be a corresponding post in the official KubeSphere Chinese forum.

PS: Questions in Chinese can be posted directly in the official Chinese forum.

It does indeed seem to be caused by the CLB loopback issue. But is the installer stuck here because it is waiting for master-1's kube-apiserver to become reachable, or is there other logic involved? If it is only waiting, could the check be performed from the other nodes instead, so that the CLB loopback is avoided?
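
As far as I can tell, kubeadm writes the controlPlaneEndpoint into the kubeconfig files it generates, so the wait-control-plane phase on master-1 reaches the API server through lb.kubesphere.local (i.e. through the CLB) rather than through the node's own address, which would explain the timeout above. A quick, hypothetical way to confirm where the generated client config points (not a command from this thread):

# On master-1, after a failed kubeadm init attempt:
grep 'server:' /etc/kubernetes/admin.conf
# With the configuration above, this is expected to print:
#   server: https://lb.kubesphere.local:6443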

wangkuan commented 3 years ago

In addition, I have also tested a public CLB, and it has the same loopback problem.

FeynmanZhou commented 3 years ago

@wangkuan Please try an internal CLB in Tencent Cloud.

wangkuan commented 3 years ago

@wangkuan Please try an internal CLB in Tencent Cloud.

The internal (intranet) CLB also has the loopback problem.

wangkuan commented 3 years ago

To be precise: as long as Tencent Cloud CLB has this loopback problem, kubekey does not support Tencent Cloud.