yunionio / cloudpods

A cloud-native, open-source, unified multi-cloud and hybrid-cloud platform.
https://www.cloudpods.org
Apache License 2.0
2.58k stars 528 forks

[Help] Command hangs when manually adding a control node #21382

Open ChaoHsin-fang opened 1 day ago

ChaoHsin-fang commented 1 day ago

Hello, I have a question: manually adding a control node has been stuck for more than an hour without reporting any error.

ocadm version
ocadm version: version.Info{Major:"0", Minor:"0", GitVersion:"v3.11.3-20240423.1", GitBranch:"tags/v3.11.3-20240423.1", GitCommit:"f8d30d14", GitTreeState:"clean", BuildDate:"2024-04-23T10:57:25Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

zexi commented 1 day ago

Add the -v 10 flag to the ocadm join command and check the specific error it reports. Also confirm that the system time on the two nodes is consistent.

ChaoHsin-fang commented 1 day ago

The log is below. The system time on the two nodes has been adjusted to match.

I1011 15:53:38.785817 49176 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I1011 15:53:38.785896 49176 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I1011 15:53:38.785944 49176 preflight.go:91] [preflight] Running general checks
I1011 15:53:38.785965 49176 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1011 15:53:38.785986 49176 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I1011 15:53:38.785992 49176 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I1011 15:53:38.785998 49176 checks.go:105] validating the container runtime
I1011 15:53:38.802360 49176 checks.go:131] validating if the service is enabled and active
I1011 15:53:38.829150 49176 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1011 15:53:38.829183 49176 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1011 15:53:38.829196 49176 checks.go:653] validating whether swap is enabled or not
I1011 15:53:38.829216 49176 checks.go:382] validating the presence of executable ip
I1011 15:53:38.829231 49176 checks.go:382] validating the presence of executable iptables
I1011 15:53:38.829242 49176 checks.go:382] validating the presence of executable mount
I1011 15:53:38.829251 49176 checks.go:382] validating the presence of executable nsenter
I1011 15:53:38.829260 49176 checks.go:382] validating the presence of executable ebtables
I1011 15:53:38.829270 49176 checks.go:382] validating the presence of executable ethtool
I1011 15:53:38.829279 49176 checks.go:382] validating the presence of executable socat
I1011 15:53:38.829288 49176 checks.go:382] validating the presence of executable tc
[WARNING FileExisting-tc]: tc not found in system path
I1011 15:53:38.829320 49176 checks.go:382] validating the presence of executable touch
I1011 15:53:38.829331 49176 checks.go:524] running all checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.24. Latest validated version: 18.09
I1011 15:53:38.839290 49176 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I1011 15:53:38.839411 49176 checks.go:622] validating kubelet version
I1011 15:53:38.888095 49176 checks.go:131] validating if the service is enabled and active
I1011 15:53:38.896103 49176 checks.go:209] validating availability of port 10250
I1011 15:53:38.896172 49176 checks.go:439] validating if the connectivity type is via proxy or direct
I1011 15:53:38.896185 49176 join.go:460] [preflight] Discovering cluster-info
I1011 15:53:38.896211 49176 token.go:199] [discovery] Trying to connect to API Server "10.64.25.150:6443"
I1011 15:53:38.896486 49176 token.go:74] [discovery] Created cluster-info discovery client, requesting info from "https://10.64.25.150:6443"
I1011 15:53:38.896517 49176 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: ocadm/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1011 15:53:38.896763 49176 round_trippers.go:438] GET https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info in 0 milliseconds
I1011 15:53:38.896775 49176 round_trippers.go:444] Response Headers:
I1011 15:53:38.896799 49176 token.go:82] [discovery] Failed to request cluster info, will try again: [Get "https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp 10.64.25.150:6443: connect: connection refused]
I1011 15:53:43.897594 49176 round_trippers.go:419] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: ocadm/v0.0.0 (linux/amd64) kubernetes/$Format" 'https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info'
I1011 15:53:43.897874 49176 round_trippers.go:438] GET https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info in 0 milliseconds
I1011 15:53:43.897884 49176 round_trippers.go:444] Response Headers:
I1011 15:53:43.897904 49176 token.go:82] [discovery] Failed to request cluster info, will try again: [Get "https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp 10.64.25.150:6443: connect: connection refused]
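With -v 10 the output is long, and the join appears to hang because the discovery request is retried every five seconds. The following is a minimal Python sketch, not part of ocadm, for pulling only the failed cluster-info requests out of such a log. It assumes the log has been saved with one entry per line; the function name `discovery_failures` and the regular expression are illustrative.

```python
import re

# Header of a klog-style entry as seen in the log above, e.g.
# "I1011 15:53:38.896799 49176 token.go:82] message"
KLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)


def discovery_failures(log_text):
    """Return (time, dial_error) pairs for each failed cluster-info request."""
    failures = []
    for line in log_text.splitlines():
        match = KLOG_LINE.match(line.strip())
        if not match or "Failed to request cluster info" not in match["message"]:
            continue
        # The underlying network error starts at "dial tcp" and runs to the closing bracket.
        dial = re.search(r"dial tcp [^\]]+", match["message"])
        failures.append((match["time"], dial.group(0) if dial else match["message"]))
    return failures


# One entry copied from the log above.
sample = (
    'I1011 15:53:38.896799 49176 token.go:82] [discovery] Failed to request cluster info, '
    'will try again: [Get "https://10.64.25.150:6443/api/v1/namespaces/kube-public/configmaps/cluster-info": '
    'dial tcp 10.64.25.150:6443: connect: connection refused]'
)
print(discovery_failures(sample))
```

For the sample entry this yields a single pair whose error text is `dial tcp 10.64.25.150:6443: connect: connection refused`, which is the part worth reading in each retry.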

zexi commented 1 day ago

dial tcp 10.64.25.150:6443: connect: connection refused. Is the network unreachable?

ChaoHsin-fang commented 1 day ago

ping 10.64.25.150

PING 10.64.25.150 (10.64.25.150) 56(84) bytes of data.
64 bytes from 10.64.25.150: icmp_seq=1 ttl=64 time=0.166 ms
64 bytes from 10.64.25.150: icmp_seq=2 ttl=64 time=0.121 ms
64 bytes from 10.64.25.150: icmp_seq=3 ttl=64 time=0.115 ms
--- 10.64.25.150 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2043ms
rtt min/avg/max/mdev = 0.115/0.134/0.166/0.022 ms
telnet: connect to address 10.64.25.150: Connection refused

telnet 10.64.25.150 6443

Trying 10.64.25.150...
telnet: connect to address 10.64.25.150: Connection refused
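The ping and telnet results answer different questions: ping succeeding shows the address responds to ICMP, while "Connection refused" means the TCP connection to port 6443 was actively rejected, which is what happens when nothing is listening on that port. The following is a minimal Python sketch of the same check that telnet performs, classifying the outcome. The function name `probe_tcp` and the three-second timeout are illustrative choices.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, typically a firewall drop or an unreachable host.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run from the joining node, `probe_tcp("10.64.25.150", 6443)` would be expected to return "refused" in the situation shown above and "open" once the API server is reachable on that address.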

ChaoHsin-fang commented 7 hours ago

The network is reachable now, but adding the control node fails with an error saying it needs to access the etcd port on this machine. Why isn't it accessing the cluster's etcd?

ocadm join --control-plane 10.64.25.150:6443 --token uzcxd0.qt3csimnlx2emc13 --certificate-key 3a8039ff2670ba91fff0d598e16a4a60b86cb0a789ff6748e7f49ea4e9b5f6b1 --discovery-token-unsafe-skip-ca-verification --apiserver-advertise-address 10.64.25.96 --node-ip 10.64.25.96 --as-onecloud-controller --host-networks 'bond0/br0/10.64.25.96' --high-availability-vip 10.64.25.150 --keepalived-version-tag v2.0.25 --ignore-preflight-errors=all
[preflight] Running pre-flight checks
[WARNING FileExisting-tc]: tc not found in system path
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.24. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm ocadm-config -oyaml'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1012 15:01:12.023894 52795 proxier.go:513] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
got Keepalived version tag from commandline: v2.0.25
[PASS] Installing Keepalived:v2.0.25 as BACKUP, nodeIP[10.64.25.96], interface: bond0
[PASS] keepalived path created.
106425150
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [sym206-cpu-b1211-node096 localhost] and IPs [10.64.25.96 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [sym206-cpu-b1211-node096 localhost] and IPs [10.64.25.96 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [sym206-cpu-b1211-node096 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.64.25.96 10.64.25.150]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
{"level":"warn","ts":"2024-10-12T15:01:18.377+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.64.25.96:2379/","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 10.64.25.96:2379: connect: connection refused\""}
error execution phase check-etcd: etcd cluster is not healthy: context deadline exceeded

zexi commented 4 hours ago

> The network is reachable now, but adding the control node fails with an error saying it needs to access the etcd port on this machine. Why isn't it accessing the cluster's etcd?

You are adding a control node, so it needs to access etcd on port 2379.
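The etcd client warning in the join output records which endpoint the health check tried to dial. The following is a minimal Python sketch, not part of ocadm, that extracts the host and port from such a warning so they can be compared with the addresses given on the command line. The function name `etcd_target` is hypothetical, and the sample is the warning line from the log above.

```python
import json
from urllib.parse import urlsplit


def etcd_target(warning_line):
    """Return (host, port) of the etcd endpoint named in an etcd client warning line."""
    record = json.loads(warning_line)
    # The target looks like "passthrough:///https://10.64.25.96:2379/";
    # drop the gRPC resolver prefix and parse the remaining URL.
    endpoint = record["target"].split("///", 1)[-1]
    parts = urlsplit(endpoint)
    return parts.hostname, parts.port


# Warning line copied from the join output (raw string keeps the \" escapes intact).
warning = r'''{"level":"warn","ts":"2024-10-12T15:01:18.377+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.64.25.96:2379/","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 10.64.25.96:2379: connect: connection refused\""}'''

host, port = etcd_target(warning)
print(host, port)
```

For this line the endpoint is 10.64.25.96 on port 2379, which is the address passed as --node-ip and --apiserver-advertise-address in the join command, not the VIP 10.64.25.150.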

ChaoHsin-fang commented 4 hours ago

I followed the official documentation and got this error. Is a step missing, or is it something else?