Closed shiqinfeng1 closed 3 years ago
Change kube_version in config.yaml to v1.21.5. I forgot to update the version number when publishing the release.
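If you would rather script the change than edit the file by hand, something like the sketch below should work. It assumes `kube_version` sits on a line of its own in `config.yaml`, since the exact nesting of kubeplay's config is not shown in this thread. `set_kube_version` is a hypothetical helper name, not part of kubeplay.

```shell
# Hypothetical helper (not part of kubeplay): rewrite the kube_version line
# in a YAML file, keeping its indentation. Assumes one "kube_version:" key per line.
set_kube_version() {
  skv_file="$1"
  skv_version="$2"
  skv_tmp="$(mktemp)" || return 1
  # Replace everything after "kube_version:" with the new version string.
  sed -E "s/^([[:space:]]*kube_version:).*/\1 ${skv_version}/" "$skv_file" > "$skv_tmp" \
    && mv "$skv_tmp" "$skv_file"
}

# Usage (run from the directory that holds config.yaml):
#   set_kube_version config.yaml v1.21.5
#   grep kube_version config.yaml
```

After the edit, run `bash install.sh` again as described in this thread.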
OK, I'll give it a try.
After running bash install.sh again, there seems to be a problem with the image registry. I get the following errors:
TASK [policy_controller/calico : Start of Calico kube controllers] **********************************************************************************
ok: [node1] => (item=calico-kube-controllers.yml)
ok: [node1] => (item=calico-kube-sa.yml)
ok: [node1] => (item=calico-kube-cr.yml)
ok: [node1] => (item=calico-kube-crb.yml)
Thursday 23 September 2021 00:35:34 +0000 (0:00:02.923) 0:00:37.992 ****
Thursday 23 September 2021 00:35:34 +0000 (0:00:00.061) 0:00:38.054 ****
Thursday 23 September 2021 00:35:34 +0000 (0:00:00.101) 0:00:38.155 ****
PLAY [k8s_cluster] **********************************************************************************************************************************
Thursday 23 September 2021 00:35:34 +0000 (0:00:00.130) 0:00:38.286 ****
TASK [Restart containerd for reload CNI] ************************************************************************************************************
changed: [node3]
changed: [node2]
changed: [node4]
ERRO[0041] task must be stopped before deletion: running: failed precondition
changed: [node1]
96255c62f8692516615526650f97ef46ec1a4a7bb2b27575a72ded58c5115a01
FATA[0042] rpc error: code = Unavailable desc = error reading from server: EOF
[root@node1 kubeplay]#
[root@node1 kubeplay]#
Pod status:
[root@node1 ~]# kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f747d7747-p4jdr 0/1 ImagePullBackOff 0 13h
calico-node-9trwn 1/1 Running 1 13h
calico-node-c22wm 1/1 Running 0 13h
calico-node-drqxx 1/1 Running 0 13h
calico-node-fsvzp 1/1 Running 0 13h
kube-apiserver-node1 1/1 Running 1 14h
kube-apiserver-node2 1/1 Running 2 14h
kube-apiserver-node3 1/1 Running 2 13h
kube-controller-manager-node1 1/1 Running 3 14h
kube-controller-manager-node2 1/1 Running 3 14h
kube-controller-manager-node3 1/1 Running 4 13h
kube-proxy-5t7sg 1/1 Running 0 13h
kube-proxy-n6bkd 1/1 Running 0 13h
kube-proxy-p4g9z 1/1 Running 0 13h
kube-proxy-smq5g 1/1 Running 0 13h
kube-scheduler-node1 1/1 Running 3 14h
kube-scheduler-node2 1/1 Running 3 14h
kube-scheduler-node3 1/1 Running 4 13h
nginx-proxy-node4 1/1 Running 0 13h
[root@node1 ~]#
The image pull fails:
...
Volumes:
kube-api-access-d45pw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 60s (x3645 over 13h) kubelet Back-off pulling image "kube.registry.local/library/calico-kube-controllers:v3.19.2"
[root@node1 ~]#
One more error message. I suspect certificate verification fails after the certificate is regenerated:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m38s default-scheduler Successfully assigned kube-system/calico-kube-controllers-7f747d7747-rwv4r to node4
Normal Pulling 2m59s (x4 over 4m25s) kubelet Pulling image "kube.registry.local/library/calico-kube-controllers:v3.19.2"
Warning Failed 2m58s (x4 over 4m24s) kubelet Failed to pull image "kube.registry.local/library/calico-kube-controllers:v3.19.2": rpc error: code = Unknown desc = failed to pull and unpack image "kube.registry.local/library/calico-kube-controllers:v3.19.2": failed to resolve reference "kube.registry.local/library/calico-kube-controllers:v3.19.2": failed to do request: Head "https://kube.registry.local/v2/library/calico-kube-controllers/manifests/v3.19.2": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "mkcert root@node1")
Warning Failed 2m58s (x4 over 4m24s) kubelet Error: ErrImagePull
Warning Failed 2m31s (x6 over 4m24s) kubelet Error: ImagePullBackOff
Normal BackOff 2m17s (x7 over 4m24s) kubelet Back-off pulling image "kube.registry.local/library/calico-kube-controllers:v3.19.2"
[root@node1 ~]#
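The two pull failures seen in this thread look alike in `kubectl get pod` (both end in ImagePullBackOff) but differ in the event message. As a rough triage aid, the sketch below reads `kubectl describe pod` output on stdin and prints a coarse label. `classify_pull_error` and its labels are hypothetical names chosen for illustration. Treating "transport is closing" as a transient RPC failure is an assumption based on it appearing while containerd was being restarted.

```shell
# Hypothetical triage helper: classify image-pull failures from
# `kubectl describe pod` output supplied on stdin.
classify_pull_error() {
  cpe_events="$(cat)"
  case "$cpe_events" in
    # The node does not trust the CA that signed the registry certificate.
    *"x509: certificate signed by unknown authority"*) echo "untrusted-ca" ;;
    # The RPC connection to the container runtime dropped (assumed transient).
    *"transport is closing"*) echo "transient-rpc" ;;
    # A pull failure with some other cause.
    *"ImagePullBackOff"*|*"ErrImagePull"*) echo "pull-failed-other" ;;
    *) echo "no-pull-error" ;;
  esac
}

# Usage:
#   kubectl -n kube-system describe pod calico-kube-controllers-7f747d7747-rwv4r \
#     | classify_pull_error
```

An `untrusted-ca` result points at the certificate mismatch discussed here rather than at a missing image.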
The certificate regeneration logic does have a problem. I'll fix it.
After running remove, I ran the install again. calico-kube-controllers now comes up, but the error below still appears and the remaining scripts are not executed:
TASK [Restart containerd for reload CNI] ************************************************************************************************************
changed: [node4]
changed: [node2]
changed: [node3]
ERRO[0891] task must be stopped before deletion: running: failed precondition
changed: [node1]
3f0df560215b023680d9889ebf3856cfb84a59317453c3cec8ce3e822ebf1bc2
FATA[0891] rpc error: code = Unavailable desc = error reading from server: EOF
After reporting one failed health check, calico-kube-controllers changed to Running:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned kube-system/calico-kube-controllers-7f747d7747-qvt8p to node1
Warning Failed 118s kubelet Failed to pull image "kube.registry.local/library/calico-kube-controllers:v3.19.2": rpc error: code = Unavailable desc = transport is closing
Warning Failed 118s kubelet Error: ErrImagePull
Normal BackOff 117s kubelet Back-off pulling image "kube.registry.local/library/calico-kube-controllers:v3.19.2"
Warning Failed 117s kubelet Error: ImagePullBackOff
Normal Pulling 105s (x2 over 119s) kubelet Pulling image "kube.registry.local/library/calico-kube-controllers:v3.19.2"
Normal Pulled 100s kubelet Successfully pulled image "kube.registry.local/library/calico-kube-controllers:v3.19.2" in 5.145123223s
Normal Created 99s kubelet Created container calico-kube-controllers
Normal Started 99s kubelet Started container calico-kube-controllers
Warning Unhealthy 97s kubelet Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
[root@node1 ~]# kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7f747d7747-qvt8p 1/1 Running 0 2m7s
calico-node-cwzzr 1/1 Running 2 4m2s
calico-node-md7fk 1/1 Running 0 4m2s
calico-node-rkflg 1/1 Running 0 4m2s
calico-node-zzqwn 1/1 Running 0 4m2s
kube-apiserver-node1 1/1 Running 0 7m34s
kube-apiserver-node2 1/1 Running 0 7m11s
kube-apiserver-node3 1/1 Running 0 6m54s
kube-controller-manager-node1 1/1 Running 1 7m49s
kube-controller-manager-node2 1/1 Running 1 7m11s
kube-controller-manager-node3 1/1 Running 1 6m54s
kube-proxy-2v6x8 1/1 Running 0 5m25s
kube-proxy-6s7kn 1/1 Running 0 5m24s
kube-proxy-c9925 1/1 Running 0 5m26s
kube-proxy-lz2kw 1/1 Running 0 5m25s
kube-scheduler-node1 1/1 Running 1 7m48s
kube-scheduler-node2 1/1 Running 1 7m10s
kube-scheduler-node3 1/1 Running 2 6m53s
nginx-proxy-node4 1/1 Running 0 5m29s
By the way, when I run install.sh remove, it reports at the end that nerdctl cannot be found:
TASK [reset : reset | Restart network] **************************************************************************************************************
changed: [node2]
changed: [node3]
changed: [node4]
changed: [node1]
PLAY RECAP ******************************************************************************************************************************************
localhost : ok=4 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
node1 : ok=33 changed=18 unreachable=0 failed=0 skipped=26 rescued=0 ignored=0
node2 : ok=32 changed=18 unreachable=0 failed=0 skipped=21 rescued=0 ignored=0
node3 : ok=32 changed=18 unreachable=0 failed=0 skipped=21 rescued=0 ignored=0
node4 : ok=32 changed=16 unreachable=0 failed=0 skipped=21 rescued=0 ignored=0
Thursday 23 September 2021 01:33:50 +0000 (0:00:06.947) 0:01:49.575 ****
===============================================================================
Gather necessary facts (hardware) ----------------------------------------------------------------------------------------------------------- 17.69s
reset : reset | remove remaining routes set by bird ----------------------------------------------------------------------------------------- 16.10s
reset : reset | stop etcd services ---------------------------------------------------------------------------------------------------------- 15.83s
reset : reset | delete some files and directories ------------------------------------------------------------------------------------------- 13.68s
reset : reset | Restart network -------------------------------------------------------------------------------------------------------------- 6.95s
reset : reset | remove the network device created by calico ---------------------------------------------------------------------------------- 5.34s
reset : reset | get remaining routes set by bird --------------------------------------------------------------------------------------------- 5.34s
reset : reset | check dummy0 network device -------------------------------------------------------------------------------------------------- 5.18s
reset : reset | force remove all cri containers ---------------------------------------------------------------------------------------------- 3.42s
reset : reset | stop all cri containers ------------------------------------------------------------------------------------------------------ 1.98s
download : download | Download files / images ------------------------------------------------------------------------------------------------ 1.75s
reset : reset | remove services -------------------------------------------------------------------------------------------------------------- 1.61s
reset : reset | force remove all cri pods ---------------------------------------------------------------------------------------------------- 1.41s
reset : reset | stop all cri pods ------------------------------------------------------------------------------------------------------------ 1.25s
reset : flush iptables ----------------------------------------------------------------------------------------------------------------------- 1.13s
Gather minimal facts ------------------------------------------------------------------------------------------------------------------------- 1.04s
reset : reset | remove docker dropins -------------------------------------------------------------------------------------------------------- 0.91s
reset : reset | stop services ---------------------------------------------------------------------------------------------------------------- 0.85s
reset : reset | systemctl daemon-reload ------------------------------------------------------------------------------------------------------ 0.79s
reset : reset | unmount kubelet dirs --------------------------------------------------------------------------------------------------------- 0.71s
✔ ###### kubernetes cluster successfully removed ######
/root/kubeplay/library/remove.sh: line 16: /usr/local/bin/nerdctl: No such file or directory
[root@node1 kubeplay]#
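The message comes from remove.sh calling /usr/local/bin/nerdctl after the binary is no longer there. What line 16 of remove.sh actually does is not shown in this thread, so the following is only a sketch of one way a script could guard such a call. `run_if_present` is a hypothetical helper, not something kubeplay ships.

```shell
# Hypothetical guard: run a command only if its binary exists and is
# executable; otherwise print a note to stderr and carry on.
run_if_present() {
  rip_bin="$1"
  shift
  if command -v "$rip_bin" >/dev/null 2>&1; then
    "$rip_bin" "$@"
  else
    echo "skip: $rip_bin not found" >&2
    return 0
  fi
}

# Usage (the nerdctl arguments here are placeholders):
#   run_if_present /usr/local/bin/nerdctl ps
```

With a guard like this, the remove step would finish without the "No such file or directory" error when nerdctl has already been deleted.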
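The compose node cannot also be a node of the K8s cluster. It has to be deployed separately.
So do the cluster nodes and the compose node need network connectivity to each other? If the cluster nodes cannot reach the compose node, will it fail?
Yes. Usually the node that runs the deployment tool serves as the compose node, and the cluster nodes and it need to be able to reach each other.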
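A quick way to check that requirement from a cluster node is to try opening a TCP connection to the registry served by the compose node. The sketch below assumes bash is installed (it relies on bash's `/dev/tcp` redirection) and uses port 443 because the pull URLs in this thread are `https://kube.registry.local/...`. `can_reach` is a hypothetical helper name.

```shell
# Hypothetical connectivity check: succeed if a TCP connection to
# host:port can be opened. Uses bash's /dev/tcp; a 3-second limit is
# applied when the `timeout` command is available.
can_reach() {
  cr_host="$1"
  cr_port="$2"
  if command -v timeout >/dev/null 2>&1; then
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$cr_host" "$cr_port" 2>/dev/null
  else
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$cr_host" "$cr_port" 2>/dev/null
  fi
}

# Usage (run on each cluster node):
#   if can_reach kube.registry.local 443; then echo reachable; else echo unreachable; fi
```

This only shows that the port is open. It says nothing about whether the node trusts the registry certificate.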
I pulled in the latest changes, but after install.sh remove and a fresh install I still get the same kind of error:
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m27s default-scheduler Successfully assigned kube-system/calico-node-pddcs to node4
Normal Pulling 2m55s (x4 over 4m26s) kubelet Pulling image "kube.registry.local/library/calico-cni:v3.19.2"
Warning Failed 2m55s (x4 over 4m25s) kubelet Failed to pull image "kube.registry.local/library/calico-cni:v3.19.2": rpc error: code = Unknown desc = failed to pull and unpack image "kube.registry.local/library/calico-cni:v3.19.2": failed to resolve reference "kube.registry.local/library/calico-cni:v3.19.2": failed to do request: Head "https://kube.registry.local/v2/library/calico-cni/manifests/v3.19.2": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "mkcert root@kube.registry.local")
Warning Failed 2m55s (x4 over 4m25s) kubelet Error: ErrImagePull
Warning Failed 2m29s (x6 over 4m24s) kubelet Error: ImagePullBackOff
Normal BackOff 2m18s (x7 over 4m24s) kubelet Back-off pulling image "kube.registry.local/library/calico-cni:v3.19.2"
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
calico-node-mpcbg 0/1 Init:ImagePullBackOff 0 70s
calico-node-ngp7q 0/1 Init:ImagePullBackOff 0 4m41s
calico-node-pd9rv 0/1 Init:ImagePullBackOff 0 4m40s
calico-node-pddcs 0/1 Init:ImagePullBackOff 0 4m40s
kube-apiserver-node1 1/1 Running 0 6m53s
kube-controller-manager-node1 1/1 Running 0 6m57s
kube-proxy-f2gfk 1/1 Running 0 5m19s
kube-proxy-lm528 1/1 Running 0 5m18s
kube-proxy-ng7qz 1/1 Running 0 5m18s
kube-proxy-tzj4h 1/1 Running 0 5m17s
kube-scheduler-node1 1/1 Running 0 6m58s
nginx-proxy-node2 1/1 Running 0 5m23s
nginx-proxy-node3 1/1 Running 0 5m20s
nginx-proxy-node4 1/1 Running 0 5m23s
[root@node1 ~]#
Could image-repo be accessed without HTTPS? There is no security concern here, and that way no certificate verification would be needed.
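Today I tested remove followed by a reinstall, and the calico-node image pull failure did not occur again. The cause is unclear. PS: yesterday, after running remove, I deleted the certificates manually. Could that be related?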
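I downloaded the latest v0.1.0-rc.2 release and ran the deployment script on 4 new virtual machines, and it reported errors. The machine running the script also serves as one of the master nodes. Its IP address is 192.168.40.201. All machines have had yum update -y run on them. OS:
The config file is as follows:
The error occurs at this point:
After running bash install.sh again, the error is as follows: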