wmenjoy opened this issue 3 years ago
2021-03-15 07:25:06.006959 I | embed: rejected connection from "192.168.214.32:32642" (error "tls: failed to verify client's certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-ca\")", ServerName "")
Cause: different clients generate different configurations, so the CA certificate may no longer be valid. Delete the configuration under /etc/kubernetes, remove the corresponding docker containers, and regenerate the certificates.
1. Clear the finalizers and force-delete with kubectl
kubectl patch namespace cattle-system -p '{"metadata":{"finalizers":[]}}' --type='merge' -n cattle-system
kubectl delete namespace cattle-system --grace-period=0 --force
kubectl patch namespace cattle-global-data -p '{"metadata":{"finalizers":[]}}' --type='merge' -n cattle-system
kubectl delete namespace cattle-global-data --grace-period=0 --force
kubectl patch namespace local -p '{"metadata":{"finalizers":[]}}' --type='merge' -n cattle-system
for resource in $(kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -o name -n local); do kubectl patch $resource -p '{"metadata":{"finalizers":[]}}' --type='merge' -n local; done
kubectl delete namespace local --grace-period=0 --force
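The loop above patches every namespaced resource in one shot. A hedged sketch of a safer variant: generate the patch commands first and review them before running anything. `gen_finalizer_patches` is an illustrative name, not a real tool; feed it real resource names from `kubectl api-resources ... | xargs -n 1 kubectl get -o name -n local`.

```shell
# Dry-run helper: read resource names from stdin and print (not execute)
# the kubectl patch commands that would clear their finalizers.
gen_finalizer_patches() {
  ns="$1"
  while read -r res; do
    echo "kubectl patch $res -p '{\"metadata\":{\"finalizers\":[]}}' --type=merge -n $ns"
  done
}

# Example with sample resource names (illustrative):
printf 'pod/web-0\nconfigmap/app-cfg\n' | gen_finalizer_patches local
```

Piping the output to `sh` runs the patches after review.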
2. Delete directly via the API server
- 1. Start the proxy
kubectl proxy --port=8081
- 2. Export the namespace as JSON
ns=cattle-fleet-system
kubectl get ns $ns -o json > tmp.json
- 3. Edit the JSON
Change spec to:
"spec": {}
- 4. Call the finalize endpoint
ns=cattle-fleet-system
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8081/api/v1/namespaces/$ns/finalize
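Instead of hand-editing tmp.json, the spec can be emptied programmatically. A minimal sketch, assuming python3 is installed on the workstation; `strip_spec` is an illustrative name.

```shell
# Empty the spec of a namespace JSON dump so the finalize call releases it.
strip_spec() {
  python3 -c 'import json,sys; o=json.load(sys.stdin); o["spec"]={}; print(json.dumps(o))'
}

# Example on a minimal namespace object:
printf '{"metadata":{"name":"x"},"spec":{"finalizers":["kubernetes"]}}' | strip_spec
# -> {"metadata": {"name": "x"}, "spec": {}}
```

In practice: `kubectl get ns $ns -o json | strip_spec > tmp.json`, then PUT tmp.json as above.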
## References
1. [Rancher: fixing namespace cattle-system stuck in Terminating (CSDN blog)](https://blog.csdn.net/qq_37279279/article/details/107961464)
2. [kubernetes cannot delete a namespace stuck in Terminating (CSDN blog)](https://blog.csdn.net/tongzidane/article/details/88988542?spm=1001.2101.3001.6650.2&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-2-88988542-blog-107213441.pc_relevant_multi_platform_whitelistv2&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-2-88988542-blog-107213441.pc_relevant_multi_platform_whitelistv2&utm_relevant_index=5)
Method 1 (node cleanup)
docker rm -f $(docker ps -qa)
docker volume rm $(docker volume ls -q)
cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico"
for dir in $cleanupdirs; do
  echo "Removing $dir"
  rm -rf "$dir"
done
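Since the cleanup loop above is destructive, it can be worth previewing what it would remove first. A hedged sketch; `dry_run` is an illustrative flag, not a real option of any tool.

```shell
# Preview the directories the cleanup loop would delete before running it.
cleanupdirs="/var/lib/etcd /etc/kubernetes /etc/cni /opt/cni /var/lib/cni /var/run/calico"
dry_run=1
for dir in $cleanupdirs; do
  if [ "$dry_run" = 1 ]; then
    echo "would remove $dir"
  else
    rm -rf "$dir"
  fi
done
```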
Method 2
df -h | grep kubelet | awk -F % '{print $2}' | xargs umount
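Why `awk -F %` works here: with `%` as the field separator, everything after the Use% column of `df -h` is the mount point, and xargs trims the leading space. A small demonstration on a sample line (sizes illustrative); note this breaks if a mount point itself contains `%`.

```shell
# Extract the mount point from a df -h line the same way the pipeline does.
line='tmpfs  3.9G  0  3.9G  0% /var/lib/kubelet/pods/abc/volumes/kubernetes.io~secret/token'
echo "$line" | awk -F % '{print $2}' | xargs echo
# -> /var/lib/kubelet/pods/abc/volumes/kubernetes.io~secret/token
```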
sudo docker rm -f $(sudo docker ps -qa)
sudo rm -rf /var/etcd
for m in $(sudo tac /proc/mounts | sudo awk '{print $2}' | sudo grep /var/lib/kubelet); do sudo umount $m || true; done
sudo rm -rf /var/lib/kubelet/
for m in $(sudo tac /proc/mounts | sudo awk '{print $2}' | sudo grep /var/lib/rancher); do sudo umount $m || true; done
sudo rm -rf /var/lib/rancher/
sudo rm -rf /run/kubernetes/
sudo docker volume rm $(sudo docker volume ls -q)
sudo docker ps -a
sudo docker volume ls
rm /var/lib/kubelet/* -rf
rm /etc/kubernetes/* -rf
rm /var/lib/rancher/* -rf
rm /var/lib/etcd/* -rf
rm /var/lib/cni/* -rf
iptables -F && iptables -t nat -F
ip link del flannel.1
docker ps -a | awk '{print $1}' | xargs docker rm -f
docker volume ls | awk '{print $2}' | xargs docker volume rm
canal fails to start
rancher Streaming server stopped unexpectedly: listen tcp [::1]:0: bind: cannot assign requested address
Found that /etc/hosts.conf had the localhost ::1 entry commented out as #::1; older versions have poor IPv6 support. Removing that change restored the service.
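A quick way to spot this kind of commented-out ::1 entry. `check_ipv6_localhost` is an illustrative helper operating on file contents passed as an argument, e.g. `check_ipv6_localhost "$(cat /etc/hosts)"`.

```shell
# Report whether an uncommented ::1 line is present in the given contents.
check_ipv6_localhost() {
  if printf '%s\n' "$1" | grep -qE '^[[:space:]]*::1[[:space:]]'; then
    echo ok
  else
    echo '::1 localhost entry missing or commented out'
  fi
}

# Example with the broken configuration described above:
check_ipv6_localhost '127.0.0.1 localhost
#::1 localhost'
# -> ::1 localhost entry missing or commented out
```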
Error from server (BadRequest): a container name must be specified for pod canal-4dj9f, choose one of: [install-cni flexvol-driver calico-node kube-flannel]
[rke@fs01-192-168-131-240 ~]$ kubectl -n kube-system logs canal-4dj9f calico-node
2024-02-20 09:18:56.465 [INFO][9] startup/startup.go 379: Early log level set to info
2024-02-20 09:18:56.465 [INFO][9] startup/startup.go 395: Using NODENAME environment for node name
2024-02-20 09:18:56.466 [INFO][9] startup/startup.go 407: Determined node name: 192.168.126.5
2024-02-20 09:18:56.467 [INFO][9] startup/startup.go 439: Checking datastore connection
2024-02-20 09:18:56.483 [INFO][9] startup/startup.go 463: Datastore connection verified
2024-02-20 09:18:56.484 [INFO][9] startup/startup.go 112: Datastore is ready
2024-02-20 09:18:56.510 [INFO][9] startup/startup.go 759: Using autodetected IPv4 address on interface br-daa07946aef5: 172.18.0.1/16
2024-02-20 09:18:56.510 [INFO][9] startup/startup.go 576: Node IPv4 changed, will check for conflicts
2024-02-20 09:18:56.518 [WARNING][9] startup/startup.go 1119: Calico node '192.168.126.16' is already using the IPv4 address 172.18.0.1.
2024-02-20 09:18:56.518 [WARNING][9] startup/startup.go 1331: Terminating
Calico node failed to start
The autodetected address 172.18.0.1 came from a docker bridge (br-daa07946aef5) created by another service, which caused the IP conflict above. Lesson: do not run other services directly on k8s cluster nodes.
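If the extra service (and its bridge) cannot be removed, calico-node's documented IP_AUTODETECTION_METHOD environment variable can pin autodetection to the real NIC instead of br-* bridges. A sketch of the env entry on the canal DaemonSet's calico-node container; the interface regex is an assumption and must match this host's actual NIC name.

```yaml
# Added to the calico-node container env (sketch); eth.* is illustrative.
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth.*"
```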
Node count exceeded the inotify watch limit. See: https://askubuntu.com/questions/1088272/inotify-add-watch-failed-no-space-left-on-device
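The linked fix raises the kernel's inotify watch limit. A sketch of checking the current value (the 524288 target below is the value commonly suggested in that thread, not a hard requirement):

```shell
# Show the current inotify watch limit; kubelet plus log tailing can exhaust
# a low default such as 8192.
cat /proc/sys/fs/inotify/max_user_watches

# To raise it persistently (requires root), run manually:
#   sysctl -w fs.inotify.max_user_watches=524288
#   echo 'fs.inotify.max_user_watches=524288' >> /etc/sysctl.conf && sysctl -p
```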
1. Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system: the network plugin deployment failed. Fix: flush iptables with iptables -F && iptables -F -t nat
2. x509: cannot validate certificate for x because it doesn't contain any IP SANs, seen when using custom certificates. Fix: restart docker
etcd component
1. A strange K8s rolling-update failure
1: After redeploying, the deployment kept reporting as in progress: the available count stayed at 0 while the new replica count was 2. The service itself came up and kubelet was healthy, but kube-controller-manager logged that the object was not the latest version.
Symptoms
a. Checked the kube-controller-manager logs
b. describe pod showed the MinAvailable condition as false
c. In the last day, etcd on one of the k8s hosts had failed
Analysis: that host's etcd had just gone down and was re-added via Rancher, so the etcd data state may have been inconsistent. After stopping kube-controller-manager, leadership automatically moved to another machine and the state recovered.
References