kubesphere / kubekey

Install Kubernetes/K3s only, or both Kubernetes/K3s and KubeSphere, plus related cloud-native add-ons. Supports all-in-one, multi-node, and HA installations 🔥 ⎈ 🐳
https://kubesphere.io
Apache License 2.0

After a cluster is installed with KubeKey, IPVS fails to forward traffic correctly; all nodes have to be rebooted once before it recovers (Debian 11, Ubuntu 22.04) #2152

Open lqdflying opened 7 months ago

lqdflying commented 7 months ago

### What version of KubeKey has the issue?

v3.0.13

### What is your OS environment?

Debian 11, Ubuntu 22.04

### KubeKey config file

```yaml
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: liuqd-k8
spec:
  hosts:
  - {name: k8-master1, address: 192.168.31.91, internalAddress: 192.168.31.91, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: k8-master2, address: 192.168.31.92, internalAddress: 192.168.31.92, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: k8-master3, address: 192.168.31.93, internalAddress: 192.168.31.93, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: k8-node1, address: 192.168.31.94, internalAddress: 192.168.31.94, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
  - {name: k8-node2, address: 192.168.31.95, internalAddress: 192.168.31.95, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
#  - {name: k8-node3, address: 192.168.31.96, internalAddress: 192.168.31.96, user: liuqd, privateKeyPath: "~/.ssh/id_rsa"}
  roleGroups:
    etcd:
    - k8-master1
    - k8-master2
    - k8-master3
    control-plane:
    - k8-master1
    - k8-master2
    - k8-master3
    worker:
    - k8-node1
    - k8-node2
 #   - k8-node3
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: k8-lb.liuqd.sg
    address: 192.168.31.97
    port: 6443
  kubernetes:
    version: v1.23.17
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.10.64.0/18
    kubeServiceCIDR: 10.10.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: ["https://harbor.022010.xyz"]
    insecureRegistries: []
  addons: []
```

### A clear and concise description of what happened.

After kk finishes installing Kubernetes, the component pods all start without errors, but coredns keeps printing a large number of error logs.

See the coredns log below.

After creating a test pod and exposing it via a NodePort, I found that the clusterIP, podIP, and NodePort are only all reachable on the node where the pod is actually running; every other node fails to forward the traffic. I suspect an IPVS forwarding problem. kube-proxy logs show no errors.
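
A minimal sketch of the kind of check described above (the service and label refer to the test manifest further below; the placeholder IPs and the NodePort have to be substituted):

```sh
# Locate the test pod and the NodePort assigned to its service
kubectl -n default get pods -o wide -l app=ostools
kubectl -n default get svc ostools-svc

# Repeat from every node; with this issue, only the node that actually
# hosts the pod gets an answer on any of the three paths.
nc -vz -w 3 <podIP> 22            # pod IP directly
nc -vz -w 3 <clusterIP> 22        # service clusterIP (handled by IPVS)
nc -vz -w 3 <nodeIP> <nodePort>   # NodePort on the node being tested
```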

### Relevant log output

#### coredns log

```sh
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:48051->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:47347->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:47428->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:54594->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:53037->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:46755->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:56905->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:53616->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:47113->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:36468->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:58694->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:46367->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:39482->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:46495->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:56165->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:48759->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:42161->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:43550->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:58893->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:39743->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:48881->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:37857->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:33103->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:52553->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:58903->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:51113->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:47446->1.1.1.1:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:35214->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:45971->192.168.31.254:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.10.123.2:59032->1.1.1.1:53: i/o timeout
```

#### kube-proxy log
```sh
I0302 02:13:39.469985       1 node.go:163] Successfully retrieved node IP: 192.168.31.94
I0302 02:13:39.470169       1 server_others.go:138] "Detected node IP" address="192.168.31.94"
I0302 02:13:39.490837       1 server_others.go:269] "Using ipvs Proxier"
I0302 02:13:39.491024       1 server_others.go:271] "Creating dualStackProxier for ipvs"
I0302 02:13:39.491063       1 server_others.go:502] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0302 02:13:39.491364       1 proxier.go:435] "IPVS scheduler not specified, use rr by default"
I0302 02:13:39.491541       1 proxier.go:435] "IPVS scheduler not specified, use rr by default"
I0302 02:13:39.491605       1 ipset.go:113] "Ipset name truncated" ipSetName="KUBE-6-LOAD-BALANCER-SOURCE-CIDR" truncatedName="KUBE-6-LOAD-BALANCER-SOURCE-CID"
I0302 02:13:39.491681       1 ipset.go:113] "Ipset name truncated" ipSetName="KUBE-6-NODE-PORT-LOCAL-SCTP-HASH" truncatedName="KUBE-6-NODE-PORT-LOCAL-SCTP-HAS"
I0302 02:13:39.491808       1 server.go:656] "Version info" version="v1.23.17"
I0302 02:13:39.495219       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_max" value=131072
I0302 02:13:39.499099       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0302 02:13:39.499739       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
I0302 02:13:39.500247       1 config.go:317] "Starting service config controller"
I0302 02:13:39.500313       1 shared_informer.go:240] Waiting for caches to sync for service config
I0302 02:13:39.500382       1 config.go:226] "Starting endpoint slice config controller"
I0302 02:13:39.500402       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0302 02:13:39.501146       1 config.go:444] "Starting node config controller"
I0302 02:13:39.501283       1 shared_informer.go:240] Waiting for caches to sync for node config
I0302 02:13:39.601421       1 shared_informer.go:247] Caches are synced for node config 
I0302 02:13:39.601474       1 shared_informer.go:247] Caches are synced for service config 
I0302 02:13:39.601570       1 shared_informer.go:247] Caches are synced for endpoint slice config
```

The ipvsadm configuration on the master1 node is as follows (but master1 cannot reach the test pod through the clusterIP or NodePort):

```sh
root@k8-master1:~# ipvsadm  -l  --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port               Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  k8-master1:31087                    0        0        0        0        0
  -> 10.10.127.1:ssh                     0        0        0        0        0
TCP  k8-master1:31087                    0        0        0        0        0
  -> 10.10.127.1:ssh                     0        0        0        0        0
TCP  k8-master1.liuqd.sg:31087           1        5        0      260        0
  -> 10.10.127.1:ssh                     1        5        0      260        0
TCP  k8-master1:https                   10     4643     3436   418493  3099316
  -> k8-master1.liuqd.sg:6443            3     2800     2323   248277  2647840
  -> k8-master2.cluster.local:644        4     1013      657   101044   350769
  -> k8-master3.cluster.local:644        3      830      456    69172   100707
TCP  k8-master1:domain                   0        0        0        0        0
  -> 10.10.123.1:domain                  0        0        0        0        0
  -> 10.10.123.2:domain                  0        0        0        0        0
TCP  k8-master1:9153                     0        0        0        0        0
  -> 10.10.123.1:9153                    0        0        0        0        0
  -> 10.10.123.2:9153                    0        0        0        0        0
TCP  k8-master1:ssh                      1        2        0      120        0
  -> 10.10.127.1:ssh                     1        2        0      120        0
TCP  k8-master1:31087                    0        0        0        0        0
  -> 10.10.127.1:ssh                     0        0        0        0        0
UDP  k8-master1:domain               21867    22346        0  1005570        0
  -> 10.10.123.1:domain              10933    11194        0   503730        0
  -> 10.10.123.2:domain              10934    11152        0   501840        0
root@k8-master1:~#
```

### Additional information

After a hard reboot of all nodes, everything goes back to normal. It looks as if forwarding was not in effect, yet the forward parameter is set when I check it:
```sh
root@k8-master1:~# sysctl -p|grep forward
net.ipv4.ip_forward = 1
```
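
As a side note, `sysctl -p` with no arguments only reloads `/etc/sysctl.conf`, so it is worth cross-checking the value the kernel is actually using and any drop-in files that might set it (a quick sketch):

```sh
# Effective runtime value, straight from the kernel
cat /proc/sys/net/ipv4/ip_forward
sysctl net.ipv4.ip_forward

# Find every config file that sets it
grep -r "ip_forward" /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null
```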

My test deployment manifest is as follows:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: ostools-svc
  namespace: default
spec:
  type: NodePort
  ports:
  - name: ostools
    port: 22
    protocol: TCP
    targetPort: 22
  selector:
    app: ostools
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ostools-deployment
  namespace: default
  labels:
    app: ostools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ostools
  template:
    metadata:
      labels:
        app: ostools
    spec:
#      nodeSelector:
#        kubernetes.io/hostname: k8-node3
      containers:
      - name: ostools
        image: mybiz-tools:2.3
        ports:
        - containerPort: 22
```

celldance commented 6 months ago

@lqdflying I'm hitting this problem too. It works fine on Ubuntu 18, but after upgrading to 22.04 it breaks. Is there a fix?

lqdflying commented 6 months ago

> @lqdflying I'm hitting this problem too. It works fine on Ubuntu 18, but after upgrading to 22.04 it breaks. Is there a fix?

Are your symptoms the same as mine? After installation IPVS doesn't work and coredns keeps logging i/o timeouts? And Ubuntu 18 is fine for you?

zheng1 commented 6 months ago

It may be related to the rp_filter parameter in sysctl; you can check whether it is equal to 2.
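
For context, per the kernel's ip-sysctl documentation, `rp_filter` controls reverse-path filtering (0 = off, 1 = strict mode, 2 = loose mode), and the value used for an interface is the maximum of `conf/all` and the per-interface setting. A quick way to inspect it, and to try a different value temporarily, is sketched below:

```sh
# Show rp_filter for every interface
sysctl -a 2>/dev/null | grep '\.rp_filter'

# Try strict mode for testing (because of the max() rule above, interfaces
# already at 2 also need their per-interface value lowered); not persistent
sudo sysctl -w net.ipv4.conf.all.rp_filter=1
sudo sysctl -w net.ipv4.conf.default.rp_filter=1
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 1 | sudo tee "$f" >/dev/null; done
```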

celldance commented 6 months ago

> > @lqdflying I'm hitting this problem too. It works fine on Ubuntu 18, but after upgrading to 22.04 it breaks. Is there a fix?
>
> Are your symptoms the same as mine? After installation IPVS doesn't work and coredns keeps logging i/o timeouts? And Ubuntu 18 is fine for you?

I'm on a single node and pods cannot reach each other. Ubuntu 18 is fine; on Ubuntu 22 even a hard reboot doesn't help.


celldance commented 6 months ago

> It may be related to the rp_filter parameter in sysctl; you can check whether it is equal to 2.

@zheng1 all rp_filter values equal 2:

```sh
# sysctl -a | grep rp_filter | grep -v cali | grep -v arp
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.docker0.rp_filter = 2
net.ipv4.conf.ens3f0.rp_filter = 2
net.ipv4.conf.ens3f1.rp_filter = 2
net.ipv4.conf.ens3f2.rp_filter = 2
net.ipv4.conf.ens3f3.rp_filter = 2
net.ipv4.conf.kube-ipvs0.rp_filter = 2
net.ipv4.conf.lo.rp_filter = 2
net.ipv4.conf.nodelocaldns.rp_filter = 2
net.ipv4.conf.tunl0.rp_filter = 2
net.ipv4.conf.veth39d3770.rp_filter = 2
net.ipv4.conf.vetha30d89b.rp_filter = 2
net.ipv4.conf.vethcfe9ce6.rp_filter = 2
```

All cali*-prefixed interfaces are also 2.

lqdflying commented 6 months ago

> It may be related to the rp_filter parameter in sysctl; you can check whether it is equal to 2.

Great, a maintainer finally noticed this issue, so let me explain a bit more.

Recently, after dozens of Kubernetes re-installations with kk and countless attempts, I found that the issue ultimately comes down to one thing:

  1. Every node must explicitly run `echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf`.
  2. Every node must then be rebooted; even `sysctl -p` is not enough.

After that, the kk installation succeeds in one go, with no additional reboot required after the kk scripts finish. I use the default Docker as the CRI, and I also noticed that kk itself updates sysctl.conf during the run and adds `net.ipv4.ip_forward=1`. Frankly, I have no idea why every node still needs the explicit `echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf` plus a reboot, but on my side it does work. Any advice or comments from your side? Many thanks.
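
As a rough sketch, this is the preparation step I mean, run once against every node before invoking `kk create cluster` (the host list mirrors the config above; adjust it to your own inventory):

```sh
# Persist ip_forward on each node, then reboot it before running kk
for host in k8-master1 k8-master2 k8-master3 k8-node1 k8-node2; do
  ssh "$host" 'echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf && sudo reboot'
done
```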

celldance commented 6 months ago

> It may be related to the rp_filter parameter in sysctl; you can check whether it is equal to 2.

> 1. Every node must explicitly run `echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf`.
> 2. Every node must then be rebooted; even `sysctl -p` is not enough.

Thanks @lqdflying @zheng1, you saved me.

However, `echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf` plus a reboot did not work for me.

I added `net.ipv4.conf.all.rp_filter=1` and `net.ipv4.conf.default.rp_filter=1` to /etc/sysctl.conf, rebooted, and then my Kubernetes cluster worked.
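
Roughly what I applied, as a sketch (appended to `/etc/sysctl.conf` as described above; a drop-in under `/etc/sysctl.d/` should work just as well):

```sh
# Persist strict reverse-path filtering, then reboot so newly created
# interfaces (cali*, kube-ipvs0, veth*) inherit the default as well
cat <<'EOF' | sudo tee -a /etc/sysctl.conf
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
EOF
sudo reboot
```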