rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

Network between pods (different nodes) is unreachable #859

Open Alehap opened 1 year ago

Alehap commented 1 year ago

Version (k3OS / kernel)

k3os version v0.22.2-k3s2r0
5.13.0-1036-oracle #43~20.04.1-Ubuntu SMP Tue Jun 14 01:06:54 UTC 2022

Architecture

aarch64

Describe the bug

Pods on different nodes cannot reach each other over the pod network.

To Reproduce

A fresh install of k3OS shows the same issue with both the pre-release and the latest version. I'm using Oracle Cloud instances.

curl -sfL https://github.com/rancher/k3os/releases/download/v0.22.2-k3s2r0/k3os-rootfs-arm.tar.gz | tar zxvf - --strip-components=1 -C /
cp /home/ubuntu/config.master.yaml /k3os/system/config.yaml
sync
reboot -f

This is config.master.yaml:

hostname: instance-965728507
k3os:
  dns_nameservers:
  - 8.8.8.8
  k3s_args:
  - server
  - --cluster-init
  - --node-ip=10.0.0.X
  - --node-external-ip=168.138.PUBLIC.IP
  - --cluster-cidr=172.17.0.0/16
  - --service-cidr=172.16.0.0/16
  modules:
  - kvm
  - nvme
  ntp_servers:
  - 0.vn.pool.ntp.org
  - 1.vn.pool.ntp.org
  sysctls:
    kernel.kptr_restrict: "1"
    kernel.printk: 4 4 1 7
  token: "XXXXXXXXXXXXXXXXXXXXXXXXXX"
ssh_authorized_keys:
- ssh-rsa XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX=
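
As a sanity check on the addressing in this config, here is a minimal Python sketch that tests whether the node subnet, `--cluster-cidr`, and `--service-cidr` overlap. The node subnet `10.0.0.0/24` is taken from the `route -n` output in this report; the helper name `overlapping_pairs` is made up for illustration.

```python
import ipaddress
from itertools import combinations

# Values copied from this report; the node subnet comes from the
# `route -n` output (10.0.0.0 with netmask 255.255.255.0).
NETWORKS = {
    "node subnet": ipaddress.ip_network("10.0.0.0/24"),
    "cluster-cidr": ipaddress.ip_network("172.17.0.0/16"),
    "service-cidr": ipaddress.ip_network("172.16.0.0/16"),
}


def overlapping_pairs(networks):
    """Return the names of every pair of networks that share addresses."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    # An empty list means the three ranges are disjoint.
    print(overlapping_pairs(NETWORKS))
```

With the values above the three ranges are disjoint, so an address clash between them does not look like the cause.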

Expected behavior

instance-965728507 [/home/rancher]$ ping 172.17.0.8

PING 172.17.0.8 (172.17.0.8): 56 data bytes
64 bytes from 172.17.0.8: seq=0 ttl=64 time=0.073 ms
64 bytes from 172.17.0.8: seq=1 ttl=64 time=0.078 ms
^C
--- 172.17.0.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.073/0.075/0.078 ms

Pinging a pod on the same node works.

Actual behavior

instance-965728507 [/home/rancher]$ ping 172.17.1.2

PING 172.17.1.2 (172.17.1.2): 56 data bytes
^C
--- 172.17.1.2 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss

Pinging a pod on a different node fails.

Additional context

instance-965728507 [/home/rancher]$ kubectl get nodes -o wide

NAME                 STATUS   ROLES                       AGE     VERSION        INTERNAL-IP   EXTERNAL-IP       OS-IMAGE              KERNEL-VERSION       CONTAINER-RUNTIME
instance-526528268   Ready    <none>                      83s     v1.22.2+k3s2   10.0.0.86     168.138.188.227   k3OS v0.22.2-k3s2r0   5.13.0-1036-oracle   containerd://1.5.7-k3s1
instance-965728507   Ready    control-plane,etcd,master   7m27s   v1.22.2+k3s2   10.0.0.105    168.138.166.197   k3OS v0.22.2-k3s2r0   5.13.0-1036-oracle   containerd://1.5.7-k3s1

instance-965728507 [/home/rancher]$ kubectl get pods -o wide --all-namespaces

NAMESPACE     NAME                                         READY   STATUS      RESTARTS        AGE     IP           NODE                 NOMINATED NODE   READINESS GATES
k3os-system   system-upgrade-controller-698f44f7d6-gbmxp   1/1     Running     0               7m19s   172.17.0.6   instance-965728507   <none>           <none>
kube-system   coredns-85cb69466-8zxtc                      1/1     Running     0               7m19s   172.17.0.3   instance-965728507   <none>           <none>
kube-system   helm-install-traefik--1-gg4ph                0/1     Completed   1               7m19s   172.17.0.4   instance-965728507   <none>           <none>
kube-system   helm-install-traefik-crd--1-cd6ds            0/1     Completed   0               7m19s   172.17.0.5   instance-965728507   <none>           <none>
kube-system   local-path-provisioner-64ffb68fd-zv94l       1/1     Running     0               7m19s   172.17.0.7   instance-965728507   <none>           <none>
kube-system   metrics-server-9cf544f65-tcllt               1/1     Running     0               7m19s   172.17.0.2   instance-965728507   <none>           <none>
kube-system   svclb-traefik-xkf5m                          2/2     Running     1 (5m59s ago)   6m32s   172.17.0.8   instance-965728507   <none>           <none>
kube-system   svclb-traefik-xkz82                          2/2     Running     1 (48s ago)     83s     172.17.1.2   instance-526528268   <none>           <none>
kube-system   traefik-74dd4975f9-6pm4z                     1/1     Running     0               6m32s   172.17.0.9   instance-965728507   <none>           <none>
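
To make the pod-to-node layout easier to see, the following Python sketch groups the pod IPs listed above by node and reports which subnet each node's pods fall into. The `/24` per-node prefix length is an assumption; it matches the netmasks shown in the routing table in this report. `subnets_by_node` is a hypothetical helper name.

```python
import ipaddress
from collections import defaultdict

# (pod IP, node) pairs copied from the `kubectl get pods -o wide` output.
PODS = [
    ("172.17.0.6", "instance-965728507"),
    ("172.17.0.3", "instance-965728507"),
    ("172.17.0.4", "instance-965728507"),
    ("172.17.0.5", "instance-965728507"),
    ("172.17.0.7", "instance-965728507"),
    ("172.17.0.2", "instance-965728507"),
    ("172.17.0.8", "instance-965728507"),
    ("172.17.1.2", "instance-526528268"),
    ("172.17.0.9", "instance-965728507"),
]


def subnets_by_node(pods, prefixlen=24):
    """Map each node name to the sorted list of pod subnets seen on it."""
    seen = defaultdict(set)
    for ip, node in pods:
        # ip_interface keeps the host bits, .network drops them.
        network = ipaddress.ip_interface(f"{ip}/{prefixlen}").network
        seen[node].add(str(network))
    return {node: sorted(nets) for node, nets in seen.items()}


if __name__ == "__main__":
    for node, nets in subnets_by_node(PODS).items():
        print(node, nets)
```

Under that assumption, the server node holds `172.17.0.0/24` and the agent node holds `172.17.1.0/24`, so the failing ping to `172.17.1.2` is the only target here that has to cross nodes.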

instance-965728507 [/home/rancher]$ route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 enp0s3
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 enp0s3
10.0.0.1        0.0.0.0         255.255.255.255 UH    0      0        0 enp0s3
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
169.254.169.254 10.0.0.1        255.255.255.255 UGH   0      0        0 enp0s3
172.17.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
172.17.1.0      172.17.1.0      255.255.255.0   UG    0      0        0 flannel.1

This is a fresh install of k3OS on OCI, on top of an Ubuntu 20.04 arm64 base image.