k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Can't reach internet from pod / container #5349

Closed: apiening closed this issue 2 years ago

apiening commented 2 years ago

Environmental Info: K3s Version:

k3s -v
k3s version v1.22.7+k3s1 (8432d7f2)
go version go1.16.10

Host OS Version:

cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

IP Forwarding:

# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

Node(s) CPU architecture, OS, and Version:

Linux ansible-awx 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: Single node.

# k3s kubectl get nodes -o wide
NAME          STATUS   ROLES                  AGE     VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ansible-awx   Ready    control-plane,master   5d10h   v1.22.7+k3s1   10.164.12.6   <none>        Ubuntu 20.04.4 LTS   5.4.0-105-generic   containerd://1.5.9-k3s1

Describe the bug: I cannot connect to the internet from within the pod / container:

# time curl https://www.google.de
curl: (7) Failed to connect to www.google.de port 443: Connection timed out

real    2m11.892s
user    0m0.005s
sys     0m0.005s

Steps To Reproduce: Install a single-node k3s cluster with curl -sfL https://get.k3s.io | sh on an Ubuntu 20.04 VM. Set up a simple workload (in my case AWX - https://github.com/ansible/awx-operator#basic-install-on-existing-cluster). Enter a container and try to access the internet (for example with curl against a public address).
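
For a minimal reproduction without the full AWX workload, a throwaway test pod can be used along these lines (a sketch; the busybox pod used later in this thread was created in a similar way):

# Create a simple test pod and run the check from inside it
kubectl run busybox --image=busybox --restart=Never --command -- sleep 3600
kubectl exec -it busybox -- wget -q -O - https://www.google.de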

Expected behavior: Accessing the internet should work the same way as it does from the host.

Actual behavior: No connectivity to the internet from the pod / container at all.

Additional context / logs:

# cat /etc/resolv.conf 
search awx.svc.cluster.local svc.cluster.local cluster.local mydomain.com
nameserver 10.43.0.10
options ndots:5
brandond commented 2 years ago

Sounds like something in your environment is blocking the outbound traffic from the pod network. Have you disabled firewalld, ufw, or anything else that might be interfering with the iptables rules added by the kubelet and CNI? Is there anything odd on your network with regards to MTU or multiple interface configurations?
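
For reference, quick checks along those lines might look like this (a sketch; the interface name is an example):

# Host firewalls that commonly interfere with the kubelet/CNI iptables rules
sudo ufw status
systemctl is-active firewalld
# Interface, address and MTU overview
ip -br addr
ip link show eth0 | grep mtu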

apiening commented 2 years ago

Hi @brandond, this is a pretty clean install with no firewalls installed or enabled.

# ufw status
Status: inactive

The VM has one single interface that is connected to the local network. I have no connectivity issues from the host itself.

apiening commented 2 years ago

I'm still suffering from this issue.

In order to analyze the connectivity issue I deployed a busybox container and checked the routes:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.42.0.1       0.0.0.0         UG    0      0        0 eth0
10.42.0.0       0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.42.0.0       10.42.0.1       255.255.0.0     UG    0      0        0 eth0

I can ping the gateway:

# ping 10.42.0.1
PING 10.42.0.1 (10.42.0.1): 56 data bytes
64 bytes from 10.42.0.1: seq=0 ttl=64 time=0.048 ms

I can even ping external / public IPs, which I honestly did not expect:

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=58 time=5.147 ms

But HTTPS GET requests don't work:

# time wget https://www.google.com/
Connecting to www.google.com (142.250.185.228:443)
wget: can't connect to remote host (142.250.185.228): Operation timed out
Command exited with non-zero status 1
real    2m 11.31s
user    0m 0.00s
sys     0m 0.00s

Conclusion: I can ping public addresses and DNS is working, but I cannot make HTTPS requests. Any idea what may cause this?
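
One classic cause of "ping works but TCP stalls" is an MTU / path-MTU problem (brandond asked about MTU above). A quick probe with do-not-fragment pings from the node might look like this (a sketch; the sizes are examples chosen around flannel's usual 1450-byte MTU, and BusyBox ping inside the pod may not support -M):

# 1372 bytes of payload + 28 bytes of ICMP/IP headers = 1400-byte packet; raise the size until it fails
ping -M do -c 3 -s 1372 8.8.8.8
ping -M do -c 3 -s 1422 8.8.8.8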

apiening commented 2 years ago

I found an issue that I think is related to mine: https://github.com/k3s-io/k3s/issues/763. That issue has been closed even though no solution was provided; instead, Cilium was used in place of Flannel as a workaround.

Is Flannel generally not advisable as the CNI of choice?

brandond commented 2 years ago

Flannel works fine in pretty much every environment. Those that choose an alternative usually do so in search of a specific feature, not because flannel doesn't work.

Have you done a tcpdump on the host to see what's happening? Do you see the traffic leaving the physical interface? Do you see response packets coming back in? Can you confirm that there's not something outside this host (firewall, etc) blocking your HTTP requests while allowing ICMP and DNS?

apiening commented 2 years ago

Thank you for your hint @brandond: I've done a simple capture with tcpdump -i cni0 -nn -s0 -v -l host <container-ip> and observed that the hostname www.google.com resolved to an IPv6 address. However, the host only has a link-local address and no IPv6 connectivity to the public internet.

I then disabled IPv6 on the host, rebooted, re-created the busybox container, and the command wget https://www.google.com ran successfully once. Subsequent tries kept failing, but every once in a while one request works. I don't understand this, and honestly I'm running out of ideas for what to check or try next.
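
For reference, disabling IPv6 at runtime on Ubuntu is typically done via sysctl (a sketch; the exact steps used here are not stated, and a kernel boot parameter would work as well):

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
# To persist across reboots, add the same keys to /etc/sysctl.conf or a file under /etc/sysctl.d/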

I have attached the tcpdump output captured while running wget "https://github.com/samdoran/demo-playbooks". I hope you can spot something that may cause this strange issue, since I'm not very experienced in reading tcpdump output.

21:51:56.847602 IP (tos 0x0, ttl 64, id 52459, offset 0, flags [DF], proto UDP (17), length 79)
    10.42.0.31.36415 > 10.42.0.23.53: 57082+ A? github.com.test.svc.cluster.local. (51)
21:51:56.847604 IP (tos 0x0, ttl 63, id 52459, offset 0, flags [DF], proto UDP (17), length 79)
    10.42.0.31.36415 > 10.42.0.23.53: 57082+ A? github.com.test.svc.cluster.local. (51)
21:51:56.847736 IP (tos 0x0, ttl 64, id 22395, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57082 NXDomain*- 0/1/0 (144)
21:51:56.847757 IP (tos 0x0, ttl 64, id 22396, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57082 NXDomain*- 0/1/0 (144)
21:51:56.847769 IP (tos 0x0, ttl 64, id 52460, offset 0, flags [DF], proto UDP (17), length 79)
    10.42.0.31.36415 > 10.42.0.23.53: 57393+ AAAA? github.com.test.svc.cluster.local. (51)
21:51:56.847769 IP (tos 0x0, ttl 63, id 52460, offset 0, flags [DF], proto UDP (17), length 79)
    10.42.0.31.36415 > 10.42.0.23.53: 57393+ AAAA? github.com.test.svc.cluster.local. (51)
21:51:56.847837 IP (tos 0x0, ttl 64, id 22397, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57393 NXDomain*- 0/1/0 (144)
21:51:56.847873 IP (tos 0x0, ttl 64, id 52461, offset 0, flags [DF], proto UDP (17), length 74)
    10.42.0.31.48389 > 10.42.0.23.53: 35453+ A? github.com.svc.cluster.local. (46)
21:51:56.847874 IP (tos 0x0, ttl 64, id 22398, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57393 NXDomain*- 0/1/0 (144)
21:51:56.847875 IP (tos 0x0, ttl 63, id 52461, offset 0, flags [DF], proto UDP (17), length 74)
    10.42.0.31.48389 > 10.42.0.23.53: 35453+ A? github.com.svc.cluster.local. (46)
21:51:56.847881 IP (tos 0x0, ttl 64, id 52462, offset 0, flags [DF], proto UDP (17), length 74)
    10.42.0.31.48389 > 10.42.0.23.53: 35623+ AAAA? github.com.svc.cluster.local. (46)
21:51:56.847882 IP (tos 0x0, ttl 63, id 52462, offset 0, flags [DF], proto UDP (17), length 74)
    10.42.0.31.48389 > 10.42.0.23.53: 35623+ AAAA? github.com.svc.cluster.local. (46)
21:51:56.847884 IP (tos 0xc0, ttl 64, id 45260, offset 0, flags [none], proto ICMP (1), length 200)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 36415 unreachable, length 180
    IP (tos 0x0, ttl 64, id 22398, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57393 NXDomain*- 0/1/0 (144)
21:51:56.847885 IP (tos 0xc0, ttl 63, id 45260, offset 0, flags [none], proto ICMP (1), length 200)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 36415 unreachable, length 180
    IP (tos 0x0, ttl 64, id 22398, offset 0, flags [DF], proto UDP (17), length 172)
    10.42.0.23.53 > 10.42.0.31.36415: 57393 NXDomain*- 0/1/0 (144)
21:51:56.847943 IP (tos 0x0, ttl 64, id 22399, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35623 NXDomain*- 0/1/0 (139)
21:51:56.847996 IP (tos 0x0, ttl 64, id 22400, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35453 NXDomain*- 0/1/0 (139)
21:51:56.848047 IP (tos 0x0, ttl 64, id 22401, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35453 NXDomain*- 0/1/0 (139)
21:51:56.848047 IP (tos 0x0, ttl 64, id 52463, offset 0, flags [DF], proto UDP (17), length 70)
    10.42.0.31.40711 > 10.42.0.23.53: 3858+ A? github.com.cluster.local. (42)
21:51:56.848050 IP (tos 0x0, ttl 63, id 52463, offset 0, flags [DF], proto UDP (17), length 70)
    10.42.0.31.40711 > 10.42.0.23.53: 3858+ A? github.com.cluster.local. (42)
21:51:56.848054 IP (tos 0xc0, ttl 64, id 45261, offset 0, flags [none], proto ICMP (1), length 195)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 48389 unreachable, length 175
    IP (tos 0x0, ttl 64, id 22401, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35453 NXDomain*- 0/1/0 (139)
21:51:56.848056 IP (tos 0xc0, ttl 63, id 45261, offset 0, flags [none], proto ICMP (1), length 195)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 48389 unreachable, length 175
    IP (tos 0x0, ttl 64, id 22401, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35453 NXDomain*- 0/1/0 (139)
21:51:56.848059 IP (tos 0x0, ttl 64, id 52464, offset 0, flags [DF], proto UDP (17), length 70)
    10.42.0.31.40711 > 10.42.0.23.53: 4148+ AAAA? github.com.cluster.local. (42)
21:51:56.848062 IP (tos 0x0, ttl 63, id 52464, offset 0, flags [DF], proto UDP (17), length 70)
    10.42.0.31.40711 > 10.42.0.23.53: 4148+ AAAA? github.com.cluster.local. (42)
21:51:56.848086 IP (tos 0x0, ttl 64, id 22402, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35623 NXDomain*- 0/1/0 (139)
21:51:56.848092 IP (tos 0xc0, ttl 64, id 45262, offset 0, flags [none], proto ICMP (1), length 195)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 48389 unreachable, length 175
    IP (tos 0x0, ttl 64, id 22402, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35623 NXDomain*- 0/1/0 (139)
21:51:56.848093 IP (tos 0xc0, ttl 63, id 45262, offset 0, flags [none], proto ICMP (1), length 195)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 48389 unreachable, length 175
    IP (tos 0x0, ttl 64, id 22402, offset 0, flags [DF], proto UDP (17), length 167)
    10.42.0.23.53 > 10.42.0.31.48389: 35623 NXDomain*- 0/1/0 (139)
21:51:56.848143 IP (tos 0x0, ttl 64, id 22403, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 4148 NXDomain*- 0/1/0 (135)
21:51:56.848192 IP (tos 0x0, ttl 64, id 22404, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 3858 NXDomain*- 0/1/0 (135)
21:51:56.848221 IP (tos 0x0, ttl 64, id 22405, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 3858 NXDomain*- 0/1/0 (135)
21:51:56.848222 IP (tos 0x0, ttl 64, id 52465, offset 0, flags [DF], proto UDP (17), length 56)
    10.42.0.31.56357 > 10.42.0.23.53: 52336+ A? github.com. (28)
21:51:56.848224 IP (tos 0x0, ttl 63, id 52465, offset 0, flags [DF], proto UDP (17), length 56)
    10.42.0.31.56357 > 10.42.0.23.53: 52336+ A? github.com. (28)
21:51:56.848228 IP (tos 0xc0, ttl 64, id 45263, offset 0, flags [none], proto ICMP (1), length 191)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 40711 unreachable, length 171
    IP (tos 0x0, ttl 64, id 22405, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 3858 NXDomain*- 0/1/0 (135)
21:51:56.848229 IP (tos 0xc0, ttl 63, id 45263, offset 0, flags [none], proto ICMP (1), length 191)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 40711 unreachable, length 171
    IP (tos 0x0, ttl 64, id 22405, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 3858 NXDomain*- 0/1/0 (135)
21:51:56.848233 IP (tos 0x0, ttl 64, id 52466, offset 0, flags [DF], proto UDP (17), length 56)
    10.42.0.31.56357 > 10.42.0.23.53: 52556+ AAAA? github.com. (28)
21:51:56.848235 IP (tos 0x0, ttl 63, id 52466, offset 0, flags [DF], proto UDP (17), length 56)
    10.42.0.31.56357 > 10.42.0.23.53: 52556+ AAAA? github.com. (28)
21:51:56.848255 IP (tos 0x0, ttl 64, id 22406, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 4148 NXDomain*- 0/1/0 (135)
21:51:56.848261 IP (tos 0xc0, ttl 64, id 45264, offset 0, flags [none], proto ICMP (1), length 191)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 40711 unreachable, length 171
    IP (tos 0x0, ttl 64, id 22406, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 4148 NXDomain*- 0/1/0 (135)
21:51:56.848262 IP (tos 0xc0, ttl 63, id 45264, offset 0, flags [none], proto ICMP (1), length 191)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 40711 unreachable, length 171
    IP (tos 0x0, ttl 64, id 22406, offset 0, flags [DF], proto UDP (17), length 163)
    10.42.0.23.53 > 10.42.0.31.40711: 4148 NXDomain*- 0/1/0 (135)
21:51:56.848647 IP (tos 0x0, ttl 64, id 22407, offset 0, flags [DF], proto UDP (17), length 153)
    10.42.0.23.53 > 10.42.0.31.56357: 52556 0/1/0 (125)
21:51:56.848703 IP (tos 0x0, ttl 64, id 22408, offset 0, flags [DF], proto UDP (17), length 430)
    10.42.0.23.53 > 10.42.0.31.56357: 52336 1/8/0 github.com. A 140.82.121.3 (402)
21:51:56.848755 IP (tos 0x0, ttl 64, id 22409, offset 0, flags [DF], proto UDP (17), length 430)
    10.42.0.23.53 > 10.42.0.31.56357: 52336 1/8/0 github.com. A 140.82.121.3 (402)
21:51:56.848760 IP (tos 0x0, ttl 64, id 51637, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x3d2b), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811957441 ecr 0,nop,wscale 7], length 0
21:51:56.848762 IP (tos 0xc0, ttl 64, id 45265, offset 0, flags [none], proto ICMP (1), length 458)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 56357 unreachable, length 438
    IP (tos 0x0, ttl 64, id 22409, offset 0, flags [DF], proto UDP (17), length 430)
    10.42.0.23.53 > 10.42.0.31.56357: 52336 1/8/0 github.com. A 140.82.121.3 (402)
21:51:56.848763 IP (tos 0xc0, ttl 63, id 45265, offset 0, flags [none], proto ICMP (1), length 458)
    10.42.0.31 > 10.42.0.23: ICMP 10.42.0.31 udp port 56357 unreachable, length 438
    IP (tos 0x0, ttl 64, id 22409, offset 0, flags [DF], proto UDP (17), length 430)
    10.42.0.23.53 > 10.42.0.31.56357: 52336 1/8/0 github.com. A 140.82.121.3 (402)
21:51:56.848795 IP (tos 0x0, ttl 64, id 22410, offset 0, flags [DF], proto UDP (17), length 140)
    10.42.0.23.53 > 10.42.0.31.56357: 52556 0/1/0 (112)
21:51:57.882086 IP (tos 0x0, ttl 64, id 51638, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x3922), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811958474 ecr 0,nop,wscale 7], length 0
21:51:59.894085 IP (tos 0x0, ttl 64, id 51639, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x3146), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811960486 ecr 0,nop,wscale 7], length 0
21:52:01.878099 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.42.0.31 tell 10.42.0.23, length 28
21:52:01.878100 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.42.0.1 tell 10.42.0.31, length 28
21:52:01.878106 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.42.0.1 is-at f2:25:4f:3b:53:2c, length 28
21:52:01.878112 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.42.0.31 is-at ea:10:2a:44:ac:eb, length 28
21:52:03.926110 IP (tos 0x0, ttl 64, id 51640, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x2186), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811964518 ecr 0,nop,wscale 7], length 0
21:52:12.118101 IP (tos 0x0, ttl 64, id 51641, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x0186), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811972710 ecr 0,nop,wscale 7], length 0
21:52:28.246106 IP (tos 0x0, ttl 64, id 51642, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0xc285), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1811988838 ecr 0,nop,wscale 7], length 0
21:52:33.366090 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.42.0.1 tell 10.42.0.31, length 28
21:52:33.366098 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.42.0.1 is-at f2:25:4f:3b:53:2c, length 28
21:53:00.758141 IP (tos 0x0, ttl 64, id 51643, offset 0, flags [DF], proto TCP (6), length 60)
    10.42.0.31.48018 > 140.82.121.3.443: Flags [S], cksum 0x0fcd (incorrect -> 0x4385), seq 4244803833, win 64860, options [mss 1410,sackOK,TS val 1812021350 ecr 0,nop,wscale 7], length 0
21:53:05.878080 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.42.0.1 tell 10.42.0.31, length 28
21:53:05.878085 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.42.0.1 is-at f2:25:4f:3b:53:2c, length 28
brandond commented 2 years ago

I see traffic back and forth between the test pod and the DNS pod, and I see the test pod sending traffic out to GitHub at 140.82.121.3, but I don't see a reply. Can you check what you see on the actual physical interface? Do you see the response coming back from GitHub? If not, then there's likely something going on outside your node.
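
For example, to look specifically for reply packets on the outward-facing interface, the capture can be filtered on the remote side (a sketch; the interface and IP are taken from this thread):

# Show only packets coming back from the GitHub address over HTTPS
tcpdump -i eth0 -nn 'src host 140.82.121.3 and tcp src port 443'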

apiening commented 2 years ago

Thanks again @brandond,

The host where k3s is running is a VM. Instead of capturing on cni0, I did a capture on eth0, which is the VM's interface and also has the default route set.

Please ignore that the destination jumps between 140.82.121.4 and 140.82.121.3; that is because github.com resolves to different IPs every few attempts.

# tcpdump -i eth0 -nn -s0 -v -l host 140.82.121.4
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:45:52.073261 IP (tos 0x0, ttl 63, id 3041, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.22483 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0x2a15), seq 2556866272, win 64860, options [mss 1410,sackOK,TS val 2404832403 ecr 0,nop,wscale 7], length 0
14:45:53.078112 IP (tos 0x0, ttl 63, id 3042, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.22483 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0x2628), seq 2556866272, win 64860, options [mss 1410,sackOK,TS val 2404833408 ecr 0,nop,wscale 7], length 0
14:45:55.094092 IP (tos 0x0, ttl 63, id 3043, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.22483 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0x1e48), seq 2556866272, win 64860, options [mss 1410,sackOK,TS val 2404835424 ecr 0,nop,wscale 7], length 0
14:45:59.254096 IP (tos 0x0, ttl 63, id 3044, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.22483 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0x0e08), seq 2556866272, win 64860, options [mss 1410,sackOK,TS val 2404839584 ecr 0,nop,wscale 7], length 0
14:46:07.450109 IP (tos 0x0, ttl 63, id 3045, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.22483 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0xee03), seq 2556866272, win 64860, options [mss 1410,sackOK,TS val 2404847780 ecr 0,nop,wscale 7], length 0

I did an additional capture on the hypervisor host, which has the actual physical interface and runs the k3s VM. First on the internal bridge vmbr1 that the VM is attached to:

# tcpdump -i vmbr1 -nn -s0 -v -l host 140.82.121.4                                                                                                                                               [16:51:54]
tcpdump: listening on vmbr1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:52:18.878518 IP (tos 0x0, ttl 63, id 64964, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.11288 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0xceba), seq 951037618, win 64860, options [mss 1410,sackOK,TS val 2405219208 ecr 0,nop,wscale 7], length 0
16:52:19.894581 IP (tos 0x0, ttl 63, id 64965, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.11288 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0xcac2), seq 951037618, win 64860, options [mss 1410,sackOK,TS val 2405220224 ecr 0,nop,wscale 7], length 0
16:52:21.910551 IP (tos 0x0, ttl 63, id 64966, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.11288 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0xc2e2), seq 951037618, win 64860, options [mss 1410,sackOK,TS val 2405222240 ecr 0,nop,wscale 7], length 0
16:52:26.070570 IP (tos 0x0, ttl 63, id 64967, offset 0, flags [DF], proto TCP (6), length 60)
    10.164.12.6.11288 > 140.82.121.4.443: Flags [S], cksum 0x1c2f (incorrect -> 0xb2a2), seq 951037618, win 64860, options [mss 1410,sackOK,TS val 2405226400 ecr 0,nop,wscale 7], length 0

Then I did another capture on the bridge vmbr0, to which the physical interface enp7s0 is attached:

# tcpdump -i vmbr0 -nn -s0 -v -l host 140.82.121.3                                                                                                                                               [16:55:53]
tcpdump: listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:56:00.101072 IP (tos 0x0, ttl 62, id 45737, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.6978 > 140.82.121.3.443: Flags [S], cksum 0x9d43 (incorrect -> 0xd9cf), seq 4022764557, win 64860, options [mss 1410,sackOK,TS val 1873400693 ecr 0,nop,wscale 7], length 0
16:56:01.110397 IP (tos 0x0, ttl 62, id 45738, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.6978 > 140.82.121.3.443: Flags [S], cksum 0x9d43 (incorrect -> 0xd5de), seq 4022764557, win 64860, options [mss 1410,sackOK,TS val 1873401702 ecr 0,nop,wscale 7], length 0
16:56:03.126453 IP (tos 0x0, ttl 62, id 45739, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.6978 > 140.82.121.3.443: Flags [S], cksum 0x9d43 (incorrect -> 0xcdfe), seq 4022764557, win 64860, options [mss 1410,sackOK,TS val 1873403718 ecr 0,nop,wscale 7], length 0

As a last step, I did a capture on enp7s0:

# tcpdump -i enp7s0 -nn -s0 -v -l host 140.82.121.4                                                                                                                                              [16:59:11]
tcpdump: listening on enp7s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:59:18.536394 IP (tos 0x0, ttl 62, id 54222, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.28100 > 140.82.121.4.443: Flags [S], cksum 0x9d44 (incorrect -> 0x7438), seq 2184615323, win 64860, options [mss 1410,sackOK,TS val 2405638866 ecr 0,nop,wscale 7], length 0
16:59:18.998438 IP (tos 0x0, ttl 62, id 17537, offset 0, flags [DF], proto TCP (6), length 76)
    162.55.245.135.35716 > 140.82.121.4.443: Flags [FP.], cksum 0x9d54 (incorrect -> 0x583c), seq 3605376996:3605377020, ack 3707823094, win 2345, options [nop,nop,TS val 2405639328 ecr 2485731493], length 24
16:59:19.546323 IP (tos 0x0, ttl 62, id 54223, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.28100 > 140.82.121.4.443: Flags [S], cksum 0x9d44 (incorrect -> 0x7046), seq 2184615323, win 64860, options [mss 1410,sackOK,TS val 2405639876 ecr 0,nop,wscale 7], length 0
16:59:21.558342 IP (tos 0x0, ttl 62, id 54224, offset 0, flags [DF], proto TCP (6), length 60)
    162.55.245.135.28100 > 140.82.121.4.443: Flags [S], cksum 0x9d44 (incorrect -> 0x686a), seq 2184615323, win 64860, options [mss 1410,sackOK,TS val 2405641888 ecr 0,nop,wscale 7], length 0

So the traffic passes through Flannel to the internal bridge, on to the external bridge, and even out the physical network adapter. All of the packets show something like cksum X (incorrect -> Y); can you tell what that means?

About every 3rd to 5th attempt of running wget "https://github.com/samdoran/demo-playbooks" while capturing, the download succeeds; I repeated the test here to document the issue. I can resolve hostnames and ping public hosts at all times without a single lost packet. The wget command succeeds on the VM / k3s node and on the host every single time.

manuelbuil commented 2 years ago

So packets leave the host correctly. What about the reply? Could you check that too?

Could you check the same wget at all levels:

Are you only getting problems when the client is in the pod?

apiening commented 2 years ago

I ran tcpdump at the different levels (host and VM) and interfaces with the following command:

tcpdump -i <interface> -nn -s0 -v -l host <container-ip>

From my understanding, the host filter matches the IP address regardless of whether it is the source or the destination. So this command should include the reply, since the destination would be the container IP, correct?
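
For completeness, the capture can also be restricted to a single direction, e.g. only packets heading back to the container (a sketch using the same placeholders as above):

tcpdump -i <interface> -nn -s0 -v -l dst host <container-ip>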

I can confirm that the command wget "https://github.com/samdoran/demo-playbooks" executes on the host and the VM with no noticeable delay, and the HTML page from github.com is downloaded as expected. From within the busybox container it looks like this:

# kubectl exec -it busybox -- sh
/ # wget "https://github.com/samdoran/demo-playbooks"
Connecting to github.com (140.82.121.3:443)
wget: can't connect to remote host (140.82.121.3): Operation timed out

Every once in a while the wget command does work from within the container, the same way it does from the VM and host, but most of the time it fails with a timeout. At the same time, I can reliably ping github.com from within the container with the same round-trip time as on the VM and host.

manuelbuil commented 2 years ago

This is correct. But in your logs I can only see the packets egressing the pod at the different levels. Does that mean you don't see reply packets? I want to understand whether the problem is reply packets that never arrive or reply packets that disappear at some point. It is possible that GitHub is changing its source IP, so it might be better to filter on the port.
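
A port-based filter might look like this (a sketch; the interface is an example):

# Capture the whole HTTPS conversation regardless of which GitHub IP answers
tcpdump -i eth0 -nn -s0 -v -l 'tcp port 443'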

Can you check that there are no other hosts in your network with the same IP as your pod?
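
For reference, one way to check for an address conflict on the local network is an ARP probe (a sketch, not a step suggested in this thread; assumes the iputils arping tool, and the interface is an example):

# Probe for the pod IP on the outward-facing interface; nothing on the LAN should answer for a pod-CIDR address
arping -I eth0 -c 3 10.42.0.31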

apiening commented 2 years ago

Thank you @manuelbuil,

In fact, all I see is what I've posted in https://github.com/k3s-io/k3s/issues/5349#issuecomment-1086022219. But every now and then, when the command finishes successfully, I can see the reply packets in the tcpdump as well.

Most probably github.com does change its source IP; I get different IPs returned when I resolve github.com via DNS as well. But since the tcpdump filter is set to the IP of the container, it should match the destination address at the last hop. This is also visible in the case where the command succeeds after a few unsuccessful attempts.

To my knowledge there should be no other host sharing the same IP address as this pod. But I wanted to verify this, because I have another VM on this host running Docker containers, so it is theoretically possible. I tried to check this with traceroute, but if there is a better way, please let me know.

On the VM:

# tracepath -4 10.42.0.31
 1?: [LOCALHOST]                      pmtu 1450
 1:  10.42.0.31                                            0.046ms reached
 1:  10.42.0.31                                            0.014ms reached
     Resume: pmtu 1450 hops 1 back 1

This is what I expected: There is just one hop when I access the container.

When I do this on the host:

# traceroute -4 10.42.0.31
traceroute to 10.42.0.31 (10.42.0.31), 30 hops max, 60 byte packets
 1  100.91.xx.yy (100.91.xx.yy)  0.486 ms  0.518 ms  0.510 ms
 2  core24.fsn1.hetzner.com (213.239.229.109)  0.239 ms  0.592 ms  0.635 ms
 3  core12.nbg1.hetzner.com (213.239.203.121)  2.937 ms core11.nbg1.hetzner.com (213.239.245.225)  2.606 ms core12.nbg1.hetzner.com (213.239.203.121)  2.975 ms
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
...

This also seems fine to me, because there is no route matching the target address, so the default route is used.

I did the same check with tracepath -4 10.42.0.31 on the other VM running Docker containers, and the default gateway was traced the same way. So I think this proves that there is no IP address duplication issue at this point.

manuelbuil commented 2 years ago

So, just to recap: ~3/5 of the times you try curling github.com (or making any HTTP request), you see packets leaving your pod, traversing the whole networking stack correctly, and egressing the physical NIC (enp7s0). However, you don't see any reply on your physical NIC, right?

The 2/5 times when it works, do you also see the message "cksum 0x9d54 (incorrect -> 0x583c)" in the tcpdump output? I don't think this is relevant, but let's check just in case.

apiening commented 2 years ago

Yes that's correct.

I decided to use another URL for my tests, because it is hard to capture the connection to github.com when the IP address keeps changing and different IPs respond.

I captured the full output of

tcpdump -nn -s0 -v -l host 193.99.144.85 > dump.txt

while the HTTPS request

wget "https://www.heise.de" -O -

succeeded. I have attached the resulting dump.txt, because it is quite long.

I can also see these cksum incorrect entries in the dump with the successful request.

One explanation I found is that this is normal due to checksum offloading to the network card, but I can't confirm or rule that out. In any case, it doesn't seem to have an effect on this particular issue.
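
For reference, the "cksum ... (incorrect)" lines are usually harmless when checksum offloading is enabled, because tcpdump sees the packet before the NIC fills in the checksum. The offload settings can be inspected and, for testing, toggled with ethtool (a sketch; the interface name is an example):

# Show the current checksum-offload settings
ethtool -k eth0 | grep -i checksum
# Temporarily disable TX checksum offload to test whether it plays a role
sudo ethtool -K eth0 tx off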

dump.txt

manuelbuil commented 2 years ago

It is indeed strange. If HTTPS reply packets sometimes never reach enp7s0, it seems to me that it might be more of a problem in the network. Does wget/curl from the VM work in 100% of the cases?

apiening commented 2 years ago

Yes, it is strange; at the very least I'm reaching the limits of my network analysis skills.

On the same host there are several VMs running, including mail servers, web servers and monitoring instances; the system is completely monitored with service checks, including network latency checks, and there are no known issues.

I've done a hundred successful wget requests from the VM where the k3s instance is running:

SUCCESS=0; for i in `seq 1 100`; do wget "https://www.heise.de" -O - &> /dev/null && SUCCESS=$((SUCCESS+1)); sleep 1; done; echo $SUCCESS
100

I added the sleep 1 just to not get banned by the web server.

This test finishes with the same result when run from the host.

I have no idea what to try next. There are some posts on the internet complaining about the same or at least very similar issues, for example: https://discuss.kubernetes.io/t/kubernetes-pods-do-not-have-internet-access/5252 The topic seems to still be unresolved, ending with a user stating that switching from Flannel to Calico sorted out his issues. I can't say much about that because I have been using docker-compose for quite a while but don't have much experience with Kubernetes, k3s, and the different CNIs.

I should add that this is a test node that I've set up to test one application in particular. After facing this issue for the first time, I completely wiped the VM and created a fresh one to eliminate possible config issues, but the issue remains the same.

manuelbuil commented 2 years ago

There are several things which are very strange:

Let's try one extra thing: create a busybox pod with hostNetwork: true. This way, the network namespace of the pod is the same as the host's; in other words, we completely bypass the CNI plugin. If this works, we can build up the flannel network stack step by step and test at each step (I can explain how to do it on Monday ;) )

apiening commented 2 years ago

Thanks again @manuelbuil, that's a good plan.

I tried to bring up a pod with hostNetwork: true with the following command:

kubectl run busybox2 --image=alpine --overrides='{"kind":"Pod", "apiVersion":"v1", "spec": {"hostNetwork": true}}' --command -- sh -c 'echo Hello K3S! && sleep 3600'
kubectl exec -it busybox2 -- sh

I entered the pod/container and verified that I do in fact have host networking. Then I did the same wget test with 100 tries:

SUCCESS=0; for i in `seq 1 100`; do wget "https://www.heise.de" -O - &> /dev/null && SUCCESS=$((SUCCESS+1)); sleep 1; done; echo $SUCCESS
100

So with hostNetwork: true, all requests pass without issues, the same way they do from the VM and the host.

So this issue must somehow be related to Flannel, one way or the other. Maybe a configuration issue, or a bug that only happens under specific circumstances.

manuelbuil commented 2 years ago

Ok. Second test: we will build network namespaces by hand and use the flannel infrastructure. Pick two unused IP addresses in the same range as your cluster-cidr (I think in your case it is 10.164.12.0/26, but please verify with kubectl get nodes -o yaml | grep podCIDR), and then:

# Create network namespaces
sudo ip netns add ns1
sudo ip netns add ns2

# Create veth interfaces
sudo ip link add ns1-namespaceIf type veth peer name ns1-rootIf
sudo ip link set ns1-namespaceIf up
sudo ip link set ns1-rootIf up
sudo ip link add ns2-namespaceIf type veth peer name ns2-rootIf
sudo ip link set ns2-namespaceIf up
sudo ip link set ns2-rootIf up

# Add interfaces in namespaces
sudo ip link set ns1-namespaceIf netns ns1
sudo ip link set ns2-namespaceIf netns ns2

# Make sure ipv4 forwarding works
cat /proc/sys/net/ipv4/conf/all/forwarding

# Add interfaces to bridge
sudo brctl addif cni0 ns1-rootIf
sudo brctl addif cni0 ns2-rootIf

# Add ip to the interfaces
sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf $IP_ADDRESS_YOU_PICKED/24
sudo ip netns exec ns2 ip addr add dev ns2-namespaceIf $IP2_ADDRESS_YOU_PICKED/24

# Add routes
sudo ip netns exec ns1 ip r add default via $IP_INTERFACE_CNI0
sudo ip netns exec ns2 ip r add default via $IP_INTERFACE_CNI0

# Ping should work
sudo ip netns exec ns2 ping $IP_ADDRESS_YOU_PICKED/24
sudo ip netns exec ns1 ping $IP2_ADDRESS_YOU_PICKED/24

# Ping to external should work too
sudo ip netns exec ns2 ping 8.8.8.8
sudo ip netns exec ns1 ping 8.8.8.8

Then try to run the http/https traffic from within the namespaces (sudo ip netns exec curl .....)

apiening commented 2 years ago

Thank you @manuelbuil!

I've verified and set the following variables:

export IP_ADDRESS_YOU_PICKED=10.42.0.101
export IP2_ADDRESS_YOU_PICKED=10.42.0.102
export IP_INTERFACE_CNI0=10.42.0.1

Then I executed all commands from your post in order. Everything went fine up to the Add routes section:

# Add routes
sudo ip netns exec ns1 ip r add default via $IP_INTERFACE_CNI0
Error: Nexthop has invalid gateway.

Can you tell what the error message means? Is this already an indication of the network issue that causes my problems?

manuelbuil commented 2 years ago

You are right. I assumed that when moving the interfaces around they would stay up, but apparently they do not. Execute these first:

sudo ip netns exec ns1 ip link set ns1-namespaceIf up
sudo ip netns exec ns2 ip link set ns2-namespaceIf up

That should create the routes (check with sudo ip netns exec ns1 ip r) so that the namespace knows how to reach 10.42.0.1.

apiening commented 2 years ago

After setting and verifying the routes, I entered the commands you listed under "Ping should work". They didn't run at first, but after removing the subnet suffix /24 they did:

# sudo ip netns exec ns2 ping $IP_ADDRESS_YOU_PICKED
PING 10.42.0.101 (10.42.0.101) 56(84) bytes of data.
64 bytes from 10.42.0.101: icmp_seq=1 ttl=64 time=0.075 ms
64 bytes from 10.42.0.101: icmp_seq=2 ttl=64 time=0.028 ms
^C
--- 10.42.0.101 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.028/0.051/0.075/0.023 ms

# sudo ip netns exec ns1 ping $IP2_ADDRESS_YOU_PICKED
PING 10.42.0.102 (10.42.0.102) 56(84) bytes of data.
64 bytes from 10.42.0.102: icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from 10.42.0.102: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 10.42.0.102: icmp_seq=3 ttl=64 time=0.033 ms
^C
--- 10.42.0.102 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2031ms
rtt min/avg/max/mdev = 0.033/0.044/0.056/0.009 ms

This is correct so far, right? The pings to external (8.8.8.8) succeeded, too.

But I can't figure out how to execute the connection tests with wget from within this environment:

# sudo ip netns exec wget "https://www.heise.de" -O -
Cannot open network namespace "wget": No such file or directory

wget is installed on the VM (and curl as well), but the command still doesn't work.

manuelbuil commented 2 years ago

You forgot to specify which network namespace you want to use (ns1 or ns2), so your OS thinks that wget is the name of the namespace. The command should be: sudo ip netns exec ns1 wget "https://www.heise.de" -O -

apiening commented 2 years ago

Ok, thank you.

Here is what I got:

# sudo ip netns exec ns1 wget "https://www.heise.de" -O -
--2022-04-13 07:53:20--  https://www.heise.de/
Resolving www.heise.de (www.heise.de)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘www.heise.de’

Same thing for ns2. Is there some DNS setup required for the network namespaces? Since I can ping DNS servers (like 8.8.8.8), it looks like there is no reachable nameserver defined inside the network namespace. On the VM, DNS works fine.

apiening commented 2 years ago

When I enter the network namespace ns1 and check the /etc/resolv.conf

# ip netns exec ns1 bash
# grep -v "#" /etc/resolv.conf 

nameserver 127.0.0.53
options edns0 trust-ad

it shows the same result as from within the VM. That doesn't come as a surprise to me. But I can't ping the IP 127.0.0.53 from the network namespace ns1, which I think is the reason DNS is not working.

The IP 127.0.0.53 is related to the way systemd-resolved works under Ubuntu 20.04.4 LTS. Is it required to somehow route this IP from the network namespace? Or is it possible to define another DNS server for the network namespace?

manuelbuil commented 2 years ago

That's weird. In any case, can you try curling the IP directly so that it does not need to resolve?

manuelbuil commented 2 years ago

I could reproduce the problem, and it is related to systemd-resolved. If you stop that service and change the nameserver in /etc/resolv.conf from 127.0.0.53 to something different (e.g. 8.8.8.8), it should resolve correctly.
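
A minimal sketch of that workaround (assuming the node uses systemd-resolved and /etc/resolv.conf is its usual stub symlink):

sudo systemctl stop systemd-resolved
# /etc/resolv.conf normally points at the 127.0.0.53 stub resolver; replace it with a static file
sudo rm /etc/resolv.conf
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf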

apiening commented 2 years ago
# time sudo ip netns exec ns1 wget --no-check-certificate https://193.99.144.85 -O -
--2022-04-13 15:43:22--  https://193.99.144.85/
Connecting to 193.99.144.85:443... failed: Connection timed out.
Retrying.

--2022-04-13 15:45:34--  (try: 2)  https://193.99.144.85/
Connecting to 193.99.144.85:443... failed: Connection refused.

real    2m33.215s
user    0m0.003s
sys     0m0.004s

Same thing for ns2. If I repeat the command, every now and then it succeeds. Apart from the DNS issue, the behaviour is exactly the same as from within the pod / container.

apiening commented 2 years ago

Yes, stopping systemd-resolved and changing the nameserver works. However, the initial issue that the HTTPS request fails still persists.

manuelbuil commented 2 years ago

What a strange issue...

Ok, let's try another approach to take some components out of the picture and make sure they are not introducing anything weird. First, remove the namespaces with ip netns del ns1 and ip netns del ns2.

Then:

# Create network namespace
sudo ip netns add ns1

# Create veth interfaces
sudo ip link add ns1-namespaceIf type veth peer name ns1-rootIf
sudo ip link set ns1-namespaceIf up
sudo ip link set ns1-rootIf up

# Add interfaces in namespaces
sudo ip link set ns1-namespaceIf netns ns1

# Make sure ipv4 forwarding works
cat /proc/sys/net/ipv4/conf/all/forwarding
cat /proc/sys/net/ipv4/conf/eth0/forwarding

# Enable proxy_arp
sudo echo 1 > /proc/sys/net/ipv4/conf/ns1-rootIf/proxy_arp

# Add ip to the interfaces
sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26

# Add namespace route
sudo ip netns exec ns1 ip link set ns1-namespaceIf up
sudo ip netns exec ns1 ip r add 169.254.1.1 dev ns1-namespaceIf
sudo ip netns exec ns1 ip r add default via 169.254.1.1 dev ns1-namespaceIf

# Set the routes
sudo ip r add 192.168.0.10/32 dev ns1-rootIf

## Access to the internet
sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
sudo ip netns exec ns1 ping 8.8.8.8

And if that works, try the curl / wget again please

apiening commented 2 years ago

The commands executed without issues; here is a recap:

~# sudo ip netns add ns1
~# sudo ip link add ns1-namespaceIf type veth peer name ns1-rootIf
~# sudo ip link set ns1-namespaceIf up
~# sudo ip link set ns1-rootIf up
~# sudo ip link set ns1-namespaceIf netns ns1
~# cat /proc/sys/net/ipv4/conf/all/forwarding
1
~# cat /proc/sys/net/ipv4/conf/eth0/forwarding
1
~# cat /proc/sys/net/ipv4/conf/ns1-rootIf/proxy_arp
0
~# sudo echo 1 > /proc/sys/net/ipv4/conf/ns1-rootIf/proxy_arp
~# cat /proc/sys/net/ipv4/conf/ns1-rootIf/proxy_arp
1
~# sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
~# sudo ip netns exec ns1 ip link set ns1-namespaceIf up
~# sudo ip netns exec ns1 ip r add 169.254.1.1 dev ns1-namespaceIf
~# sudo ip netns exec ns1 ip r add default via 169.254.1.1 dev ns1-namespaceIf
~# sudo ip r add 192.168.0.10/32 dev ns1-rootIf
~# sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
~# sudo ip netns exec ns1 ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=58 time=88.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=58 time=5.11 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 5.114/46.706/88.299/41.592 ms

And the command

sudo ip netns exec ns1 wget --no-check-certificate https://193.99.144.85 -O -

succeeds every single time. I tried at least 20 times in a row. So the issue is not happening with this setup.

But what can we take away from this? What are the components we've left out?

manuelbuil commented 2 years ago

There are three differences:

  1. In the first case, there is a bridge which does L2 forwarding. In this case, everything is L3.
  2. In the first case, flannel is taking care of the masquerade. In this case, we are creating those rules.
  3. The IP range is different.

Let's try to set up an env which uses a bridge but does not use the flannel masquerading or the flannel IP range. First, let's remove what you currently deployed:

sudo ip r del 192.168.0.10/32 dev ns1-rootIf
sudo iptables -t nat -D POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
sudo ip netns del ns1

Now, the new set-up:

sudo brctl addbr mybr
sudo ip addr add dev mybr 192.168.0.1/26
sudo ip link set mybr up
sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
sudo ip netns exec ns1 ip link set ns1-namespaceIf up
sudo ip netns exec ns1 ip r add default via 192.168.0.1 dev ns1-namespaceIf
sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
sudo ip netns exec ns1 ping 8.8.8.8

Then try the curl/wget

If that works, then we know that the problem is not the bridge but either the IP range or iptables. Probably the former. Could you provide the output of ip r too please?
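(For completeness, a hedged way to compare this with what flannel sets up on the node; the exact chain names differ between flannel/k3s versions, so grepping for the default 10.42.0.0/16 pod range is the safest bet:)

# Routing table plus flannel's NAT rules for the pod range
ip r
sudo iptables -t nat -S | grep 10.42
# Packet counters show whether the POSTROUTING rules are hit at all
sudo iptables -t nat -L POSTROUTING -v -n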

apiening commented 2 years ago

The command

~# sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
Cannot open network namespace "ns1": No such file or directory

fails.

Should I do a

~# sudo ip netns add ns1

before this command? Is anything else required?

apiening commented 2 years ago

Here is the requested output:

~# ip r
default via 10.164.12.254 dev eth0 proto static 
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1 
10.164.12.0/24 dev eth0 proto kernel scope link src 10.164.12.6 
192.168.0.0/26 dev mybr proto kernel scope link src 192.168.0.1

This is after I've applied the three first commands from your post.

ceefour commented 2 years ago

I also have a "perhaps-similar-issue" with my k3s worker node in a VM in Contabo. This happens when doing a POST to gitlab.com but based on the issue, this will happen with any outgoing network access:

ERROR: Registering runner... failed                 runner=GR134894 status=couldn't execute POST against https://gitlab.com/api/v4/runners: Post "https://gitlab.com/api/v4/runners": dial tcp: i/o timeout
PANIC: Failed to register the runner. 

Using k3s v1.22.7.

This bug only happens with pods on that node in the Contabo VM. However, if I reschedule the pod to the other node, which is a Hetzner instance, all networking works fine.
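(One thing that may be worth checking in a cross-provider setup like this: flannel's default VXLAN backend needs UDP port 8472 open between the nodes, and the tunnel adds roughly 50 bytes of overhead, so a reduced MTU on either provider's virtual NIC can break larger packets. A rough check, assuming the default interface names eth0 and flannel.1:)

# On both nodes, while pinging a pod that runs on the other node, watch for VXLAN traffic
sudo tcpdump -ni eth0 udp port 8472

# Compare MTUs on both nodes (VXLAN needs ~50 bytes of headroom)
ip link show eth0
ip -d link show flannel.1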

manuelbuil commented 2 years ago

The command

~# sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
Cannot open network namespace "ns1": No such file or directory

fails.

Should I do a

~# sudo ip netns add ns1

before this command? Is anything else required?

Yes, sorry:

sudo ip netns add ns1
sudo ip link add ns1-namespaceIf type veth peer name ns1-rootIf
sudo ip link set ns1-namespaceIf up
sudo ip link set ns1-rootIf up
sudo ip link set ns1-namespaceIf netns ns1
sudo brctl addbr mybr
sudo ip addr add dev mybr 192.168.0.1/26
sudo ip link set mybr up
sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
sudo ip netns exec ns1 ip link set ns1-namespaceIf up
sudo ip netns exec ns1 ip r add default via 192.168.0.1 dev ns1-namespaceIf
sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
sudo ip netns exec ns1 ping 8.8.8.8

apiening commented 2 years ago

This bug only happens with pods on that node in the Contabo VM. However, if I reschedule the pod to the other node, which is a Hetzner instance, all networking works fine.

Can you run a plain alpine image and do a wget / curl request from there manually, just to make sure that this is not related to a specific pod?
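For example, something along these lines should do (the pod name is arbitrary):

# Start a throwaway alpine pod and open a shell in it
kubectl run tmp-shell --rm -it --restart=Never --image=alpine -- sh
# Then, inside the pod:
wget --no-check-certificate https://193.99.144.85 -O -
ping -c 3 8.8.8.8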

apiening commented 2 years ago

Thanks again @manuelbuil,

I was able to execute the commands you listed without any errors or warnings:

~# sudo ip netns add ns1
~# sudo ip link add ns1-namespaceIf type veth peer name ns1-rootIf
~# sudo ip link set ns1-namespaceIf up
~# sudo ip link set ns1-rootIf up
~# sudo ip link set ns1-namespaceIf netns ns1
~# sudo brctl addbr mybr
~# sudo ip addr add dev mybr 192.168.0.1/26
~# sudo ip link set mybr up
~# sudo ip netns exec ns1 ip addr add dev ns1-namespaceIf 192.168.0.10/26
~# sudo ip netns exec ns1 ip link set ns1-namespaceIf up
~# sudo ip netns exec ns1 ip r add default via 192.168.0.1 dev ns1-namespaceIf
~# sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.10 -j MASQUERADE
~# sudo ip netns exec ns1 ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

But the ping command from ns1 hangs forever; at least no timeout occurred within 15 minutes of waiting.

The fact that ping does not work is very different from what I've experienced from within the pods: there I had issues with HTTPS requests, while ping was working fine all the time. I think there is some other issue preventing ping from working now, but I don't have any idea what it may be.
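(One pattern that may be worth ruling out here: small packets such as pings getting through while TLS handshakes stall is a classic path-MTU symptom. A hedged way to probe it from inside a pod, assuming an alpine image, since busybox's ping lacks the -M option:)

apk add --no-cache iputils
ping -M do -s 1400 -c 3 193.99.144.85
ping -M do -s 1472 -c 3 193.99.144.85   # 1472 + 28 bytes of headers = 1500; this fails if the path MTU is smaller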

ceefour commented 2 years ago

@apiening I sidestepped the problem:

  1. Previously it was a multi-cloud cluster: Hetzner had the control plane and a worker, and Contabo was a worker only.
  2. Now I created a standalone cluster in Contabo where the control plane is also the worker.

The current configuration works. So something is wrong with flannel when a node joins a cluster from a different cloud.

manuelbuil commented 2 years ago

We forgot to add the interface to the bridge :facepalm: sudo brctl addif mybr ns1-rootIf

apiening commented 2 years ago

We forgot to add the interface to the bridge 🤦 sudo brctl addif mybr ns1-rootIf

You're right, it is working fine now:

~# sudo brctl addif mybr ns1-rootIf
~# sudo ip netns exec ns1 ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=58 time=5.11 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=58 time=5.10 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=58 time=5.13 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=58 time=5.16 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=58 time=5.17 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=58 time=5.09 ms
^C
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 5.085/5.124/5.167/0.029 ms

The command

sudo ip netns exec ns1 wget --no-check-certificate https://193.99.144.85 -O -

Works perfectly fine as well!

apiening commented 2 years ago

The current configuration works. So something is wrong with flannel when a node joins a cluster from a different cloud.

This sounds like a different issue; at least I can't see the correlation to what I observe. I'm using a very basic single-node setup with k3s on a clean Ubuntu 20.04 VM. But once (or I should say if) we figure out what causes this issue where ping works from inside the pod while HTTPS does not, it may be interesting to check whether this applies to your issue as well.

ceefour commented 2 years ago

Also in my previous case

apiening commented 2 years ago
  • however the pod doesn't have internet access

Can you please check specifically whether a wget / curl to a public IP address (like the test above) does work or not?

If you pull a basic alpine image to do these tests, it would be independent from the other pods.

manuelbuil commented 2 years ago

Ok. Would you be able to deploy k3s but using another IP range? For example: cluster-cidr: 192.168.0.0/16?

apiening commented 2 years ago

Ok. Would you be able to deploy k3s but using another IP range? For example: cluster-cidr: 192.168.0.0/16?

You mean on another VM as a test? How can I select a different cluster-cidr when I deploy k3s? Or could I change the cluster-cidr of the existing deployment?

Thinking about it: changing the cluster-cidr and nothing else would only have an effect if I'm facing a routing or address-duplication issue with the currently active cluster-cidr, right? Isn't the ping test proof that the communication generally works and that there is no IP address issue? We've also excluded DNS by using direct IP addresses. That's why I'm a little bit skeptical that changing the cluster-cidr would change much. But I can do it anyway; I just don't know how to do it.

manuelbuil commented 2 years ago

Yes, in another VM as a test. Before deploying k3s, create the directory /etc/rancher/k3s/ and place a config.yaml file there with the content cluster-cidr: 192.168.0.0/16. Then deploy k3s and it should use that CIDR.

apiening commented 2 years ago

I have created a new VM based on Ubuntu 20.04 and created the config file:

~# cat /etc/rancher/k3s/config.yaml 
cluster-cidr: 192.168.0.0/16

Then I've deployed k3s with curl -sfL https://get.k3s.io | sh - and checked the CIDR:

~# kubectl describe node | grep PodCIDR
PodCIDR:                      192.168.0.0/24
PodCIDRs:                     192.168.0.0/24

I've created a test pod with kubectl run busybox --image=alpine --command -- sh -c 'echo Hello K3S! && sleep 3600' and did the wget test:

~# kubectl exec -ti busybox -- sh
/ # wget --no-check-certificate https://193.99.144.85 -O -
Connecting to 193.99.144.85 (193.99.144.85:443)
wget: can't connect to remote host (193.99.144.85): Connection refused

So same thing as with the previous install (and the one before). I can ping local and external hosts without issues.
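(Worth noting: this time wget reports "Connection refused" rather than a timeout, which usually means something actively answers with a TCP reset. A hedged way to see where that answer comes from is to capture on the host while repeating the test:)

# On the host, in one terminal
sudo tcpdump -ni any host 193.99.144.85 and port 443

# In a second terminal, repeat the test from the pod
kubectl exec -ti busybox -- wget --no-check-certificate https://193.99.144.85 -O -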

ashissharma97 commented 2 years ago

Hi @apiening, were you able to fix this issue? Please let us know if you are. I'm also facing the same issue.

Thanks

apiening commented 2 years ago

Hi @ashissharma97, unfortunately even after trying a lot of things (including another clean install) the issue persists. Any progress on this issue will be documented here.