rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

No network traffic between pods #6314

Open jackmtpt opened 7 months ago

jackmtpt commented 7 months ago

Actual Behavior

On a fresh install of Rancher Desktop, no pods can connect to anything. CoreDNS can't connect to the WSL VM to resolve anything, no pods can resolve anything via CoreDNS. Ping doesn't work between any pods, nor does any other pod-pod traffic.

What does work is port-forwarding from my Windows host machine to a pod as well as all traffic from the WSL VM to anything. Pods can also ping the WSL VM's IP addresses, both on the cni0 interface (10.42.0.1) and eth0 (172.29.131.171), and send DNS queries to the eth0 address.

When running tcpdump in the WSL VM (tcpdump -ni any "udp port 53"), I see packets from various pods on the vethxxx interfaces, but they never reach the cni0 interface or the veth of the destination pod.
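To make the "seen on vethxxx but never on cni0" observation easier to read off a capture, the packets can be tallied per interface. This is only a sketch: `packets_per_iface` is a hypothetical helper, and it assumes the output format of newer tcpdump builds (4.99 and later), where lines captured with `-i any` carry the interface name as the second field. Older builds omit that field, in which case the counts would be meaningless.

```shell
#!/bin/sh
# Hypothetical helper: count captured packets per interface.
# ASSUMPTION: input is tcpdump "-i any" text output in which the second
# whitespace-separated field is the interface name (tcpdump 4.99+).
packets_per_iface() {
    awk 'NF >= 2 { count[$2]++ } END { for (i in count) print i, count[i] }' | sort
}

# Example use inside the WSL VM (not run here):
#   tcpdump -ni any -c 50 "udp port 53" 2>/dev/null | packets_per_iface
```

If only vethxxx interfaces show up in the result and cni0 never does, that matches the behavior described above.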

Steps to Reproduce

Start from a new install of Rancher Desktop. Open a shell in any pod (e.g. traefik):

nslookup google.com 172.29.131.171 # works
nslookup google.com 10.42.0.6 # fails (10.42.0.6 is the IP of the coredns pod)

ping 10.42.0.6 # fails

Get a shell inside the WSL VM: wsl -d rancher-desktop

ping 10.42.0.6 # works

Result

No pod-pod traffic works at all.

Expected Behavior

With no network policies in place (the default), every pod should be able to ping/connect to every other pod on ports that they're listening on.

Additional Information

No response

Rancher Desktop Version

1.11.1

Rancher Desktop K8s Version

1.28.5

Which container engine are you using?

moby (docker cli)

What operating system are you using?

Windows

Operating System / Build Version

Windows 11 22H2

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

The Rancher networking tunnel is DISABLED. I've tried turning it on, but that makes things worse: Rancher Desktop is unable to start (stuck at the 'updating kubeconfig' step).

jackmtpt commented 7 months ago

Running echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables makes normal pod-pod connections work, although pod --> service IP traffic still doesn't, of course (since that relies on iptables NAT). So this issue seems to be somewhere in the iptables config...
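For anyone comparing machines, it may help to record the current value of that setting before and after trying the workaround. A minimal sketch, assuming a POSIX shell inside the WSL VM: `bridge_nf_state` is a hypothetical helper name, and the optional path argument exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Hypothetical helper: report whether bridged traffic is passed to iptables.
# The default path is the one written to in the workaround above.
bridge_nf_state() {
    path="${1:-/proc/sys/net/bridge/bridge-nf-call-iptables}"
    if [ ! -r "$path" ]; then
        # File missing or unreadable (e.g. the bridge netfilter module is not loaded).
        echo "unknown"
        return 1
    fi
    value="$(cat "$path")"
    case "$value" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Calling `bridge_nf_state` with no argument inside the WSL VM reads the live setting. Note that "disabled" also means pod --> service IP traffic is expected to fail, as described above.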

Larswa commented 7 months ago

Yeah, I'm seeing the same thing with Rancher Desktop 1.12.2 and 1.12.3. I also tried a few earlier versions, back to 1.10.
Running on Windows 11 22H2. Tried with both wsl --update and wsl --update --pre-release. Seeing the issue with both moby and containerd.

With moby I was seeing this in k3s.log:

time="2024-02-03T10:57:04Z" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/5105ccf96a758d93d68502b288561b3d248f5b2ae14fbdc3b49db0c2a1549c96/resolv.conf as [nameserver 10.43.0.10 search kube-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
time="2024-02-03T10:57:04Z" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/4cb7e77dc1a405d510805236adc4478db2c121d7b45e14a9abeabae944cd6d34/resolv.conf as [nameserver 10.43.0.10 search kube-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
time="2024-02-03T10:57:04Z" level=error msg="Error adding pod kube-system/metrics-server-67c658944b-xvb7m to network {docker 5105ccf96a758d93d68502b288561b3d248f5b2ae14fbdc3b49db0c2a1549c96}:/proc/2013/ns/net:flannel:cbr0: plugin type=\"flannel\" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"
time="2024-02-03T10:57:04Z" level=error msg="Error adding pod kube-system/traefik-f4564c4f4-vb6v4 to network {docker 4cb7e77dc1a405d510805236adc4478db2c121d7b45e14a9abeabae944cd6d34}:/proc/1998/ns/net:flannel:cbr0: plugin type=\"flannel\" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"
time="2024-02-03T10:57:04Z" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/72ca4a960940fd0313e6c7a2e0e8f99eeaa4ed45fe42427dca9f21a9961af6ba/resolv.conf as [nameserver 172.31.83.131]"
time="2024-02-03T10:57:04Z" level=error msg="Error adding pod kube-system/coredns-6799fbcd5-nf8dz to network {docker 72ca4a960940fd0313e6c7a2e0e8f99eeaa4ed45fe42427dca9f21a9961af6ba}:/proc/2171/ns/net:flannel:cbr0: plugin type=\"flannel\" failed (add): loadFlannelSubnetEnv failed: open /run/flannel/subnet.env: no such file or directory"
time="2024-02-03T10:57:04Z" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/7bfffde1521da39d92cf433d94bc32cf6e55b40ae6a2f3fbe5d1fea6ce4a2ba7/resolv.conf as [nameserver 10.43.0.10 search kube-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
time="2024-02-03T10:57:04Z" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/1ffcc411b1b5f2fa0380221f441e4fe36ad79a16974091f431bf56ca4400e280/resolv.conf as [nameserver 10.43.0.10 search flux-system.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"

I also had this on my laptop and figured it might be the WSL stack. Did a refresh install and it's working again on the laptop. I'm now seeing the same thing on my developer tower PC. Did a reset there, but the issue is still present.
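To check whether another machine hits the same flannel error, the affected pods can be pulled out of a k3s.log like the excerpt above. This is a sketch that assumes the exact message format shown in that excerpt; `flannel_failed_pods` is a hypothetical helper, and the log location is not given in this thread, so the function simply reads the log from standard input.

```shell
#!/bin/sh
# Hypothetical helper: list pods whose network setup failed because
# /run/flannel/subnet.env was missing.
# ASSUMPTION: log lines look like the k3s.log excerpt quoted above.
flannel_failed_pods() {
    grep 'loadFlannelSubnetEnv failed' \
        | sed -n 's/.*Error adding pod \([^ ]*\) to network.*/\1/p' \
        | sort -u
}

# Example use (not run here), with the path to k3s.log on your system:
#   flannel_failed_pods < k3s.log
```

An empty result suggests the subnet.env error is not what is happening on that machine, even if pod-pod traffic is still broken.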

0Styless commented 6 months ago

Can also confirm, with this WSL version:

WSL-Version: 2.0.14.0
Kernelversion: 5.15.133.1-1
WSLg-Version: 1.0.59
MSRDC-Version: 1.2.4677
Direct3D-Version: 1.611.1-81528511
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-Version: 10.0.22621.819

and Rancher Desktop version 1.12.3