rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

DNS over UDP fails 50% of the time in containers on MacOS #6376

Open codyps opened 8 months ago

codyps commented 8 months ago

Actual Behavior

Running something like dig +short raw.githubusercontent.com +notcp fails with a timeout on every other request. This also affects tools like curl, which fail to resolve the host about 50% of the time: every other request for the same URL returns an error.

Example curl output on error:

# curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh
curl: (6) Could not resolve host: raw.githubusercontent.com

Note: the next execution of the same command succeeded.

dig output:

root@b505e598c74d:/# dig +short raw.githubusercontent.com +notcp +time=1 +tries=1
185.199.111.133
185.199.110.133
185.199.109.133
185.199.108.133
root@b505e598c74d:/# dig +short raw.githubusercontent.com +notcp +time=1 +tries=1
;; communications error to 192.168.5.3#53: timed out
;; no servers could be reached

root@b505e598c74d:/# 
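To put a number on "every other request", the dig invocation above can be wrapped in a small loop. This is only a sketch: the function names probe and failure_rate are made up for illustration, and it assumes dig exits non-zero when no server replies.

```shell
#!/bin/sh
# Sketch: count how many of N UDP-only lookups fail.
# probe and failure_rate are hypothetical helper names, not part of any tool.

# One UDP-only query with a 1 s timeout and no retries (flags from the report).
# Assumption: dig exits non-zero when no server could be reached.
probe() {
    dig +short "$1" +notcp +time=1 +tries=1 >/dev/null 2>&1
}

# failure_rate HOST TOTAL: run probe TOTAL times and print "failed/total".
failure_rate() {
    host=$1; total=$2; failed=0; i=0
    while [ "$i" -lt "$total" ]; do
        probe "$host" || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed/$total"
}
```

For example, failure_rate raw.githubusercontent.com 20 would print something close to 10/20 if the alternating failure is present.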

Overriding the DNS server to use something like 8.8.8.8 instead of the nameserver from /etc/resolv.conf (192.168.5.3) seems to resolve the issue.
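A side-by-side comparison of the two resolvers might look like the sketch below. The helper names query and compare_servers are hypothetical; the only dig feature relied on beyond the flags already shown is the @server argument, and it again assumes dig exits non-zero on a timeout.

```shell
#!/bin/sh
# Sketch: compare UDP lookup failures across explicit DNS servers.
# query and compare_servers are hypothetical helper names.

# query SERVER HOST: one UDP-only lookup against an explicit server.
query() {
    dig +short "@$1" "$2" +notcp +time=1 +tries=1 >/dev/null 2>&1
}

# compare_servers HOST TRIES SERVER...: print the failure count per server.
compare_servers() {
    host=$1; tries=$2; shift 2
    for server in "$@"; do
        failed=0; i=0
        while [ "$i" -lt "$tries" ]; do
            query "$server" "$host" || failed=$((failed + 1))
            i=$((i + 1))
        done
        echo "$server: $failed/$tries failed"
    done
}
```

Usage would be compare_servers raw.githubusercontent.com 20 192.168.5.3 8.8.8.8. For containers, docker run --dns 8.8.8.8 -it --rm debian:bookworm is one way to apply the override without editing /etc/resolv.conf by hand.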

I've been testing in a container started with docker run -it --rm debian:bookworm, but the same appears to occur in the lima VM directly.

After entering the VM with LIMA_HOME=~/Library/Application\ Support/rancher-desktop/lima limactl shell 0:

lima-rancher-desktop:~$ while true; do curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh -o /dev/null && echo OK || echo BAD; done
OK
curl: (6) Could not resolve host: raw.githubusercontent.com
BAD
OK
curl: (6) Could not resolve host: raw.githubusercontent.com
BAD
OK
^C
lima-rancher-desktop:~$ 

Steps to Reproduce

  1. Enter the lima vm and note that curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh fails half of the time.

Result

DNS resolution fails around 50% of the time when using the default DNS host.

Expected Behavior

DNS resolution succeeds

Additional Information

This is on a corporate laptop with a bunch of network filtering/vpn items, and it's possible this issue is triggered by one of them.

In macOS Settings under "Network" -> "VPN & Filters" -> "Filters & Proxies", these items are present:

Name | Type | Status
Falcon | Content Filter | 🟢 Enabled
Reveal Agent Network Configuration Profile | Content Filter | 🟢 Enabled
Microsoft Defender Content Filter | Content Filter | 🟡 Enabled
GlobalProtectEn | Content Filter | 🟡 Enabled
Cisco Anyconnect Socket Filter | Content Filter | 🔴 Disabled
GlobalProtectDn | DNS Proxy | 🔴 Disabled
Cisco Anyconnect Socket Filter | DNS Proxy | 🟢 Enabled
GlobalProtectDo | Transparent Proxy | 🔴 Disabled
Cisco Anyconnect Socket Filter | Transparent Proxy | 🟢 Enabled

Disabling the 2 enabled "Cisco Anyconnect Socket Filter" items does not change the behavior observed. The other enabled items are greyed out and can't be disabled.

It's entirely possible this is some weird bug in one of these (though DNS working on macOS itself seems to indicate an interaction between several issues rather than a single fault).

Rancher Desktop Version

1.12.2

Rancher Desktop K8s Version

1.28.5

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

14.2.1 (23C71)

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

None

Windows User Only

No response

codyps commented 8 months ago

This appears to no longer be reproducible for me, possibly due to system changes pushed by corporate (in other words, DNS now resolves properly inside docker containers running on Rancher Desktop).

New state of "VPNs & Filters":

(screenshot)

A change to enable Microsoft Defender more fully seems to have gone out.

Feel free to close for now if this is not reproducible for others (I expect there's some funky configuration/software that is causing it). I'll report back if it re-occurs.

Let me know if there's any additional info I can capture when the issue occurs (or while it isn't occurring) that would be useful for debugging it.

sloppycoder commented 3 months ago

I'm experiencing the same problem on an M1 MacBook Air, macOS Sonoma 14.5, Rancher Desktop version 1.13.1.

how can I troubleshoot this?

btw, after running rdctl shell, there's no dig command in the VM. there's nslookup though.
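Since only nslookup is available in the VM, one possible first step is to record the success/failure pattern of repeated lookups and see whether it alternates as in the original report. The sketch below is not an official troubleshooting procedure: resolve and lookup_pattern are made-up names, and it assumes the VM's nslookup exits non-zero when a lookup fails, which may differ between implementations.

```shell
#!/bin/sh
# Sketch: show the pass/fail pattern of repeated lookups using nslookup.
# resolve and lookup_pattern are hypothetical helper names.

# resolve HOST: one lookup through the default resolver.
# Assumption: nslookup exits non-zero when the lookup fails.
resolve() {
    nslookup "$1" >/dev/null 2>&1
}

# lookup_pattern HOST N: print one character per attempt,
# "." for a successful lookup and "x" for a failed one.
lookup_pattern() {
    host=$1; n=$2; pattern=""; i=0
    while [ "$i" -lt "$n" ]; do
        if resolve "$host"; then
            pattern="${pattern}."
        else
            pattern="${pattern}x"
        fi
        i=$((i + 1))
    done
    echo "$pattern"
}
```

Running lookup_pattern raw.githubusercontent.com 10 inside rdctl shell and getting a string that alternates between "." and "x" would match the behavior described at the top of this issue; a string of only "." would suggest a different problem.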