rancher-sandbox / rancher-desktop

Container Management and Kubernetes on the Desktop
https://rancherdesktop.io
Apache License 2.0

k8s unable to pull images through VPN #1490

Closed micsor-norlys closed 6 months ago

micsor-norlys commented 2 years ago

Rancher Desktop Version

1.0.1

Rancher Desktop K8s Version

1.22.6

Which container runtime are you using?

containerd (nerdctl)

What operating system are you using?

macOS

Operating System / Build Version

macOS Monterey Version 12.2

What CPU architecture are you using?

x64

Linux only: what package format did you use to install Rancher Desktop?

No response

Windows User Only

Cisco AnyConnect 4.10.04071

Actual Behavior

When creating a pod with an image hosted on a private registry that is only reachable through a VPN, the image pull fails (dial tcp: lookup...).

I suspect this has something to do with the DNS resolver. This has been an issue before, but it worked without problems on v0.7.1. I am able to resolve the registry's name inside the VM and inside containers running on k8s, but k8s itself seems unable to resolve it.
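To compare what the system resolver returns in each environment (host, VM, container), a small check like the one below could be run in each place. This is only a sketch: `check_resolution` is a hypothetical helper, not part of Rancher Desktop, and it assumes Python is available wherever it is run.

```python
import socket


def check_resolution(hostname, port=443):
    """Ask the system resolver for hostname.

    Returns a sorted list of unique addresses, or a string describing
    the lookup failure.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"lookup failed: {exc}"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# Example (hostname is the redacted placeholder used in this report):
# print(check_resolution("docker-registry.int.some-domain.net"))
```

If the list of addresses differs between environments, or one of them reports a lookup failure, that narrows down where name resolution breaks.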

I can pull the image using nerdctl, and if I pull it with -n k8s.io the pod is also able to start.
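The pre-pull workaround described above can be scripted roughly as follows. This is a sketch under assumptions: `prepull_command` and `prepull` are hypothetical helper names, nerdctl is assumed to be on the PATH, and the pod is assumed to use an image pull policy that accepts an image already present in the k8s.io namespace.

```python
import subprocess


def prepull_command(image, namespace="k8s.io"):
    """Build the nerdctl invocation that pulls image into the given namespace."""
    return ["nerdctl", "-n", namespace, "pull", image]


def prepull(image):
    """Run the pull and return nerdctl's exit code (0 on success)."""
    return subprocess.run(prepull_command(image), check=False).returncode


# Example (image name is the redacted placeholder used in this report):
# prepull("docker-registry.int.some-domain.net/platform/mysql:3")
```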

Images are pulled from public registries without issue, as normal.

Steps to Reproduce

Result

Failed to pull image "docker-registry.int.some-domain.net/platform/mysql:3": rpc error: code = Unknown desc = failed to pull and unpack image "docker-registry.int.some-domain.net/platform/mysql:3": failed to resolve reference "docker-registry.int.some-domain.net/platform/mysql:3": failed to do request: Head "https://docker-registry.int.some-domain.net/v2/platform/mysql/manifests/3": dial tcp: lookup docker-registry.int.some-domain.net: Try again

Please note that the registry's actual domain is not some-domain.net; I have manually edited the entry.

Expected Behavior

That images are pulled and containers are started ... as on version 0.7.1

Additional Information

As suggested by Jan Dubois, I'm adding the output of ip a and ip r from inside the VM on both v1.0.1 and v0.7.1 (for comparison): RD-debug-v1_0_1.txt, RD-debug-v0_7_1.txt

I have been switching back and forth between v0.7.1 and v1.0.1 (doing clean installs) without disconnecting the VPN or changing anything else on my system, and I can reproduce the same result every time.

jandubois commented 2 years ago

As suggested by Jan Dubois, I'm adding the output of ip a and ip r from inside the VM on both v1.0.1 and v0.7.1 (for comparison)

This looks fine and does not explain the difference. (It does show that the bridged interface doesn't get an IPv4 address, which explains why the override was necessary in 0.7.1.)

thehejik commented 2 years ago

The registry address docker-registry.int..net looks odd, especially the two dots...

micsor-norlys commented 2 years ago

@thehejik: ahh.. sorry about that, my redaction of the actual domain got mangled. The actual log entry contains a valid host name like docker-registry.int.some-domain.net. I have updated the entry to use some-domain.net, but again, please note that this is not the actual domain.

micsor-norlys commented 2 years ago

@jandubois and I did some debugging last night; the transcript can be found here: https://rancher-users.slack.com/archives/C0200L1N1MM/p1643883900317819

dhruvbaldawa commented 2 years ago

I am facing the same issue; however, my registry is hosted in my local k8s.

I am able to curl it from the host. Running dig inside the lima VM resolves the host correctly, but if I try to curl the registry from the VM, the name does not resolve.

For now, I have added a static entry in /etc/hosts and everything is working fine. I started facing this issue after upgrading to 1.0.1
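For anyone applying the same /etc/hosts workaround, the entry is one line holding the address followed by the host name. The sketch below only formats and sanity-checks such a line; `hosts_entry` is a hypothetical helper, and the address and host name in the example are placeholders that do not come from this thread.

```python
import ipaddress


def hosts_entry(address, hostname):
    """Return an /etc/hosts line mapping hostname to address.

    Raises ValueError if address is not a valid IPv4 or IPv6 address.
    """
    ipaddress.ip_address(address)  # validation only; result is discarded
    return f"{address}\t{hostname}"


# Example with placeholder values:
# print(hosts_entry("10.0.0.5", "registry.example.internal"))
```

A static entry like this needs updating by hand whenever the registry's address changes, so it is a stopgap and not a fix for the resolver problem itself.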

micsor-norlys commented 2 years ago

I can confirm that adding entries to the /etc/hosts file also mitigates the issue for me.

kingrichard2005 commented 2 years ago

Hello, I am experiencing issues similar to micsor-norlys's on Rancher Desktop 1.0.1 when trying to both pull images from and log in to my company's private image registries through a Cisco AnyConnect VPN. I am on Windows 10, but the behavior is the same. Pulling from the public Docker registry (e.g. hello-world, busybox) works, but when pulling from or trying to log in to our private image registry I get these errors:

When trying to log in to any of my work's private registries: Login did not succeed, error: Error response from daemon: Get "private-docker-registry.company.com": dial tcp: lookup private-docker-registry.company.com on 192.168.99.33:53: cannot unmarshal DNS message

When pulling from my work's private registries: Failed to pull image "private-docker-registry.company.com/webapp:0d5daa22": rpc error: code = Unknown desc = Error response from daemon: Get "https://private-docker-registry.company.com": dial tcp: lookup private-docker-registry.company.com on 192.168.99.33:53: cannot unmarshal DNS message

This issue in general, and the described workarounds involving manual hosts file edits, sound similar to this old (circa 2016) Docker Desktop issue, which has since been resolved on that platform. Could that same general fix for DNS resolution be adopted and applied to Rancher Desktop?

Nino-K commented 6 months ago

I'm going to close this issue since our later releases should fix it. Please try 1.13.1 and feel free to re-open if you encounter this again.