Closed A-Shevchenko closed 2 years ago
@A-Shevchenko Thanks for filing the issue. Just for curiosity, are you performing this process inside or outside of WSL distro?
Also, could you provide some logs from RD? You can get all logs by navigating to the Troubleshooting tab and clicking Show Logs -- feel free to pack them into a zip file.
@evertonlperes I'm running it outside of WSL. Logs are attached (1557.zip); let me know if you want me to enable debug mode.
@A-Shevchenko thanks for sharing the logs. Unfortunately, there are some known networking/DNS issues with WSL 2 which can cause unpredictable network behavior. We have implemented a process that updates the DNS configuration on WSL; it selects the most suitable DNS servers based on the interface metrics (the most preferred interface). Are you able to test our new release, which will be out shortly, to see if it addresses this issue?
Meanwhile, can you please forward us the output of /etc/resolv.conf on the WSL side (while the issue is occurring)? Feel free to redact any confidential or corporate-specific information.
@A-Shevchenko can you please upgrade to 1.1.1 to see if it eliminates the issue for you?
@Nino-K yes, sorry - due to the war in my country I temporarily lost access to the PC where I could do that. Now it's restored; I'll try that in the next few days.
@A-Shevchenko sorry to hear that. We are here to support you, take care!
@Nino-K unfortunately, still the same:
> docker run --rm tutum/dnsutils nslookup api.github.com
;; connection timed out; no servers could be reached
resolv.conf:
$ cat /etc/resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 172.17.166.129
route table:
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.17.166.129 0.0.0.0 UG 0 0 0 eth0
10.42.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.17.166.128 0.0.0.0 255.255.255.240 U 0 0 0 eth0
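One detail worth calling out in the output above: the auto-generated nameserver (172.17.166.129) is the same address as the default gateway of the WSL NAT network, i.e. the Windows host side of the virtual switch, so all DNS traffic from the distro goes through the host. A throwaway sketch of that check, with the sample data above inlined so it runs anywhere (on a live system you would read /etc/resolv.conf and `route -n` directly):

```shell
#!/bin/sh
# Sketch: confirm the WSL-generated nameserver equals the default gateway.
# Sample data is copied from the output above instead of reading the live
# files, so this is only illustrative.
resolv_ns=$(awk '/^nameserver/ {print $2; exit}' <<'EOF'
# This file was automatically generated by WSL.
nameserver 172.17.166.129
EOF
)
# The 0.0.0.0 row of `route -n`; field 2 is the gateway.
gateway=$(printf '0.0.0.0 172.17.166.129 0.0.0.0 UG 0 0 0 eth0\n' | awk '{print $2}')
if [ "$resolv_ns" = "$gateway" ]; then
  echo "nameserver matches the WSL gateway: $resolv_ns"
else
  echo "nameserver ($resolv_ns) differs from the gateway ($gateway)"
fi
```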
I believe I know why this happens. I'm using the DNS server of my router (192.168.0.1). This is the output of the dig command from one of my WSL distros (Ubuntu):
dig api.github.com @192.168.0.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50053
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 8, ADDITIONAL: 8
;; QUESTION SECTION:
;api.github.com. IN A
;; ANSWER SECTION:
api.github.com. 60 IN A 140.82.121.6
;; AUTHORITY SECTION:
github.com. 15300 IN NS dns1.p08.nsone.net.
github.com. 15300 IN NS dns4.p08.nsone.net.
github.com. 15300 IN NS dns2.p08.nsone.net.
github.com. 15300 IN NS ns-421.awsdns-52.com.
github.com. 15300 IN NS ns-520.awsdns-01.net.
github.com. 15300 IN NS dns3.p08.nsone.net.
github.com. 15300 IN NS ns-1707.awsdns-21.co.uk.
github.com. 15300 IN NS ns-1283.awsdns-32.org.
;; ADDITIONAL SECTION:
dns1.p08.nsone.net. 35329 IN A 198.51.44.8
dns4.p08.nsone.net. 35914 IN A 198.51.45.72
dns2.p08.nsone.net. 35329 IN A 198.51.45.8
ns-421.awsdns-52.com. 116055 IN A 205.251.193.165
ns-520.awsdns-01.net. 131224 IN A 205.251.194.8
dns3.p08.nsone.net. 35329 IN A 198.51.44.72
ns-1707.awsdns-21.co.uk. 121520 IN A 205.251.198.171
ns-1283.awsdns-32.org. 122657 IN A 205.251.197.3
;; Query time: 30 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Tue Apr 26 08:42:36 CEST 2022
;; MSG SIZE rcvd: 399
Note that there is just one A entry returned, plus a couple of NS entries. The additional section contains an A entry for each NS.
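The healthy and broken responses are easiest to tell apart by counting A records inside the ANSWER section. A small sketch of that check, with sample dig output inlined so no network is needed (on a live system you would pipe `dig api.github.com` into it); the healthy response above yields 1, while the broken in-container response yields 9:

```shell
#!/bin/sh
# Count A records that appear inside the ANSWER section of a dig dump.
# Section headers start with ";;", so we toggle a flag on them and only
# count A records while inside the ANSWER section.
count_answer_a() {
  awk '/^;; ANSWER SECTION/ {in_ans=1; next}
       /^;;/               {in_ans=0}
       in_ans && $4 == "A" {n++}
       END                 {print n + 0}'
}
count_answer_a <<'EOF'
;; QUESTION SECTION:
;api.github.com. IN A
;; ANSWER SECTION:
api.github.com. 60 IN A 140.82.121.6
;; AUTHORITY SECTION:
github.com. 15300 IN NS dns1.p08.nsone.net.
EOF
# prints 1
```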
Now, let's do the same from within a container (for instance: kubectl run iputils --rm -it --image arunvelsriram/utils -- /bin/sh):
; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> api.github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2087
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 25238b63bed3747a (echoed)
;; QUESTION SECTION:
;api.github.com. IN A
;; ANSWER SECTION:
ns-421.awsdns-52.com. 5 IN A 205.251.193.165
ns-1707.awsdns-21.co.uk. 5 IN A 205.251.198.171
api.github.com. 5 IN A 140.82.121.5
dns4.p08.nsone.net. 5 IN A 198.51.45.72
dns2.p08.nsone.net. 5 IN A 198.51.45.8
ns-1283.awsdns-32.org. 5 IN A 205.251.197.3
dns1.p08.nsone.net. 5 IN A 198.51.44.8
ns-520.awsdns-01.net. 5 IN A 205.251.194.8
dns3.p08.nsone.net. 5 IN A 198.51.44.72
;; Query time: 4 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue Apr 26 06:41:52 UTC 2022
;; MSG SIZE rcvd: 369
See, the output is wrong! It only contains A entries - for both the actual domain (api.github.com) and the NSes. The A entries are randomly sorted; sometimes the correct one comes out on top, and then the DNS resolution works fine.
I'm actually not sure why this happens or which component is responsible (CoreDNS?). A simple workaround is to use a DNS server that doesn't return the additional section, for instance Google's or Cloudflare's. But with the current changes I'm not sure how to achieve this.
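For the record, one place such an override could in principle be applied is k3s's CoreDNS config. This is a hypothetical, untested sketch: the configmap name (coredns in kube-system) and the "forward . /etc/resolv.conf" line are k3s defaults, so verify them in your own cluster first, and note that k3s may regenerate this configmap on restart:

```shell
# Hypothetical sketch: make CoreDNS forward queries to a public resolver
# instead of inheriting the node's /etc/resolv.conf. Assumes the k3s
# defaults (configmap "coredns" in namespace "kube-system").
kubectl -n kube-system edit configmap coredns
# In the Corefile, change:
#     forward . /etc/resolv.conf
# to:
#     forward . 1.1.1.1 8.8.8.8
# then restart CoreDNS so it picks up the change:
kubectl -n kube-system rollout restart deployment coredns
```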
... ok, one possible workaround is to set the DNS server of your host network driver. This keeps the DNS resolution in WSL the same (relying on the dnsmasq-generate script to generate resolv.conf and the dnsmasq conf as usual). After doing so:
dig api.github.com
; <<>> DiG 9.11.3-1ubuntu1.14-Ubuntu <<>> api.github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45465
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: b0aa497cb72d8feb (echoed)
;; QUESTION SECTION:
;api.github.com. IN A
;; ANSWER SECTION:
api.github.com. 5 IN A 140.82.121.5
;; Query time: 36 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue Apr 26 07:25:04 UTC 2022
;; MSG SIZE rcvd: 85
Unfortunately, this has a negative impact on DNS resolution performance. The latency of a public DNS (e.g. 8.8.8.8) is definitely worse than the router's DNS. Sure, I can move this DNS setting to the router itself (using 8.8.8.8 as the upstream rather than the one I have at the moment), but it would still hurt DNS resolution for all devices on the LAN in the case of a cache miss.
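For anyone who wants to try the adapter-level workaround described above, a hypothetical sketch of the commands (Windows, elevated prompt; "Wi-Fi" is an assumed adapter name - list yours with the first command before changing anything):

```shell
# Hypothetical sketch (run in an elevated Windows prompt). "Wi-Fi" is an
# assumed adapter name; check your actual adapters first. Setting the DNS
# here is what "the DNS server of your host network driver" refers to.
netsh interface show interface
netsh interface ip set dns name="Wi-Fi" static 1.1.1.1
netsh interface ip add dns name="Wi-Fi" 8.8.8.8 index=2
```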
@Nino-K doesn't work for me. The wsl.log:
2022-04-29T06:38:19.135Z: Launching background process host-resolver vsock host.
2022-04-29T06:38:29.207Z: Background process host-resolver vsock host exited with status 1 signal null
2022-04-29T06:38:29.207Z: Background process host-resolver vsock host will restart.
The host-resolver.log:
Error: Listen, could not determine VM GUID: could not find vsock-peer process on any hyper-v VM(s)
Usage:
host-resolver vsock-host [flags]
Flags:
-c, --built-in-hosts stringToString List of built-in CNAMEs to IPv4, IPv6 or IPv4-mapped IPv6 in host.rancherdesktop.io=111.111.111.111 format. (default [])
-h, --help help for vsock-host
-6, --ipv6 Enable IPv6 address family.
-s, --upstream-servers stringArray List of IP addresses for upstream DNS servers.
The vsock-peer is not running in the distro:
~ # /bin/rc-status
Runlevel: default
rancher-desktop-guestagent [ started 00:17:35 (0) ]
crond [ unsupervised ]
host-resolver [ failed ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed/wanted
docker [ unsupervised ]
cri-dockerd [ unsupervised ]
Dynamic Runlevel: manual
k3s [ started 00:17:23 (0) ]
host-resolver [ failed ]
local [ started ]
@Nino-K - after changing the log file name to something else, the actual issue with vsock peer startup is:
* supervise-daemon: failed to exec `/mnt/c/Work/Playground/rancher-desktop/resources/linux/internal/host-resolver': No such file or directory
@Nino-K - got it working. Somehow the host-resolver executable was downloaded wrong. I downloaded it manually, stored it in the resources directory, and now the vsock peer is able to start OK. This is the output:
Error: Listen, could not determine VM GUID: could not find vsock-peer process on any hyper-v VM(s)
Usage:
host-resolver vsock-host [flags]
Flags:
-c, --built-in-hosts stringToString List of built-in CNAMEs to IPv4, IPv6 or IPv4-mapped IPv6 in host.rancherdesktop.io=111.111.111.111 format. (default [])
-h, --help help for vsock-host
-6, --ipv6 Enable IPv6 address family.
-s, --upstream-servers stringArray List of IP addresses for upstream DNS servers.
time="2022-04-29T11:24:14+02:00" level=info msg="successfully estabilished a handshake with a peer: c59a5f3b-6695-43c9-a61a-629cb618e88c"
time="2022-04-29T11:24:14+02:00" level=warning msg="failed to detect system DNS, falling back to [8.8.8.8 1.1.1.1]" error="open /etc/resolv.conf: The system cannot find the path specified."
time="2022-04-29T11:24:14+02:00" level=info msg="Started vsock-host srv &{udp:<nil> tcp:0xc0000e46c0}"
The missing /etc/resolv.conf confuses me a little bit, because it must exist in the distro. It is created just shortly before the host-resolver peer service starts.
Nevertheless, the DNS resolution works OK now. Kudos ;)
> The missing /etc/resolv.conf confuses me a little bit, because it must exist in the distro. It is created just shortly before the host-resolver peer service starts.
@vladonemo it is the vsock-host process that is looking for /etc/resolv.conf on Windows; that is why you see the error. I agree it is confusing, and I'm trying to clean up the underlying code to eliminate this kind of misleading log. One main reason for the confusion was that vsock-host and vsock-peer were both writing to the same log file, as you already figured out. I had a PR for this issue, but somehow it missed our current release. I will go ahead and close mine and use yours - thank you for your contribution. :)
Also, did you try this feature with our latest release? I got a bit confused where you mentioned:
> Somehow the host-resolver executable was downloaded wrong.
Many thanks
@Nino-K
> Also, did you try this feature with our latest release?
I actually built and ran it from source. I'll clean the tree and try again when I'm back at my PC.
Thank you for the explanation.
ok, started fresh and it all works as expected. Nice one ;)
I will go ahead and close this issue, please feel free to reopen it if needed.
Rancher Desktop Version
1.0.1
Rancher Desktop K8s Version
1.22.6
Which container runtime are you using?
moby (docker cli)
What operating system are you using?
Windows
Operating System / Build Version
Windows 10 Pro 1909
What CPU architecture are you using?
x64
Linux only: what package format did you use to install Rancher Desktop?
No response
Windows User Only
No response
Actual Behavior
DNS resolution inside containers randomly stops working. The first time it happened was a week ago, but after a restart it was fine. Then it worked for a few days, but today I can't get it working even after several restarts.
Steps to Reproduce
Result
connection timed out; no servers could be reached
Expected Behavior
Successful resolution like:
Additional Information
No response