microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

WSL no internet connection / DNS issues #11693

Open cyberjj999 opened 5 months ago

cyberjj999 commented 5 months ago

Windows Version

Microsoft Windows [Version 10.0.22621.3737]

WSL Version

2.2.4.0

Are you using WSL 1 or WSL 2?

Kernel Version

5.15.153.1-microsoft-standard-WSL2

Distro Version

No response

Other Software

No response

Repro Steps

This indicates a clear network problem.

Expected Behavior

No network problems: ping should work and pip install <package> should work.

Actual Behavior

Clear Network/Internet Connection Problem:

My host machine (Windows 11) has no internet issues at all.

What I've Tried

  1. Disabled Windows Firewall entirely, shut down WSL, and pinged google.com again: doesn't work

  2. Ran the following commands to reset the network stack and flush the DNS cache on Windows:

    netsh winsock reset 
    netsh int ip reset all
    netsh winhttp reset proxy
    ipconfig /flushdns

    then restarted my computer: doesn't work

  3. Updated my /etc/resolv.conf and /etc/wsl.conf to use nameservers 8.8.8.8 and 8.8.8.4, and even made /etc/resolv.conf immutable... and it doesn't work.

    sudo rm /etc/resolv.conf
    sudo bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
    sudo bash -c 'echo "[network]" > /etc/wsl.conf'
    sudo bash -c 'echo "generateResolvConf = false" >> /etc/wsl.conf'
    sudo chattr +i /etc/resolv.conf

  4. Disabled the "fast start-up" option in Power Options then restarted my comp... still doesn't work

  5. Changed from my company WiFi to my personal mobile hotspot - doesn't work

Suspected Reasons

  1. Change of network (VPN?) but disabling VPN doesn't yield a meaningful difference

  2. Windows Update (including Quality Updates)

Somehow WSL was updated automatically with an old kernel version, even though my WSL Ubuntu is installed from the Microsoft Store?

But my WSL version seems to be okay

WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2

Appreciate Any Help

Diagnostic Logs

Added WSL Logs

WslLogs-2024-06-16_22-35-16.zip

1MLightyears commented 2 months ago

> @1MLightyears thanks. it looks like you attached a WslLogs zip, do you have a WslNetworkingLogs zip generated by the collect-networking-logs.ps1?
>
> If you still encounter issues with collect-networking-logs.ps1, please try the wpr commands I shared

Hi @CatalinFetoiu, the problem with collect-networking-logs.ps1 remains, but the good news is that I troubleshot it myself and now it works well. I'm thinking of making a PR to the WSL repo to fix this.

Here is the log from a successful run of the fixed collect-networking-logs.ps1: WslNetworkingLogs-2024-09-06_14-18-10.zip

CatalinFetoiu commented 2 months ago

@1MLightyears great to hear the script worked, please feel free to open a PR with the change you made. thanks!

1MLightyears commented 1 month ago

@CatalinFetoiu Sorry to disturb you, but is there any update on this issue? The WSL networking logs have been attached. Thank you!

torgeros commented 1 month ago

I was able to solve mine by setting

[wsl2]
dnsTunneling=false

in %UserProfile%\.wslconfig.

CatalinFetoiu commented 1 month ago

@1MLightyears thanks for your patience. I looked at WslNetworkingLogs-2024-09-06_14-18-10.zip

DNS requests for google.com are sent to 127.0.0.1, port 53. This is unexpected: 10.255.255.254 is configured as the DNS server in /etc/resolv.conf (this is the DNS proxy that is used as part of DNS tunneling), so we expect DNS queries to be sent to 10.255.255.254

how are you reproducing the issue? (e.g. are you running ping google.com?) do you have additional DNS configurations done in Linux that are setting up 127.0.0.1 as DNS server?

thanks
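
One way to see which DNS servers the resolver is configured with is to list the nameserver entries directly. The following is only a minimal sketch: the helper names list_nameservers and uses_nameserver are hypothetical, and it assumes a standard resolv.conf layout with awk and grep available in the distro.

```bash
#!/usr/bin/env bash

# Print every nameserver address listed in a resolv.conf-format file,
# one per line. Comment lines and other directives are skipped.
list_nameservers() {
    # $1: path to the file (defaults to /etc/resolv.conf)
    awk '$1 == "nameserver" && $2 != "" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed only if the expected address is among the configured nameservers.
uses_nameserver() {
    # $1: expected address, $2: optional path to the file
    list_nameservers "${2:-/etc/resolv.conf}" | grep -qxF -- "$1"
}
```

For example, `uses_nameserver 10.255.255.254 && echo "DNS tunneling proxy is configured"` checks for the proxy address mentioned above. This only reads the file; it does not show which server a given program actually queries.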

1MLightyears commented 1 month ago

@CatalinFetoiu Yes, I reproduced the issue with a ping www.google.com. Also, the nslookup can resolve the domain name correctly:

# nslookup www.google.com
Server:         10.255.255.254
Address:        10.255.255.254#53

Non-authoritative answer:
Name:   www.google.com
Address: 142.250.70.196
Name:   www.google.com
Address: 2404:6800:4015:800::2004

But ping just doesn't resolve it. This issue troubles me a lot, as I need to run an apt update and it also fails with Temporary failure resolving 'us.archive.ubuntu.com'.

1MLightyears commented 1 month ago

> I was able to solve mine by setting
>
> [wsl2]
> dnsTunneling=false
>
> in %UserProfile%\.wslconfig.

@torgeros Sadly I've tried this before and it doesn't work. The problem is that I don't know whether it's a problem with WSL or with the Linux configuration...

CatalinFetoiu commented 1 month ago

@1MLightyears thanks for following up. Could you please collect and share the following strace outputs? those should give us a hint on why ping uses the wrong DNS server (127.0.0.1 instead of 10.255.255.254)

strace -f ping google.com
strace -f nslookup google.com

shigenobuokamoto commented 1 month ago

$ sudo systemctl --now disable systemd-resolved

isn't this it?

1MLightyears commented 1 month ago

> @1MLightyears thanks for following up. Could you please collect and share the following strace outputs? those should give us a hint on why ping uses the wrong DNS server (127.0.0.1 instead of 10.255.255.254)
>
> strace -f ping google.com
> strace -f nslookup google.com

The stderr output of these two commands is as follows: strace_ping.txt strace_nslookup.txt

@shigenobuokamoto Thank you for your reply! Sadly after it's disabled, the ping still gives Temporary failure in name resolution. I've double checked that systemd-resolved is inactive.

CatalinFetoiu commented 1 month ago

@1MLightyears thanks for sending the strace logs

from strace_ping, there are failures to open /etc/resolv.conf (and other DNS related files) with error permission denied, so ping seems to fall back to using 127.0.0.1 as DNS server, which will not work

it's not immediately clear why this happens

to narrow down the problem, can you please share the following?

1) does dig @10.255.255.254 google.com work (does it return an IP address for google.com) ?
2) is there a difference when you run "sudo ping google.com" vs "ping google.com" ?
3) what is the output of ls -l /etc/resolv.conf?

newfstatat(AT_FDCWD, "/etc/nsswitch.conf", 0x7ffe385ff100, 0) = -1 EACCES (Permission denied)
newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
newfstatat(AT_FDCWD, "/etc/resolv.conf", 0x7ffe385ff220, 0) = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/etc/host.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
futex(0x7f6aaad1132c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
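
Picking the permission failures out of a long strace log by eye is tedious, so a small filter can help. This is a rough sketch under the assumption that the log has one system call per line, as strace normally prints it; the function name eacces_paths is hypothetical, and paths containing escaped quotes are not handled.

```bash
#!/usr/bin/env bash

# Read strace output on stdin and print each distinct path whose
# call failed with EACCES, in order of first appearance.
eacces_paths() {
    grep -F 'EACCES' \
        | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
        | awk '!seen[$0]++'
}
```

A possible invocation is `strace -f ping -c 1 google.com 2>&1 | eacces_paths`, which merges strace's stderr into the pipe before filtering.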

1MLightyears commented 1 month ago

@CatalinFetoiu Updates:

> does dig @10.255.255.254 google.com work (does it return an IP address for google.com) ?

Here is the output:

# dig @10.255.255.254 www.google.com
;; communications error to 10.255.255.254#53: timed out
;; communications error to 10.255.255.254#53: timed out
;; communications error to 10.255.255.254#53: timed out

; <<>> DiG 9.18.12-0ubuntu0.22.04.2-Ubuntu <<>> @10.255.255.254 www.google.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached

I also tried a plain `dig www.google.com` and it gave me:
```bash
# dig  www.google.com

; <<>> DiG 9.18.12-0ubuntu0.22.04.2-Ubuntu <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62020
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.google.com.                        IN      A

;; ANSWER SECTION:
www.google.com.         0       IN      A       142.250.70.132

;; Query time: 9 msec
;; SERVER: 172.17.64.1#53(172.17.64.1) (UDP)
;; WHEN: Wed Oct 09 20:25:46 AEDT 2024
;; MSG SIZE  rcvd: 62
```

> is there a difference when you run "sudo ping google.com" vs "ping google.com" ?

There is no difference, as I'm running all the commands as the root user (including all the WSL commands above).

> what is the output of ls -l /etc/resolv.conf?

It's 777:

# ls -l /etc/resolv.conf
lrwxrwxrwx 1 lightyears root 20 Oct  9 20:24 /etc/resolv.conf -> /mnt/wsl/resolv.conf

I double checked it and ensured that the content is:

nameserver 172.17.64.1

But when I check the mode of /mnt/wsl/resolv.conf, and the other Permission denied file /etc/host.conf, it shows that:

# ls -l /mnt/wsl/resolv.conf
-rw-r--r-- 1 lightyears root 197 Oct  9 20:24 /mnt/wsl/resolv.conf
# ls -l /etc/host.conf
-rw-r--r-- 1 lightyears root 92 Oct 15  2021 /etc/host.conf

Nevertheless, 644 should be enough to read it... I have no idea why it's denied.

shigenobuokamoto commented 1 month ago

@1MLightyears

[FAILED] Failed to start Dispatcher daemon for systemd-networkd. See 'systemctl status networkd-dispatcher.service' for details.

it looks like the error is occurring when trying to run systemd-networkd. WSL will set up the network-related stuff, so stop it.

systemctl --now disable systemd-networkd networkd-dispatcher

CatalinFetoiu commented 1 month ago

@1MLightyears could you please collect the following additional diagnostics, to help understand what's causing the permission denied issue? Please run ping google.com one more time before collecting the below

dmesg &> dmesg.log and share this file
journalctl &> journalctl.log and share this file
output of ls -lah /etc
output of ls -lah /mnt
output of ls -lah /mnt/wsl
mount &> mount.log and share this file
content of /etc/passwd

thanks

1MLightyears commented 1 month ago

> @1MLightyears
>
> [FAILED] Failed to start Dispatcher daemon for systemd-networkd. See 'systemctl status networkd-dispatcher.service' for details.
>
> it looks like the error is occurring when trying to run systemd-networkd. WSL will set up the network-related stuff, so stop it.
>
> systemctl --now disable systemd-networkd networkd-dispatcher

@shigenobuokamoto There is still a failure when pinging... It seems that WSL does set up the network-related stuff, but the native Linux part doesn't work well...

I double checked the journal and noticed that networkd-dispatcher is also raising an error:

pam_systemd(login:session): Failed to create session: The name org.freedesktop.login1 was not provided by any .service files

And I noticed that when I enabled these services, there was a

Created symlink /etc/systemd/system/dbus-org.freedesktop.network1.service → /lib/systemd/system/systemd-networkd.service.

It looks like a third-party dbus is redirecting systemd-networkd to itself, though I have never set that up, and it failed to start. Does it look normal?

1MLightyears commented 1 month ago

> @1MLightyears could you please collect the following additional diagnostics, to help understanding what's causing the permission denied issue? Please run ping google one more time before collecting the below
>
> dmesg &> dmesg.log and share this file
> journalctl &> journalctl.log and share this file
> output of ls -lah /etc
> output of ls -lah /mnt
> output of ls -lah /mnt/wsl
> mount &> mount.log and share this file
> content of /etc/passwd
>
> thanks

Thank you @CatalinFetoiu , here is the logs.zip

CatalinFetoiu commented 1 month ago

@1MLightyears thanks. the /etc/passwd output shows the uid of root is 999. this needs to be 0 instead

root:x:999:0::/root:/bin/bash

if you replace this line in /etc/passwd with the line below, then run "wsl --shutdown" and restart WSL, the permission issue should be fixed

root:x:0:0:root:/root:/bin/bash

cc @OneBlue
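
A quick check of the UID recorded for root could look like the sketch below. It is illustrative only: the helper names passwd_uid and root_uid_is_zero are hypothetical, and it assumes the usual colon-separated passwd format where the third field is the UID.

```bash
#!/usr/bin/env bash

# Print the numeric UID recorded for a user in a passwd-format file.
passwd_uid() {
    # $1: user name, $2: path to the file (defaults to /etc/passwd)
    awk -F: -v user="$1" '$1 == user { print $3; exit }' "${2:-/etc/passwd}"
}

# Succeed only if the root entry has UID 0.
root_uid_is_zero() {
    # $1: optional path to the file
    [ "$(passwd_uid root "${1:-/etc/passwd}")" = "0" ]
}
```

Running `root_uid_is_zero || echo "root does not have uid 0"` inside the distro reports the condition described above. Comparing the result with the output of `id -u` for the current session is a useful cross-check.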

shigenobuokamoto commented 1 month ago

@1MLightyears

> And I noticed that when I enabled these services, there was a Created symlink /etc/systemd/system/dbus-org.freedesktop.network1.service → /lib/systemd/system/systemd-networkd.service.

this symlink is created when systemd-networkd is enabled.

to verify, i installed WSL + Ubuntu. systemd-networkd is disabled.

$ systemctl list-unit-files | grep systemd-network
systemd-network-generator.service            disabled        enabled
systemd-networkd-wait-online.service         disabled        enabled
systemd-networkd-wait-online@.service        disabled        enabled
systemd-networkd.service                     disabled        enabled
systemd-networkd.socket                      disabled        enabled

i tried enabling systemd-networkd and it did not break anything, so enabling it may not be a problem.

the issue seems to be that for some reason systemd-resolved is unable to recognize /etc/resolv.conf.

would you mind trying some of the following?

  1. restart systemd-resolved
    $ systemctl restart systemd-resolved
  2. add a DNS server to systemd-resolved and restart

    /etc/systemd/resolved.conf.d/dnsserver.conf

    [Resolve]
    DNS=1.1.1.1

    $ systemctl restart systemd-resolved
  3. stop systemd-resolved
    $ systemctl stop systemd-resolved