microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl
MIT License

WSL2 DNS stops working #4285

Closed jordansissel closed 2 years ago

jordansissel commented 5 years ago

Please fill out the below information:

Microsoft Windows [Version 10.0.18932.1000]

> bash
% host google.com
;; connection timed out; no servers could be reached

/etc/resolv.conf:

% cat /etc/resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 172.19.224.1

To my knowledge, I didn't change anything. This has happened a few times, and rebooting fixes it. Sometimes just doing wsl --shutdown is sufficient to fix it. It correlates with my workstation going to sleep and resuming later with DNS in WSL2 not working.

DarthJahus commented 4 years ago

@qhaas To avoid changes reverting back, you need to remove the file and recreate it, because in its default state it's a symbolic link to a file that is, in practice, read-only or protected.

blackliner commented 4 years ago

Also having regular problems with DNS and WSL2, 2004. Only wsl --shutdown helps...

ScottyAU commented 4 years ago

I believe this is Windows Firewall related (for me at least). Looking in the Windows Firewall Logs I found the following:

2020-06-02 18:36:22 DROP UDP 172.29.75.219 172.29.64.1 53860 53 79 - - - - - - - RECEIVE
2020-06-02 18:36:27 DROP UDP 172.29.75.219 172.29.64.1 53860 53 79 - - - - - - - RECEIVE
2020-06-02 18:36:32 DROP UDP 172.29.75.219 172.29.64.1 53860 53 79 - - - - - - - RECEIVE

.219 is WSL2 and .1 is the Windows Host.
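
The log entries above follow the firewall log's column order (date, time, action, protocol, source IP, destination IP, source port, destination port), so dropped DNS traffic can be picked out with a short filter. This is only a minimal sketch: the function name dropped_dns is made up for illustration, and it assumes the log is readable from inside WSL.

```shell
# Print time, source and destination of every dropped packet aimed at port 53.
# Reads the files given as arguments, or standard input when none are given.
# Columns assumed: $3 = action, $4 = protocol, $5 = src-ip, $6 = dst-ip, $8 = dst-port.
dropped_dns() {
    awk '$3 == "DROP" && ($4 == "UDP" || $4 == "TCP") && $8 == 53 {
        print $2, $5, "->", $6
    }' "$@"
}
```

It could be pointed at a copy of the firewall log, for example dropped_dns /mnt/c/Windows/System32/LogFiles/Firewall/pfirewall.log. That path is an assumption (the usual default location); logging of dropped packets has to be enabled first, and reading the file may need elevated rights.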

If I do the following, DNS to the host IP immediately starts working (admin PowerShell prompt):

Set-NetFirewallProfile -DisabledInterfaceAliases "vEthernet (WSL)"

I have default Windows Firewall settings I believe. I can see two dynamic rules that get created - but they don't appear to be working:

HNS Container Networking - DNS (UDP-In) - E8E88949-89CC-4317-B2A0-B47EDAA08189 - 0
HNS Container Networking - ICS DNS (TCP-In) - E8E88949-89CC-4317-B2A0-B47EDAA08189 - 0

blackliner commented 4 years ago

Thanks for pointing out the firewall's role in that. For me it helps if I switch the firewall off and on again.

saisandeepvaddi commented 4 years ago

Changing nameserver did not work for me.

$:/mnt/c/WINDOWS/system32$ cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
$:/mnt/c/WINDOWS/system32$ ping google.com
ping: google.com: Temporary failure in name resolution

I don't have any VPN running now. But I have NordVPN installed. No third-party anti-virus or Firewall solutions installed.

[Edit]: I forgot I had Checkpoint VPN and uninstalling it worked. Related #4246

nue-melexis commented 4 years ago

Hi, I got some weird behaviour related to this. When I am not connected to the VPN everything works, but when I connect to the VPN (OpenVPN) I get strange DNS answers inside WSL, but only for domains related to the VPN.

e.g. nslookup <MY_PUBLIC_COMPANY_WEBSITE> (or any VPN-only name). Answer:

Server: 172.17.176.1
Address: 172.17.176.1#53

Non-authoritative answer:
Name:
Address: IP
Name: NS1
Address: NS1IP
Name: NS2
Address: NS2IP
...

All the nameservers of the domain are also listed there (same for the host and dig commands). That means I can get connected to any of these IPs.

Any idea where this is coming from?

Best Regards, Norman

What-is-water93 commented 4 years ago

@qhaas To avoid changes reverting back, you need to remove the file and recreate it. Because in the default state, it's a symbolic link to a (in reality) read-only or protected file.

~~How do I remove it? The "rm" command tells me "no such file or directory". If I open the /etc folder in Explorer I can see a resolv.conf, but I can neither open, rename, nor delete it. Using "sudo touch /etc/resolv.conf" doesn't work either: also "no such file or directory".~~

Managed to delete and recreate files with vi.

iridian-ks commented 4 years ago

This is almost certainly a bug. There's a parity gap between WSL 1 and 2. I am on a fresh image of Windows v2004. My company rolls out a VPN to us and preconfigures firewalls, which I can't change. This shouldn't really be relevant, as I think the real issue is WSL. I need to use our internal DNS servers to get to our internal servers, etc. Those who don't need this would just use 9.9.9.9 or their preferred DNS provider. For me, I'd use 10.x.x.x.

If you look at a WSL1 box you see something like this in /etc/resolv.conf

# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 10.7.xx.xx
nameserver 10.9.xx.xx
nameserver 192.168.1.1
search lan

WSL1 is working great because the resolv conf is getting provisioned perfectly.

If we look at WSL2 we see this:

# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 172.17.128.1

The custom nameservers are completely missing. WSL1 is correct and WSL2 is broken. Seems that this behavior was somehow not ported over or simply broken?

The fix is to do what others said above, which I won't repeat. For those who don't need corporate DNS then you'd set to 9.9.9.9 and those who need corporate then make sure you set to your corporate DNS. Obviously, this sucks because 1. provisioning manually generally sucks and 2. if corporate DNS IP addresses update then your VM breaks.
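
One way to avoid hard-coding corporate DNS addresses is to regenerate the nameserver lines from whatever Windows currently reports. The following is a rough sketch, not a tested fix: to_nameservers is a hypothetical helper name, and it assumes powershell.exe is reachable from WSL and that the Get-DnsClientServerAddress cmdlet returns the servers you actually want.

```shell
# Convert a list of DNS server IPs (one per line, possibly with Windows CRLF
# line endings, blank lines and duplicates) into resolv.conf "nameserver" lines.
to_nameservers() {
    tr -d '\r' | awk 'NF && !seen[$1]++ { print "nameserver", $1 }'
}

# Possible usage (not run here): ask Windows for its IPv4 DNS servers.
# powershell.exe -NoProfile -Command \
#   '(Get-DnsClientServerAddress -AddressFamily IPv4).ServerAddresses' | to_nameservers
```

The output could be written to /etc/resolv.conf once generateResolvConf = false is set. Note that the glibc resolver only uses the first three nameserver entries, so the order of the list matters.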

This has nothing to do with virtual network adapters (at all).

I think it's clear what WSL2 needs to fix though.

Here's some more proof that if resolv.conf was correct then things would work: (WSL2 VM)

# working
$ host -t a google.com  10.7.xx.xx
Using domain server:
Name: 10.7.xx.xx
Address: 10.7.xx.xx#53
Aliases:

google.com has address 172.217.5.206

# broken (probably a firewall?)
$ host -t a google.com 9.9.9.9
Using domain server:
Name: 9.9.9.9
Address: 9.9.9.9#53
Aliases:

Host google.com not found: 3(NXDOMAIN)

JohnButare commented 4 years ago

@iridian-ks your fix worked for me. In WSL 2, non-FQDN lookups were broken for me: they failed intermittently and only sometimes queried my local name server. If I fully qualified the name it worked fine. Somehow it even broke name lookup using nslookup specifying my name server directly. Many of the other posts pointed to using a custom name server, which I did. However, local resolution without a suffix was still broken. I was putting the search line before the nameserver line. When I put the search line after the nameserver line it worked fine. Go figure.

The suffix did not work:

# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
search my.suffix
nameserver 192.168.1.1

The suffix DID work:

# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
nameserver 192.168.1.1
search my.suffix

codejunky commented 4 years ago

@non-static thanks man this literally saved me a lot of headaches!!

toryalsip commented 4 years ago

I'm really hoping a solution is found. I'm unfortunately having mixed results with the workarounds myself. On one machine the workaround does help, on another not at all. This is really frustrating and blocking me from adopting WSL 2 in my current workflow.

elswerky commented 4 years ago

My workaround:

  1. Create a file: /etc/wsl.conf.
  2. Put the following lines in the file
[network]
generateResolvConf = false
  3. In a cmd window, run wsl --shutdown
  4. Restart WSL2
  5. Create a file: /etc/resolv.conf. If it exists, replace existing one with this new file.
  6. Put the following lines in the file
nameserver 8.8.8.8
  7. Repeat step 3 and 4. You will see git working fine now.

thanks it worked for me

toryalsip commented 4 years ago

An update, I was able to get the workaround for Cisco AnyConnect VPN to work after I reinstalled Docker Desktop, so I wonder if there is some connection there. Strange, and it would not surprise me if the workaround stops working randomly at some point in time in the future.

abhijeetchopra commented 4 years ago

Replacing the SSL VPN client from Cisco AnyConnect to OpenConnect worked for a colleague.

mikerod-sd commented 4 years ago

+1 to this issue here. For me the issue only happens when I connect to the VPNs we have available at work (Pulse and F5). When not connected to the VPN, WSL2 works fine.

PreetSangha commented 4 years ago

I got a clue in the following: Issue 4844 (I also removed docker WSL but I don't think that that made any diff)

  1. Created a /etc/wsl.conf containing

    [network]
    generateResolvConf = false

  2. I exited wsl and then issued a wsl --shutdown

  3. Entered wsl and force-deleted /etc/resolv.conf to stop it being a symbolic link

    sudo rm -fd /etc/resolv.conf

  4. I exited wsl and then issued a wsl --shutdown

  5. Entered wsl and recreated a resolv.conf

    sudo touch /etc/resolv.conf
    sudo nano /etc/resolv.conf

  6. Added the following line to the /etc/resolv.conf

    nameserver 8.8.8.8

  7. I exited wsl and then issued a wsl --shutdown

  8. Entered wsl and everything was now working
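
The file-editing part of the steps above can be sketched as a small shell function. This is an illustration under assumptions rather than a verified fix: apply_static_dns is a made-up name, it takes the target directory as an argument so it can be tried on a scratch directory first, and it overwrites any existing wsl.conf in that directory, which would discard other settings.

```shell
# apply_static_dns ETC_DIR [NAMESERVER]
# Writes wsl.conf so WSL stops regenerating resolv.conf, then replaces
# resolv.conf (symlink or file) with a regular file naming one DNS server.
apply_static_dns() {
    etc_dir=$1
    nameserver=${2:-8.8.8.8}

    # Disable automatic generation (overwrites an existing wsl.conf).
    printf '[network]\ngenerateResolvConf = false\n' > "$etc_dir/wsl.conf" || return 1

    # Remove the old entry first: writing through a symlink would follow
    # the link instead of creating a plain file.
    rm -f "$etc_dir/resolv.conf" || return 1

    printf 'nameserver %s\n' "$nameserver" > "$etc_dir/resolv.conf"
}
```

Inside the distribution it would be run as root with /etc as the directory, for example apply_static_dns /etc 8.8.8.8, followed by wsl --shutdown from Windows as in the steps above.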

neojp commented 4 years ago

The latest WSL2 update won't let me reach LAN devices like 192.168.1.1 anymore. This used to be possible until this month.

epieddy commented 4 years ago

As @ScottyAU said :

I have default Windows Firewall settings I believe. I can see two dynamic rules that get created - but they don't appear to be working:

HNS Container Networking - DNS (UDP-In) - E8E88949-89CC-4317-B2A0-B47EDAA08189 - 0
HNS Container Networking - ICS DNS (TCP-In) - E8E88949-89CC-4317-B2A0-B47EDAA08189 - 0

I had exactly the same thing : the same firewall rules, the same bug and all was working when I disabled the firewall on the vEthernet (WSL) interface as suggested by ScottyAU

But disabling the firewall was not an option for me. Turns out it was a GPO from my company which was responsible: [./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge](https://docs.microsoft.com/en-us/windows/client-management/mdm/firewall-csp#allowlocalpolicymerge)

If this value is false, firewall rules from the local store are ignored and not enforced.

The firewall rules created automatically by WSL were simply ignored because of the GPO, even though they were displayed in the firewall configuration and nothing visually indicated that they were ignored.

As soon as the GPO was changed to allowlocalpolicymerge=true, all went back to normal without having to disable the firewall.

Hope it helps some of you.

ScottyAU commented 4 years ago

@epieddy that makes perfect sense - and we have the same GPO set so users can't make local modifications to their firewall policy. I do wonder if this is something Microsoft might consider addressing so that WSL2 will still work without having to open up the firewall to local modifications by users (for enterprise).

archonic commented 4 years ago

The workarounds here have never worked for me. This bug makes WSL2 completely unusable for development (or anything really). I'm on build 19041.

qhaas commented 4 years ago

@qhaas To avoid changes reverting back, you need to remove the file and recreate it. Because in the default state, it's a symbolic link to a (in reality) read-only or protected file.

Thanks, that did the trick, albeit this feels kinda like duct-tape.

On a side note, if you use your router as the DNS server, hostnames to Linux machines on your network resolve.

ionutbaban commented 4 years ago

But disabling the firewall was not an option for me. Turns out it was a GPO from my company which was responsible: [./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge](https://docs.microsoft.com/en-us/windows/client-management/mdm/firewall-csp#allowlocalpolicymerge)

As soon as the GPO was changed to allowlocalpolicymerge=true, all went back to normal without having to disable the firewall.

@epieddy how exactly do you change this GPO? I'm in the same situation, I can't change the Firewall as this is managed by my company. Firewall is done through McAfee. I can't find ./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge in Local Group Policy Editor

epieddy commented 4 years ago

But disabling the firewall was not an option for me. Turns out it was a GPO from my company which was responsible: [./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge](https://docs.microsoft.com/en-us/windows/client-management/mdm/firewall-csp#allowlocalpolicymerge)

As soon as the GPO was changed to allowlocalpolicymerge=true, all went back to normal without having to disable the firewall.

@epieddy how exactly do you change this GPO? I'm in the same situation, I can't change the Firewall as this is managed by my company. Firewall is done through McAfee. I can't find ./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge in Local Group Policy Editor

@ionutbaban My bad, I'm fairly new to this MDM / GPO thing. ./Vendor/MSFT/Firewall/MdmStore/*/AllowLocalPolicyMerge is not a GPO but one of the settings in the Firewall Configuration Service Provider.

epieddy commented 4 years ago

@ionutbaban I found the settings in the Local Group Policy Editor (Sorry for the french screenshot) :

[Screenshot: firewall settings in the Local Group Policy Editor]

ionutbaban commented 4 years ago

@ionutbaban I found the settings in the Local Group Policy Editor (Sorry for the french screenshot)

Merci for the screenshot @epieddy. I've applied the change, but I still have issues with the DNS:

$ host -t A google.com 1.1.1.1
;; connection timed out; no servers could be reached

I've just updated my personal windows system and here wsl2 is working fine. I think the issue might be related to some work related setting or VPN

epieddy commented 4 years ago

Merci for the screen shot @epieddy I've applied the change, but still I still have issues with the DNS:

$ host -t A google.com 1.1.1.1
;; connection timed out; no servers could be reached

I've just updated my personal windows system and here wsl2 is working fine. I think the issue might be related to some work related setting or VPN

@ionutbaban The firewall mess is only about making the default WSL2 setup work in a professional context where your firewall config is managed by your IT admins. By default WSL2 uses a virtual network interface to forward network traffic in and out of WSL2. Some firewall rules are automatically created to allow the Windows local DNS resolver to receive queries from the WSL2 virtual interface. All the stuff about GPO/CSP is about allowing these rules to work.

In your example, you can't reach 1.1.1.1. It's a different problem. In the firewall problem I had, if I used any DNS server other than the default one, it worked fine.

ionutbaban commented 4 years ago

@epieddy I've just figured out my issue is caused by McAfee Endpoint Security Firewall. If I deactivate it, DNS works fine and I can run the host or apt update commands without issues

epieddy commented 4 years ago

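So to sum up, for anyone tumbling here by Google search: if inside a WSL2 with a default configuration the DNS does not work, one of the possible explanations is that some firewall is blocking WSL2 from reaching the Windows local DNS resolver. Please check:

  • do you have an external firewall (McAfee, Avast, etc.)? Does it work if you disable your external firewall?
  • if you do not have an external firewall, is your PC managed by IT admins in your company? Some GPO/CSP might prevent the firewall rules created by WSL2 from working
  • if you completely disable the Windows firewall, does it work?

If none of the above works, your problem is probably somewhere else. Otherwise, it's very likely a firewall issue, keep looking.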

Gskartwii commented 4 years ago

The resolv.conf workaround unfortunately prevents the typical method of finding the DISPLAY for X from working. However, I've found this can be placed in .bashrc/.zshrc instead:

export DISPLAY=`netsh.exe interface ip show ipaddresses "vEthernet (WSL)" | head -n 2 - | tail -n 1 | awk '{ print $2; }'`:0.0

mangelozzi commented 4 years ago

So to sum up, for anyone tumbling here by google search : if inside a WSL2 with a default configuration, the DNS does not work, one of the possible explanation is that some firewall is blocking WSL2 from reaching the windows local dns resolver. Please check :

  • do you have an external firewall (McAfee, Avast, etc.)? Does it work if you disable your external firewall?
  • if you do not have an external firewall, is your PC managed by IT admins in your company? Some GPO/CSP might prevent the firewall rules created by WSL2 from working
  • if you completely disable the Windows firewall, does it work?

If none of the above works, your problem is probably somewhere else. Otherwise, it's very likely a firewall issue, keep looking.

I have the same problem, and I use a normal windows setup (I have not changed any firewall settings), I have no anti virus (other than windows defender with default settings), connected to a router.

archonic commented 4 years ago

This is apparently how several default configuration errors are manifesting. For me, no editing of resolv.conf or disabling firewalls was working but disabling swap worked: https://github.com/microsoft/WSL/issues/5437#issuecomment-647161596

peter-jerry-ye commented 4 years ago

It seems to me that we need to use the nameserver generated by wsl to use xserver as described in #4106. Any solution so that both dns and xserver works?

Gskartwii commented 4 years ago

It seems to me that we need to use the nameserver generated by wsl to use xserver as described in #4106. Any solution so that both dns and xserver works?

@peter-jerry-ye You can use my solution from above to set the display variable, even when resolv.conf is modified:

export DISPLAY=`netsh.exe interface ip show ipaddresses "vEthernet (WSL)" | head -n 2 - | tail -n 1 | awk '{ print $2; }'`:0.0

lackovic commented 4 years ago

It seems to me that we need to use the nameserver generated by wsl to use xserver as described in #4106. Any solution so that both dns and xserver works?

This solution works for me on ArchWSL, with the default configuration and VcXsrv X Server authorized in Windows firewall.

peter-jerry-ye commented 4 years ago

This solution works for me on ArchWSL, with the default configuration and VcXsrv X Server authorized in Windows firewall.

I don't think it will work if the Ethernet adapter vEthernet (WSL) doesn't come first

peter-jerry-ye commented 4 years ago

export DISPLAY=`netsh.exe interface ip show ipaddresses "vEthernet (WSL)" | head -n 2 - | tail -n 1 | awk '{ print $2; }'`:0.0

It works. Thank you.

fhajji commented 4 years ago

export DISPLAY=`netsh.exe interface ip show ipaddresses "vEthernet (WSL)" | head -n 2 - | tail -n 1 | awk '{ print $2; }'`:0.0

To me, this works if I replace vEthernet (WSL) with vEthernet (External Switch).
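
Because the adapter name evidently differs between setups, a different technique (not the netsh one quoted above) is to read the default gateway from the Linux routing table. This is a sketch under a loud assumption: with WSL2's default NAT networking the default gateway is usually the Windows host, but with a bridged or external switch it may be your router instead. The function name default_gateway is made up for illustration.

```shell
# Read "ip route" output on standard input and print the address of the
# first default route, e.g. "default via 172.19.224.1 dev eth0" -> 172.19.224.1
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Possible usage in .bashrc/.zshrc (assumes an X server is listening on the host):
# export DISPLAY="$(ip route show default | default_gateway):0.0"
```

Unlike the nameserver lookup, this does not depend on the contents of /etc/resolv.conf, so it keeps working after the file has been replaced by hand.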

chrisconlan commented 4 years ago

My /etc/resolv.conf file was randomly filled with ^@^@^@^@^@^@^@^@^@^@ symbols today, and it broke all DNS resolution. I set up the Cloudflare nameserver with nameserver 1.1.1.1 and everything is fine now... strange.
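
The ^@ symbols are how many editors display NUL bytes, so this kind of corruption can be detected without opening the file. A minimal sketch; has_nul_bytes is a hypothetical helper name, not an existing tool:

```shell
# Succeed (exit status 0) when FILE contains at least one NUL byte.
# Compares the file size with its size after all NUL bytes are stripped.
has_nul_bytes() {
    total=$(wc -c < "$1") || return 2
    clean=$(tr -d '\000' < "$1" | wc -c)
    [ "$((total))" -ne "$((clean))" ]
}
```

For example, has_nul_bytes /etc/resolv.conf && echo "resolv.conf is corrupted" would flag the situation described above.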

gnomeria commented 4 years ago

By using generateResolvConf = false, I cannot touch or nano to create the file even with root. It just says the file does not exist, which I want to create. Any suggestions?

feralfenrir commented 4 years ago

@gnomeria if you run ls on /etc/resolv.conf, you'll see that it's a symlink to /run/resolvconf/resolv.conf. I was able to get around this by deleting the symlink and then adding a new file at /etc/resolv.conf
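
A quick way to see which situation you are in is to check whether the path is a symlink and whether its target exists. A dangling symlink is one plausible explanation for touch and nano reporting that the file does not exist, though that is an assumption about the reports above. The helper name resolv_conf_state is invented for this sketch:

```shell
# Describe what sits at the given path (default: /etc/resolv.conf).
resolv_conf_state() {
    f=${1:-/etc/resolv.conf}
    if [ -L "$f" ]; then
        if [ -e "$f" ]; then
            # The link resolves to something that exists.
            echo "symlink -> $(readlink "$f")"
        else
            # The link points at a path that is not there.
            echo "dangling symlink -> $(readlink "$f")"
        fi
    elif [ -f "$f" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

If it reports a symlink of either kind, deleting the link and creating a plain file, as described above, is the step that makes manual edits stick.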

aubreyzulu commented 4 years ago

for me the temporary fix that worked is by adding the following line to /etc/resolv.conf file

edyu commented 4 years ago

I'm running Build 20201.rs_prerelease.200822-1922 and this happened to me a couple of times. I tried the workarounds but they didn't work, even after I restarted my host machine. Eventually I found that if I don't mess with resolv.conf, a reboot would fix things until it happens again. I don't use any firewall other than the default Windows Defender.

MikaelUmaN commented 4 years ago

I'm in a professional setting where IT admins manage security policies and nameresolution is handled by an internal nameserver.

Pointing to that nameserver from e.g. a docker container works fine and nameresolution works.

But from WSL2 it's just impossible. I have to maintain a list of IPs to use... which is annoying to say the least...

Dhruval360 commented 4 years ago

I'm running Build 20201.rs_prerelease.200822-1922 and this happened to me a couple of times. I tried the workarounds but they didn't work, even after I restarted my host machine. Eventually I found that if I don't mess with resolv.conf, a reboot would fix things until it happens again. I don't use any firewall other than the default Windows Defender.

I am also on the same windows build and a reboot doesn't work for me. I have tried all the workarounds here and nothing is working for me. Please help. I believe it has something to do with the following, but have no idea how to fix it:

Ethernet adapter vEthernet (WSL):

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :

I got this by running ipconfig on CMD

Dhruval360 commented 4 years ago

I followed this https://github.com/microsoft/WSL/issues/5821#issuecomment-684092757 and it works now!! Hope it helps

leonheess commented 4 years ago

This needs to be fixed ASAP

aubreyzulu commented 4 years ago

Hey, the solution that worked for me was to uninstall Ubuntu 20.04, which seems to have a few issues. I installed Ubuntu 18.04 and it works like magic; when I ping github.com it's faster.

juburr commented 4 years ago

For newcomers to this thread, try pinging a public IP address first to make sure that DNS is actually the culprit here (ping 8.8.8.8). I found myself at this thread because I was receiving error messages about DNS failing to resolve, but it turns out I had actually lost complete external internet connectivity (yet another issue with WSL...). If you're doing local development, this may not be apparent at first because networking between the Windows host & WSL still works great (VS Code, Docker Desktop, https://localhost/my-app/, etc).

In my case, it turns out that WSL loses internet connectivity if Windows enters sleep mode (https://github.com/microsoft/WSL/issues/4992). Until that issue is resolved, the only solution that seems to work for me is rebooting Windows. As a result, I'd also recommend turning off sleep mode when the laptop is plugged in via the power settings menu.

If the problem is actually DNS, then follow these steps instead: https://github.com/microsoft/WSL/issues/4285#issuecomment-522201021
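
The first check suggested above (ping a public IP before blaming DNS) can be sketched as two small functions. The names classify and diagnose are made up, and the probe targets 8.8.8.8 and google.com are just the examples used in this thread; the ping options assume the Linux iputils version.

```shell
# classify PING_STATUS DNS_STATUS
# Turn the exit statuses of the two probes into a verdict.
classify() {
    if [ "$1" -ne 0 ]; then
        echo "no connectivity (not a DNS problem)"
    elif [ "$2" -ne 0 ]; then
        echo "DNS resolution is failing"
    else
        echo "network and DNS both work"
    fi
}

# Run both probes: one ping to a raw IP, one name lookup through the
# system resolver (which reads /etc/resolv.conf).
diagnose() {
    ping_status=0
    ping -c 1 -W 2 8.8.8.8 > /dev/null 2>&1 || ping_status=$?
    dns_status=0
    getent hosts google.com > /dev/null 2>&1 || dns_status=$?
    classify "$ping_status" "$dns_status"
}
```

Running diagnose inside WSL prints one of the three verdicts. Keep in mind that some networks block ICMP, in which case the first probe can fail even though connectivity is fine.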

kobenauf commented 4 years ago

Perhaps there should be a file like resolv.local or resolv.wsl that gets merged with whatever gets generated for wsl.conf so we don't have to choose between permanently disabling the generation process or having to "fix" the generated file every time.

Ideally there would be a mechanism that lets the user indicate whether this is a merge of resolv.conf values or a full replacement.

joret-open commented 3 years ago

I still have issues with DNS. Sometimes it works, sometimes it does not

I followed the instructions to create the resolv.conf and deactivate the creation of the file automatically. After using the solution the first time, it all started working, but after restarting the computer, it broke again.

This is the content of my /etc/resolv.conf

nameserver 8.8.8.8

This is the content of my /etc/wsl.conf

[network]
generateResolvConf=false

Any ideas?