Open Neurone opened 4 years ago
Thanks for the detailed report. I can't repro here, but I am pretty sure that's only because my ISP isn't coughing up any AUTHORITY/ADDITIONAL upstream. Please collect logs and backlink the feedback item. Your analysis looks awfully sound on its own merits, but having traces for the devs to look at is better than not.
Sure. I updated the issue, adding a query to an authoritative DNS server so you can see it for yourself even if your ISP does not act like mine, and I added the link to the feedback item. I also updated the resolver's new IP (now 192.168.16.1) to reflect the recording of the feedback.
This happens for me too. Any fix in the works? This gets in the way of DNS administration.
This happens for zones that your upstream DNS server is authoritative for. I see it when I'm on the office VPN, but not when I'm off VPN.
This is still happening with build 10.0.19042.610.
Really surprised to see so few people affected by this.
The "ADDITIONAL SECTION" is added to the "ANSWER SECTION". This makes things really bad when you are managing thousands of targets with Ansible...
I am affected by this issue as well.
Windows build number: 10.0.19042.685
Your Distribution version: Ubuntu 20.04
Whether the issue is on WSL 2 and/or WSL 1: WSL2 Linux version 4.19.104-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Wed Feb 19 06:37:35 UTC 2020
Thanks for the detailed report. I can't repro here, but I am pretty sure that's only because my ISP isn't coughing up any AUTHORITY/ADDITIONAL upstream. Please collect logs and backlink the feedback item. Your analysis looks awfully sound on its own merits, but having traces for the devs to look at is better than not.
You should be able to reproduce this by pointing your Windows DNS server(s) to servers that are authoritative for something instead of to your default ISP servers or public servers like Google. I don't know offhand of any public servers that are both authoritative AND allow recursive lookups. This is, however, common for corporate DNS servers on VPN connections: often the company DNS servers handle internal non-routable names for the company domain as well as recursing requests for external domains.
I have this issue as well! It commonly affects Terraform. When doing a packet capture, I can see the NS records for the lookup of registry.terraform.io being returned in the answer section. Then I see Terraform making its request for the URL using an NS record IP and not the actual site IP, because the NS records were returned incorrectly. Please help!
Here is the DNS response coming into my PC:
Here is the DNS response coming into my WSL:
Thanks, everyone.
I've tested on Hyper-V, and I encountered this problem there as well, so I suspect it is a problem with Hyper-V's DNS server.
Could someone with WSL 1 test whether this problem exists there?
/cc @therealkenc
Quick update: our university's DNS provides an AUTHORITY SECTION and an ADDITIONAL SECTION in its responses.
The issue does not appear to affect WSLv1.
It did affect me back when I was running WSLv1. I've not been running v1 for a while now.
The issue only happens if the upstream DNS server is providing authoritative/additional records. In this case the ADDITIONAL records are merged into the reply, but I don't see any AUTHORITY records getting merged.
$ dig www.churchofjesuschrist.org @172.27.16.1
; <<>> DiG 9.16.1-Ubuntu <<>> www.churchofjesuschrist.org @172.27.16.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30030
;; flags: qr rd ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;www.churchofjesuschrist.org. IN A
;; ANSWER SECTION:
www.churchofjesuschrist.org. 0 IN CNAME www.churchofjesuschrist.org.edgekey.net.
www.churchofjesuschrist.org.edgekey.net. 0 IN CNAME e15515.dsca.akamaiedge.net.
e15515.dsca.akamaiedge.net. 0 IN A 23.50.233.88
e15515.dsca.akamaiedge.net. 0 IN A 23.50.233.69
n0dsca.akamaiedge.net. 0 IN A 88.221.81.192
n7dsca.akamaiedge.net. 0 IN A 184.50.88.125
n6dsca.akamaiedge.net. 0 IN A 104.124.1.39
n3dsca.akamaiedge.net. 0 IN A 184.28.219.181
n5dsca.akamaiedge.net. 0 IN A 23.66.122.5
n4dsca.akamaiedge.net. 0 IN A 184.28.219.46
n2dsca.akamaiedge.net. 0 IN A 23.66.122.47
n1dsca.akamaiedge.net. 0 IN A 23.66.122.29
n0dsca.akamaiedge.net. 0 IN AAAA 2600:1480:e800::c0
;; Query time: 20 msec
;; SERVER: 172.27.16.1#53(172.27.16.1)
;; WHEN: Mon Sep 27 11:30:21 MDT 2021
;; MSG SIZE rcvd: 607
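If it helps anyone scripting around this bug: the per-section record counts sit in the fixed 12-byte DNS header, so a merged reply can be detected without a full message parser. Below is a minimal pure-Python sketch (the function name and the hand-crafted sample bytes are mine for illustration, not from any WSL tooling):

```python
import struct

def dns_header_counts(message: bytes):
    """Return (QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT) from a raw DNS message.

    Per RFC 1035 section 4.1.1 the fixed header is 12 bytes:
    ID, FLAGS, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (all 16-bit, big-endian).
    """
    if len(message) < 12:
        raise ValueError("truncated DNS header")
    _id, _flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return qd, an, ns, ar

# Hand-crafted header: 1 question, 2 answers, 1 authority, 1 additional.
sample = struct.pack("!6H", 0x1234, 0x8180, 1, 2, 1, 1)
print(dns_header_counts(sample))  # (1, 2, 1, 1)
```

A compliant resolver keeps NS records and glue in the NSCOUNT/ARCOUNT sections; the broken proxy instead reports NSCOUNT=0 and ARCOUNT=0 with an inflated ANCOUNT, which matches the `ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0` flags line in the dig output above.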
Why is this taking so long to fix? This is obviously incorrect behavior. Is the source of the DNS proxy available so we can submit a patch?
I agree, it definitely seems like something significant enough that it should not still be dragging on after this long.
This is causing an issue with terraform in WSL2. Can you please fix this?
More and more programs used in WSL2 stop working - Helm and Terraform, just to name a few. Can somebody take this bug seriously?
As a quick workaround, use Cloudflare or a similar resolver.
Modify /etc/resolv.conf
to
nameserver 1.1.1.1
For VPNs or private zones, use the appropriate DNS server.
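One caveat with this workaround: by default WSL regenerates /etc/resolv.conf on every start, so the change will not stick unless generation is disabled first. A sketch of the usual two-file setup (the `[network]` / `generateResolvConf` key is the documented wsl.conf setting; pick a resolver appropriate for your network):

```
# /etc/wsl.conf -- stop WSL from regenerating resolv.conf
[network]
generateResolvConf = false
```

```
# /etc/resolv.conf -- remove the auto-generated symlink first,
# then create the file with your chosen resolver:
nameserver 1.1.1.1
```

Run `wsl --shutdown` afterwards so the distro restarts with the new settings.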
👍 to fix this, it is breaking Terraform usage on WSL2.
Major bug. I lost half a day to this issue, as it's really not obvious what the problem is. It seriously needs fixing.
I'm not expecting a great DNS server for such limited use, but please make it standards-compliant.
Couldn't WSL just use the Windows DNS servers, without hosting its own? Basically, instead of putting the WSL DNS into resolv.conf, you could just put the nameservers that Windows is configured to use there... It might lead to double upstream requests, since Windows and Linux might request the same domain independently of each other, but it would work. And it would still allow WSL to use the interface-based DNS servers you can set in Windows.
Couldn't WSL just use the Windows DNS Servers, without hosting its own?
WSL is actually using Hyper-V's default recursive resolver. Create a Hyper-V guest and it will show the same wrong result.
Basically, instead of putting the WSL DNS into resolv.conf, you could just put the nameservers that Windows is configured to use there... It might lead to double upstream requests, since Windows and Linux might request the same domain independently of each other, but it would work. And it would still allow WSL to use the interface-based DNS servers you can set in Windows.
Yes.
This bug is in the default Hyper-V resolver? Has it been reported to that team? Anyone have a link to the reported issue in Hyper-V? Does anyone have steps to reproduce this in Hyper-V without being a WSL instance?
This bug is in the default Hyper-V resolver? Does anyone have steps to reproduce this in Hyper-V without being a WSL instance?
I think so. I've tested Hyper-V alone in https://github.com/microsoft/WSL/issues/5806#issuecomment-926858463
Has it been reported to that team? Anyone have a link to the reported issue in Hyper-V?
idk.
As noted in https://github.com/microsoft/WSL/issues/7642, for WSL2 specifically the DNS resolver is provided by Windows Internet Connection Sharing (ICS) aka the SharedAccess
service. This is also documented on Microsoft docs:
Internet Connection Sharing (ICS) is a required component of WSL 2. The ICS service is used by the Host Network Service (HNS) to create the underlying virtual network which WSL 2 relies on for NAT, DNS, DHCP, and host connection sharing.
So I think the bug is not in Hyper-V, but is in ICS.
I tried to quickly reproduce the bad DNS replies using Mobile Hotspot in Windows, to show it's not WSL2-specific, but ICS crashed as soon as I started the hotspot 🙃 (and I was left with no DNS resolution in WSL2). I will try harder a little later.
@craigloewen-msft can you please take a look at this? Is there any way we can provide more info?
I personally found this issue while running dig google.com any
+1. This is affecting many popular CLI tools of WSL2, including terraform / helm
Hey!
This has been a pain for us for a long time now, can we have some update on the status? There are many tools that fail because of this and it's an obvious bug since it goes against the DNS specs.
I just wanted to let everyone know that this is still a problem.
this is still a problem... "helm dep up" takes more than 20 minutes!
This looks like it's an issue with the Hyper-V DNS resolver. Has this been reported to the Hyper-V team? I see similar behavior in other Hyper-V VMs.
Affected too
I've solved all issues regarding DNS and VPN using this tool: https://github.com/sakai135/wsl-vpnkit
Affected too
I have posted this issue in the Windows Feedback hub under Hyper-V, so make sure to upvote it: https://aka.ms/AAlwtcs (in the hope it helps).
Confirmed to be an issue on my end, all DNS queries include root servers:
(.venv) zollo@ws-zollo1:~$ nslookup apple.com
Server: 172.20.176.1
Address: 172.20.176.1#53
Non-authoritative answer:
Name: apple.com
Address: 17.253.144.10
Name: l.root-servers.net
Address: 199.7.83.42
Name: k.root-servers.net
Address: 193.0.14.129
Name: e.root-servers.net
Address: 192.203.230.10
Name: f.root-servers.net
Address: 192.5.5.241
Name: a.root-servers.net
Address: 198.41.0.4
Name: c.root-servers.net
Address: 192.33.4.12
Name: d.root-servers.net
Address: 199.7.91.13
Name: g.root-servers.net
Address: 192.112.36.4
Name: b.root-servers.net
Address: 199.9.14.201
Name: i.root-servers.net
Address: 192.36.148.17
Name: j.root-servers.net
Address: 192.58.128.30
Name: m.root-servers.net
Address: 202.12.27.33
Name: apple.com
Address: 2620:149:af0::10
Name: l.root-servers.net
Address: 199.7.83.42
Name: k.root-servers.net
Address: 193.0.14.129
Name: e.root-servers.net
Address: 192.203.230.10
Name: f.root-servers.net
Address: 192.5.5.241
Name: a.root-servers.net
Address: 198.41.0.4
Name: c.root-servers.net
Address: 192.33.4.12
Name: d.root-servers.net
Address: 199.7.91.13
Name: g.root-servers.net
Address: 192.112.36.4
Name: b.root-servers.net
Address: 199.9.14.201
Name: i.root-servers.net
Address: 192.36.148.17
Name: j.root-servers.net
Address: 192.58.128.30
Name: m.root-servers.net
Address: 202.12.27.33
WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.2134
How can something this fundamental still be an issue three years after it was first reported?
Every other OS, even Vista, managed to do DNS correctly.
How can something this fundamental still be an issue three years after it was first reported?
Usually this is because the relevant code once belonged to an entirely different product team, which has likely long since been disbanded, and as a result the code has become orphaned. Don't forget that some of that code base (Internet Connection Sharing?) is probably more than a quarter century old, and was never intended to serve DNS to a modern Linux environment via WSL. This DNS server/proxy functionality also seems entirely undocumented, so even finding it will be a challenge. Windows NT has lots of "skeletons in the closet", and the code behind this issue is probably one of them. I suggest the best way forward is to find an experienced code archeologist and start decompiling, as even the source code may have been lost.
If you think I'm joking: https://arstechnica.com/gadgets/2017/11/microsoft-patches-equation-editor-flaw-without-fixing-the-source-code/
I'd suggest that this is a bigger issue than Microsoft gives it credit for. It essentially makes it impossible to use DNS reliably in some environments, such as when a DNS proxy forwards some requests to Azure to take advantage of Azure Private DNS. WSL simply cannot resolve some of these private hostnames because of this bug.
This is a Hyper-V issue. Spin up a Hyper-V virtual machine and you will see the exact same bug.
@mgkuhn even given all that, the fact it breaks so much stuff means there should be some priority on getting it fixed, even if it means rebuilding that library from scratch. Three years should be enough to get something done.
This is a Hyper-V issue. Spin up a Hyper-V virtual machine and you will see the exact same bug.
And now test it on a separate Linux box connected to the Internet via Windows' "Internet Connection Sharing", all the way back to Windows 98 SE. Hyper-V may be just another environment affected, not the actual location of that DNS code.
Will the newly announced dnsTunneling option help to circumvent the problem? https://devblogs.microsoft.com/commandline/windows-subsystem-for-linux-september-2023-update/#dns-tunneling
Please try enabling "dnsTunneling" and let us know if it fixes the issue. thanks!
It looks like dnsTunneling (I've combined this with the new mirrored networking mode as well) works like a charm 👍🏻
Is there a write up on how to do this somewhere?
Well yes, it's in the link posted by @mgkuhn:
Will the newly announced dnsTunneling option help to circumvent the problem? https://devblogs.microsoft.com/commandline/windows-subsystem-for-linux-september-2023-update/#dns-tunneling
Just make sure to have the latest win11 22H2 feature updates (released yesterday, KB5030310) installed.
Environment
Steps to reproduce
Query the `TXT` record of a domain, for example:
Please note that the DNS server `192.168.16.1` comes from the Hyper-V Virtual Network Adapter and is dynamically and automatically configured by WSL/ICS/Windows, so the exact DNS server IP changes every time Windows restarts.
Here is the link to the collected logs and feedback item: https://aka.ms/AA9dnzo
Expected behavior
Correct DNS responses like the examples below, where the `ANSWER` section contains only the actual answers, and not also records from the `AUTHORITY`/`ADDITIONAL` sections.
The following query is done using the current authoritative DNS server for ultradns.com
The following query is done using my ISP's DNS.
The following query is done using Google's public DNS server.
Actual behavior
Records from the `AUTHORITY`/`ADDITIONAL` sections are mixed into the `ANSWER` section: this behavior currently creates issues for other programs that need to process the answer.
For example, in this issue geth cannot unmarshal the DNS message because it is greater than 512 bytes.
Geth is written in Go, and Go's DNS client follows the RFC 1035 specification, which states that over UDP the maximum allowed message size is 512 bytes.
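To make the size constraint concrete, here is a toy check of the classic RFC 1035 UDP limit (pure Python; the helper name is mine for illustration, and the 607-byte figure matches the `MSG SIZE rcvd: 607` shown in the dig output earlier in the thread):

```python
MAX_UDP_DNS = 512  # RFC 1035 limit for DNS messages carried over UDP without EDNS0

def fits_in_plain_udp(message_size: int) -> bool:
    """True if a DNS message fits in UDP under classic RFC 1035 rules."""
    return message_size <= MAX_UDP_DNS

print(fits_in_plain_udp(607))  # False: the merged WSL reply above was 607 bytes
print(fits_in_plain_udp(180))  # True: a typical un-merged answer
```

A strict RFC 1035 client that has not negotiated a larger payload via EDNS0 may refuse or mis-handle responses over this limit, which is consistent with the geth failure described here.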
The program works fine with all other DNS servers, because the `ANSWER` configured in the DNS server is correctly less than 512 bytes, but it fails with WSL, which, by merging in the extra information, creates an `ANSWER` section that is too big.
This strange behavior potentially impacts every RFC 1035 compliant library, and at the very least it impacts every program written in Go that uses the native DNS client library.
As a final note (I don't know whether it is related to the same problem or merely provides a clue), you can also notice a warning message appearing at the beginning of the DNS response: