Thanks for opening this and providing repro steps. I've confirmed the repro and have opened MSFT internal 38776581 for reference.
There are a few articles about how Windows nslookup fails if the primary DNS server fails, even when the secondary DNS server is working fine:
https://defaultroot.com/index.php/2019/10/08/nslookup-default-behaviour-during-failover-of-primary-dns/
https://social.technet.microsoft.com/Forums/en-US/b1977a50-c482-4daf-b113-63e87b9430d3/secondary-dns-does-not-resolve-160-the-nslookup-requests-windows-customer-160-when-the-primary
The second article notes that ping is a better basic test of DNS.
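For example, resolving a name through ping exercises the Windows DNS client, which tries all configured servers, rather than nslookup's internal resolver (a minimal check; bing.com is just a placeholder name):

PS C:\> ping -n 1 bing.com

If the name resolves here but not in nslookup, system DNS is fine and the difference comes down to nslookup's own resolver.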
@doctorpangloss Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!
Sounds good if this is a general Windows issue...
I am reopening this because it seems almost every Windows container user encounters it.
I've been having this issue since the last Windows Update, to 10.0.22621.1105.
Does Resolve-DnsName work? My understanding is that Resolve-DnsName is the preferred tool for DNS lookups according to the networking team, due to a difference in resolvers between the two. We call out using Resolve-DnsName in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#dns-windows
> On Windows, there are multiple DNS resolvers that can be used. As these come with slightly different behaviors, using the Resolve-DNSName powershell cmdlet for name query resolutions is recommended.
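For reference, a quick check from inside the container looks like this (bing.com is just a placeholder name):

PS C:\> Resolve-DnsName -Name bing.com

If that succeeds while nslookup times out, the OS resolver path is healthy and the failure is specific to nslookup's built-in resolver.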
Unfortunately not.
> Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!
It's really hard to say. There's a real issue here. For example, would a golang application use the same mechanism as nslookup or Resolve-DNSName?
This issue has been open for 30 days with no updates. No assignees; please provide an update or close this issue.
This issue has been open for 30 days with no updates. @sbangari, @MikeZappa87, please provide an update or close this issue.
We see the same behaviour in our Kubernetes environment with Windows nodes: nslookup doesn't work, while Resolve-DnsName works without any problem.
We have also been facing name-resolution issues since approximately one week ago. I think these are all related to the same underlying problem as https://github.com/microsoft/Windows-Containers/issues/386 and https://github.com/microsoft/Windows-Containers/issues/420
The primary DNS server is defaulting to the default gateway; by any chance, did you intend to do that? The secondary DNS servers will resolve. However, a workaround that will allow this to work is to create a new Docker NAT network:
docker network create -d "nat" --subnet "10.240.0.0/24" -o com.docker.network.windowsshim.disable_gatewaydns=true natgw
docker run -it --rm --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022
You could also try deleting the existing nat network and recreating it with the options above. Let me know if this works!
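If you try this, one way to confirm the workaround took effect (assuming the natgw network created above) is to check the DNS server list from inside the container and then do a lookup:

PS C:\> docker run -it --rm --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022 powershell
# inside the container:
PS C:\> ipconfig /all      # the gateway address should no longer appear in the DNS Servers list
PS C:\> nslookup bing.com  # should now go straight to the first real upstream DNS server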
Unfortunately, while disabling the gateway DNS fixes the resolution of external DNS queries, it breaks the internal Docker DNS that resolves a container's IP by its container name.
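To make that tradeoff concrete (a sketch reusing the natgw network from above; web1 is a hypothetical container name):

docker run -d --name web1 --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022 ping -t localhost
docker run -it --rm --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022 ping web1
# with disable_gatewaydns=true the second command cannot resolve web1,
# because the gateway entry that served Docker's name resolution is gone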
Any actual workarounds for this?
> docker run -it --rm mcr.microsoft.com/windows/servercore:ltsc2022
> ipconfig -all
Windows IP Configuration
Host Name . . . . . . . . . . . . : 34eba5103fea
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
Ethernet adapter Ethernet:
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
Physical Address. . . . . . . . . : 00-15-5D-14-4B-41
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::585b:6496:8770:bb8e%4(Preferred)
IPv4 Address. . . . . . . . . . . : 172.17.91.239(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.240.0
Default Gateway . . . . . . . . . : 172.17.80.1
DHCPv6 IAID . . . . . . . . . . . : 67114333
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2D-08-2A-B6-00-15-5D-14-4B-41
DNS Servers . . . . . . . . . . . : 172.17.80.1
192.168.2.253
NetBIOS over Tcpip. . . . . . . . : Disabled
I tried:
Ping 172.17.80.1 => FAIL
Ping 192.168.2.253 => FAIL
Ping 8.8.8.8 => SUCCESS
nslookup using 172.17.80.1 => FAIL
nslookup using 192.168.2.253 => FAIL
nslookup using 8.8.8.8 => SUCCESS
Facing the same issue; any updates here? I have no workaround, and for me the issue appeared out of nowhere.
A lot of this is mentioned in bits and pieces above, but here is what's going on, which I confirmed in our lab:
The Windows implementation of nslookup uses its own internal implementation of the DNS protocol and will only query one DNS server. By default this is the first DNS server in the list. This differs from nslookup on Linux which will retry with other servers in the resolv.conf file.
For containers, the first DNS server in the list (and the one used by nslookup by default) is the Docker DNS resolver. This is required in order to resolve container IPs by their name. Unfortunately, due to the Docker DNS issue Zappa linked above, the Docker resolver does not currently forward requests to an external DNS server. This is in the process of being fixed in the Moby repo.
Given the above, our general recommendation is to use the Resolve-DnsName cmdlet instead. Resolve-DnsName uses the built-in DNS client for the OS, which will retry with all of the available DNS servers and, unlike nslookup, also works with configurations that use newer DNS technologies such as DNSSEC, DNS-over-HTTPS (DoH), and DNS-over-TLS (DoT). This is the best way to determine if DNS is functioning within the container. Any application that relies on Windows to do the DNS lookup will get the same behavior as Resolve-DnsName.
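A side-by-side from inside a container makes the difference visible (a minimal sketch; bing.com is just a placeholder name):

PS C:\> nslookup bing.com                       # asks only the first server in the list (the Docker resolver)
PS C:\> Resolve-DnsName -Name bing.com -Type A  # uses the OS DNS client, which retries all configured servers

Resolve-DnsName can also target a specific server directly, e.g. Resolve-DnsName -Name bing.com -Server 8.8.8.8, which is handy when isolating which hop is failing.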
If you really want to use nslookup you can, but be aware of the above limitations.
You may also be having DNS connection issues outside of the container host. To narrow that down, use pktmon to trace the packet to see if it leaves the container host in the correct format. In my environment 8.8.8.8 is blocked somewhere on the network. I can confirm it is not an issue with the Windows Container host by doing the following:
On the container host:
PS C:\> pktmon filter remove
PS C:\> pktmon filter add -t tcp -p 53
PS C:\> pktmon filter add -t udp -p 53
PS C:\> pktmon start --capture
In the container:
PS C:\> nslookup bing.com 8.8.8.8
DNS request timed out.
timeout was 2 seconds.
Server: UnKnown
Address: 8.8.8.8
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to UnKnown timed-out
Back on the container host:
PS C:\> pktmon stop
PS C:\> pktmon etl2txt PktMon.etl
PS C:\> notepad pktmon.txt
In Notepad I can look at the Appearance # to find the last appearance of the DNS request packet, check which component it was last seen on, and confirm that the packet looks correct:
[02]0000.0000::2024-03-20 14:13:45.720095500 [Microsoft-Windows-PktMon] PktGroupId 281474976710677, PktNumber 1, Appearance 14, Direction Tx , Type Ethernet , Component 6, Edge 1, Filter 2, OriginalSize 80, LoggedSize 80
00-15-5D-C8-8E-16 > E8-B5-D0-2C-24-40, ethertype IPv4 (0x0800), length 80: 10.127.130.152.59312 > 8.8.8.8.53: 1+ PTR? 8.8.8.8.in-addr.arpa. (38)
Further down in the file I can see that Component 6 is the ethernet adapter:
[00]1D7C.0AFC::2024-03-20 14:13:59.067489700 [Microsoft-Windows-PktMon] Component 6, Type Miniport , Name netvsc.sys, Microsoft Hyper-V Network Adapter #2
[00]1D7C.0AFC::2024-03-20 14:13:59.067489900 [Microsoft-Windows-PktMon] Property: Component 6, PhysAddress = 0x00155DC88E16
[00]1D7C.0AFC::2024-03-20 14:13:59.067490200 [Microsoft-Windows-PktMon] Property: Component 6, NdisMedium = Ethernet
Whenever the last appearance is the Ethernet adapter, it's safe to assume the packet left the machine. I verified that the IP addresses are correct, and that the destination MAC address is the physical ethernet switch. Since I never see a response in the pktmon log, I know that I never received a reply on the container host.
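Rather than scanning the whole file in Notepad, you can also pull just the relevant packets out of the converted log with Select-String (a sketch, assuming the pktmon.txt produced by the commands above and the 8.8.8.8 target from my test):

PS C:\> Select-String -Path .\pktmon.txt -Pattern '8\.8\.8\.8\.53' -Context 1,0 | Select-Object -Last 5

The header line shown above each match carries the Appearance number and Component, so the last match tells you the final component that saw the request.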
I will leave this issue open for a few days longer. If anyone can show a pktmon log that suggests the container host dropped the packet incorrectly (not including the Moby issue above), I can look into that. If not, I'll close this issue.
It would make sense to me if there were documentation somewhere saying that nslookup should simply not be used on Windows.
Describe the bug
nslookup fails inside a container.

To Reproduce

Expected behavior
nslookup should work against the default DNS server.

Configuration:

Additional context