microsoft / Windows-Containers

Welcome to our Windows Containers GitHub community! Ask questions, report bugs, and suggest features -- let's work together.
MIT License

nslookup fails inside a container using the default DNS server #216

Closed: doctorpangloss closed this issue 5 months ago

doctorpangloss commented 2 years ago

Describe the bug

nslookup fails inside a container.

To Reproduce

docker run -it --rm mcr.microsoft.com/windows/servercore:ltsc2022
Microsoft Windows [Version 10.0.20348.587]
(c) Microsoft Corporation. All rights reserved.

C:\>nslookup www.google.com
Server:  UnKnown
Address:  172.30.224.1

*** UnKnown can't find www.google.com: Server failed

C:\>nslookup www.google.com 8.8.8.8
Server:  dns.google
Address:  8.8.8.8

Non-authoritative answer:
Name:    www.google.com
Addresses:  2607:f8b0:4005:80e::2004
          216.58.195.68

Expected behavior

nslookup should work against the default DNS server.

Configuration:

Client:
 Cloud integration: v1.0.22
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:44:07 2021
 OS/Arch:           windows/amd64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.5.1 (74721)
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.24)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:42:13 2021
  OS/Arch:          windows/amd64
  Experimental:     false

Additional context

cwilhit commented 2 years ago

Thanks for opening this and providing repro steps. I've confirmed the repro and have opened MSFT internal 38776581 for reference.

michbern-ms commented 2 years ago

There are a few articles about how Windows nslookup fails when the primary DNS server fails, even if the secondary DNS server is working fine:

https://defaultroot.com/index.php/2019/10/08/nslookup-default-behaviour-during-failover-of-primary-dns/#:~:text=Windows%20nslookup%20will%20always%20use%20the%20primary%20DNS,in%20the%20nslookup%20command%3A%20So%20all%20is%20well%21
https://social.technet.microsoft.com/Forums/en-US/b1977a50-c482-4daf-b113-63e87b9430d3/secondary-dns-does-not-resolve-160-the-nslookup-requests-windows-customer-160-when-the-primary

The second article notes that ping is a better basic test of DNS.

@doctorpangloss Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!

doctorpangloss commented 2 years ago

Sounds good if this is a general Windows issue...

doctorpangloss commented 1 year ago

I am reopening this because it seems almost every Windows container user encounters it.

lippertmarkus commented 1 year ago

I've been having this issue since the last Windows Update to 10.0.22621.1105.

jsturtevant commented 1 year ago

Does Resolve-DnsName work? My understanding is that the networking team prefers Resolve-DnsName for DNS lookups because the two tools use different resolvers. We call out using Resolve-DnsName in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#dns-windows

On Windows, there are multiple DNS resolvers that can be used. As these come with slightly different behaviors, using the Resolve-DNSName powershell cmdlet for name query resolutions is recommended.

lippertmarkus commented 1 year ago

Unfortunately not.

doctorpangloss commented 1 year ago

> Just so that we can understand severity, is this a blocking issue for you or are you noting an unexpected behavior that is not blocking? Thanks!

It's really hard to say. There's a real issue here. For example, would a Go application use the same mechanism as nslookup, or the one Resolve-DnsName uses?

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been open for 30 days with no updates. no assignees, please provide an update or close this issue.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been open for 30 days with no updates. @sbangari, @MikeZappa87, please provide an update or close this issue.

burkhat commented 1 year ago

We see the same behaviour in our Kubernetes environment with Windows nodes: nslookup doesn't work, but Resolve-DnsName works without any problem.

sam-sla commented 1 year ago

We have also been facing name-resolution issues for about a week. I think this issue and these two are all related: https://github.com/microsoft/Windows-Containers/issues/386 and https://github.com/microsoft/Windows-Containers/issues/420

MikeZappa87 commented 11 months ago

The primary DNS server is defaulting to the default gateway; did you intend that by any chance? The secondary DNS servers will resolve. However, a workaround that allows this to work is to create a new Docker NAT network with gateway DNS disabled:

docker network create -d "nat" --subnet "10.240.0.0/24" -o com.docker.network.windowsshim.disable_gatewaydns=true natgw

and run the container on it:

docker run -it --rm --net=natgw mcr.microsoft.com/windows/servercore:ltsc2022

You could also try deleting the existing nat network and recreating it with the options above. Let me know if this works!

MikeZappa87 commented 10 months ago

Unfortunately, disabling gateway DNS fixes resolution of external DNS queries, but it breaks the internal Docker DNS that resolves a container's IP by its container name.

davhdavh commented 9 months ago

Any actual workarounds for this?

> docker run -it --rm mcr.microsoft.com/windows/servercore:ltsc2022
> ipconfig -all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : 34eba5103fea
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   Physical Address. . . . . . . . . : 00-15-5D-14-4B-41
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::585b:6496:8770:bb8e%4(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.17.91.239(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.17.80.1
   DHCPv6 IAID . . . . . . . . . . . : 67114333
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-2D-08-2A-B6-00-15-5D-14-4B-41
   DNS Servers . . . . . . . . . . . : 172.17.80.1
                                       192.168.2.253
   NetBIOS over Tcpip. . . . . . . . : Disabled

Ping 172.17.80.1 => FAIL
Ping 192.168.2.253 => FAIL
Ping 8.8.8.8 => SUCCESS
nslookup using 172.17.80.1 => FAIL
nslookup using 192.168.2.253 => FAIL
nslookup using 8.8.8.8 => SUCCESS

I tried:

florianehmke commented 6 months ago

Facing the same issue; any updates here? I have no workaround, and for me the issue appeared out of nowhere.

grcusanz commented 6 months ago

A lot of this is mentioned in bits and pieces above, but here is what's going on, which I confirmed in our lab:

  1. The Windows implementation of nslookup uses its own internal implementation of the DNS protocol and will only query one DNS server. By default this is the first DNS server in the list. This differs from nslookup on Linux, which will retry with the other servers in the resolv.conf file.

  2. For containers, the first DNS server in the list, and the one nslookup uses by default, is the Docker DNS resolver. This is required in order to resolve container IPs by their name. Unfortunately, due to the Docker DNS issue Zappa linked above, the Docker resolver does not currently forward requests to an external DNS server. This is in the process of being fixed in the Moby repo.
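The difference between points 1 and 2 can be contrasted in a small, deliberately simplified Python model. `fake_query` is a hypothetical stand-in for "send one query to one server", the Moby forwarding bug is modeled as the Docker resolver never answering external names, and `8.8.8.8` as the second configured server is an assumption for illustration. The real Windows DNS client has its own timeout and retry schedule that this sketch does not reproduce.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for "send one DNS query to one server":
# returns an address on success, or None on SERVFAIL/timeout.
QueryFn = Callable[[str, str], Optional[str]]


def nslookup_style(name: str, servers: Sequence[str], query: QueryFn) -> Optional[str]:
    """Ask only the first configured server, as Windows nslookup does by default."""
    return query(servers[0], name) if servers else None


def os_client_style(name: str, servers: Sequence[str], query: QueryFn) -> Optional[str]:
    """Simplified model of the OS DNS client: fall through to the next server on failure."""
    for server in servers:
        answer = query(server, name)
        if answer is not None:
            return answer
    return None


# Hypothetical container config: Docker resolver first, an upstream server second.
servers = ["172.30.224.1", "8.8.8.8"]


def fake_query(server: str, name: str) -> Optional[str]:
    # Model of the Moby bug: the Docker resolver does not forward external names.
    if server == "172.30.224.1":
        return None
    return "216.58.195.68" if name == "www.google.com" else None


print(nslookup_style("www.google.com", servers, fake_query))   # None
print(os_client_style("www.google.com", servers, fake_query))  # 216.58.195.68
```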

Given the above, our general recommendation is to use the Resolve-DnsName cmdlet instead. Resolve-DnsName uses the OS's built-in DNS client, which will retry with all of the available DNS servers and, unlike nslookup, also works with configurations that use newer DNS technologies such as DNSSEC, DNS-over-HTTPS (DoH), and DNS-over-TLS (DoT). This is the best way to determine whether DNS is functioning within the container. Any application that relies on Windows to do the DNS lookup will get the same behavior as Resolve-DnsName.
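For application code, the practical consequence is to resolve names through the operating system rather than by speaking DNS to a single server. A minimal Python sketch, assuming CPython's `socket.getaddrinfo`, which calls the platform resolver (on Windows, the built-in DNS client); `resolve_with_os` is a hypothetical helper name:

```python
import socket
from typing import List


def resolve_with_os(name: str) -> List[str]:
    """Resolve `name` through the operating system's resolver (getaddrinfo).

    On Windows this should go through the built-in DNS client, so it sees the
    same server list and retry behaviour as Resolve-DnsName, not nslookup's
    single-server behaviour.
    """
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve_with_os("localhost"))
```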

If you really want to use nslookup you can, but be aware of the above limitations.

You may also be having DNS connection issues outside of the container host. To narrow that down, use pktmon to trace the packet to see if it leaves the container host in the correct format. In my environment 8.8.8.8 is blocked somewhere on the network. I can confirm it is not an issue with the Windows Container host by doing the following:

On the container host:

   PS C:\> pktmon filter remove
   PS C:\> pktmon filter add -t tcp -p 53
   PS C:\> pktmon filter add -t udp -p 53
   PS C:\> pktmon start --capture

In the container:

   PS C:\> nslookup bing.com 8.8.8.8
   DNS request timed out.
       timeout was 2 seconds.
   Server:  UnKnown
   Address:  8.8.8.8

   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   DNS request timed out.
       timeout was 2 seconds.
   *** Request to UnKnown timed-out

Back on the container host:

   PS C:\> pktmon stop
   PS C:\> pktmon etl2txt PktMon.etl
   PS C:\> notepad pktmon.txt

In notepad I can look at the Appearance # to find the last appearance of the DNS request packet, check which component it was last seen on, and confirm that the packet looks correct:

   [02]0000.0000::2024-03-20 14:13:45.720095500 [Microsoft-Windows-PktMon] PktGroupId 281474976710677, PktNumber 1, Appearance 14, Direction Tx , Type Ethernet , Component 6, Edge 1, Filter 2, OriginalSize 80, LoggedSize 80 
    00-15-5D-C8-8E-16 > E8-B5-D0-2C-24-40, ethertype IPv4 (0x0800), length 80: 10.127.130.152.59312 > 8.8.8.8.53: 1+ PTR? 8.8.8.8.in-addr.arpa. (38)
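The captured line uses tcpdump-style notation: `1+` is query ID 1 with the recursion-desired flag set, and `(38)` is the DNS payload length. The query itself is a PTR lookup for `8.8.8.8.in-addr.arpa.`, which appears to be nslookup looking up the name of the server it was given (when that fails it prints `Server:  UnKnown`). A short Python sketch that builds the same query in RFC 1035 wire format can be used to sanity-check the sizes; `build_ptr_query` is a hypothetical helper, not part of pktmon or nslookup:

```python
import ipaddress
import struct


def build_ptr_query(ip: str, query_id: int = 1) -> bytes:
    """Build a DNS PTR query for the reverse name of `ip` (RFC 1035 wire format)."""
    # ipaddress yields e.g. "8.8.8.8.in-addr.arpa" (no trailing dot).
    qname = ipaddress.ip_address(ip).reverse_pointer
    # Header: ID, flags (0x0100 = RD, "recursion desired"), QDCOUNT=1, AN/NS/AR=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME as length-prefixed labels, terminated by the zero-length root label.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in qname.split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 12, 1)  # QTYPE=PTR(12), QCLASS=IN(1)
    return header + question


if __name__ == "__main__":
    packet = build_ptr_query("8.8.8.8")
    print(ipaddress.ip_address("8.8.8.8").reverse_pointer)  # 8.8.8.8.in-addr.arpa
    print(len(packet))  # 38, the "(38)" in the trace line
    # Ethernet (14) + IPv4 without options (20) + UDP (8) + DNS payload
    print(14 + 20 + 8 + len(packet))  # 80, the frame "length 80"
```

The 38-byte payload plus the 8-byte UDP header, 20-byte IPv4 header, and 14-byte Ethernet header accounts for the 80 bytes reported as `OriginalSize 80`.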

Further down in the file I can see that Component 6 is the ethernet adapter:

   [00]1D7C.0AFC::2024-03-20 14:13:59.067489700 [Microsoft-Windows-PktMon] Component 6, Type Miniport , Name netvsc.sys, Microsoft Hyper-V Network Adapter #2 
   [00]1D7C.0AFC::2024-03-20 14:13:59.067489900 [Microsoft-Windows-PktMon] Property: Component 6, PhysAddress  = 0x00155DC88E16 
   [00]1D7C.0AFC::2024-03-20 14:13:59.067490200 [Microsoft-Windows-PktMon] Property: Component 6, NdisMedium  = Ethernet  

Whenever the last appearance is the ethernet adapter it's safe to assume the packet left the machine. I verified that the IP addresses are correct, and that the destination MAC address is the physical ethernet switch. Since I never see a response in the pktmon log I know that I never received a reply on the container host.
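With many packets in the capture, finding each packet's last appearance by hand gets tedious. A minimal Python sketch that does the same bookkeeping over the text output; the regular expressions are inferred only from the excerpt above (other pktmon versions or output formats may differ), and the script name in the usage comment is hypothetical:

```python
import re
import sys
from typing import Dict, Iterable, Tuple

# Patterns inferred only from the PktMon.txt excerpt above; other pktmon
# versions or output formats may lay lines out differently.
PACKET_RE = re.compile(
    r"PktGroupId (\d+), PktNumber (\d+), Appearance (\d+),.*?Component (\d+)"
)
COMPONENT_RE = re.compile(r"Component (\d+), Type (.+?) , Name (.+?)\s*$")


def last_appearances(lines: Iterable[str]) -> Dict[Tuple[int, int], Tuple[int, str]]:
    """Map (PktGroupId, PktNumber) to (highest Appearance, component name)."""
    components: Dict[int, str] = {}
    last: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for line in lines:
        packet = PACKET_RE.search(line)
        if packet:
            group, number, appearance, component = map(int, packet.groups())
            key = (group, number)
            if key not in last or appearance > last[key][0]:
                last[key] = (appearance, component)
            continue
        comp = COMPONENT_RE.search(line)
        if comp:
            components[int(comp.group(1))] = comp.group(3)
    # Component descriptions are listed further down the file, so names are
    # resolved only after every line has been read.
    return {
        key: (appearance, components.get(component, f"Component {component}"))
        for key, (appearance, component) in last.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python last_seen.py pktmon.txt   (hypothetical script name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (group, number), (appearance, name) in sorted(last_appearances(f).items()):
            print(f"group {group} pkt {number}: last seen at appearance {appearance} on {name}")
```

If the reported component is the physical adapter (here `netvsc.sys`), the reasoning above applies: the packet most likely left the host.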

I will leave this issue open for a few more days. If anyone can show a pktmon log that suggests the container host dropped the packet incorrectly (not including the Moby issue above), I can look into that. If not, I'll close this issue.

doctorpangloss commented 6 months ago

It would make sense to me if there were documentation somewhere saying that nslookup should simply not be used on Windows.

grcusanz commented 5 months ago

@doctorpangloss Thanks for the suggestion, I've submitted a PR to the container networking docs with this information.