microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

DNS doesn't work in containers #741

Open chaholl opened 6 years ago

chaholl commented 6 years ago

I have Windows 10, Docker for Windows CE, the latest Service Fabric SDK, and a local single-node cluster.

I create two ASP.NET Core services and deploy them to Service Fabric as container hosts:

    <EntryPoint>
      <ContainerHost>
        <ImageName>webapplication2:dev</ImageName>
      </ContainerHost>
    </EntryPoint>

If I connect to the running containers using docker exec and ping the configured DNS names, I get no response. If I use PowerShell and run Invoke-WebRequest http://the_dns_name, I also get no response. Both work fine when I use the IP address instead.

Is this a configuration issue, or am I trying to do something that can't be done? And if it can't be done, how is it possible to resolve container endpoints sensibly?

ashishnegi commented 6 years ago

@ninzavivek for DNS issue.

samedder commented 6 years ago

@chaholl did you enable the DNS service?

chaholl commented 6 years ago

It's enabled by default and seems to be working. Or are you referring to something else that isn't visible on the Service Fabric dashboard?

samedder commented 6 years ago

How did you create the service, can you share the service description you used?

chaholl commented 6 years ago

I'll recreate it and share the details, but the steps I followed were (in the latest Visual Studio 2017):

  1. Create a new ASP.NET Core project.
  2. Add container orchestration support.
  3. Select Service Fabric. The Service Fabric project is added with the necessary config.
  4. Publish directly from Visual Studio.

Everything works as expected. However, if I docker exec into the container, I get no response when I ping the DNS names. I tested this by creating two applications and assigning DNS names to them in the service config.

It's worth mentioning that I'm running this on Windows 10 in a VMware virtual machine. I have made the UDP checksum offload tweak, but that hasn't helped.

mikkelhegn commented 6 years ago

@chaholl Are you aware of these known limitations? https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-how-to-debug-windows-containers (look a little further down the article).

mani-ramaswamy commented 6 years ago

@chaholl please let us know if Mikkel's reply answered your question.

chaholl commented 6 years ago

Unfortunately no. It still doesn't work after recreating the cluster using the machine name and making the other tweaks suggested in the guide.

I can resolve the local machine name within the containers, but not any DNS names I assign using ServiceDnsName:

    <Service Name="WebApplication2"
             ServiceDnsName="test.app2"
             ServicePackageActivationMode="ExclusiveProcess">
      <StatelessService ServiceTypeName="WebApplication2Type"
                        InstanceCount="[WebApplication2_InstanceCount]">
        <SingletonPartition />
      </StatelessService>
    </Service>

chaholl commented 6 years ago

Interestingly, I see this:

PS C:\app> resolve-dnsname test.app1
resolve-dnsname : test.app1 : Not enough storage is available to complete this operation
At line:1 char:1
+ resolve-dnsname test.app1
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (test.app1:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_OUTOFMEMORY,Microsoft.DnsClient.Commands.ResolveDnsName

PS C:\app> resolve-dnsname www.google.com

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
www.google.com                                 AAAA   25    Answer     2a00:1450:4009:801::2004
www.google.com                                 A      25    Answer     172.217.23.4

(I used Resolve-DnsName because Nano Server doesn't have nslookup.)

test.app1 is the DNS name of the container I'm connected to, whereas if I look up the other container I see:

PS C:\app> resolve-dnsname test.app2
resolve-dnsname : test.app2 : DNS name does not exist
At line:1 char:1
+ resolve-dnsname test.app2
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (test.app2:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : DNS_ERROR_RCODE_NAME_ERROR,Microsoft.DnsClient.Commands.ResolveDnsName

My point is that it behaves differently depending on whether or not the DNS name is mapped to the local adapter.

ninzavivek commented 6 years ago

Please provide following information:

  1. SF version.
  2. Host OS version.
  3. Is this a one-box environment?
  4. Could you share the link to the Nano Server image? As far as I know, the regular Nano Server image doesn't have PowerShell. The PowerShell variant of the image has the .NET Core version of PowerShell, which includes very few system cmdlets.
  5. Also share the output of ipconfig /all both on the host and inside the container.

chaholl commented 6 years ago

Service Fabric:

C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup> Get-ServiceFabricRuntimeSupportedVersion
Trace folder already exists. Traces will be written to existing trace folder: C:\WINDOWS\system32\DeploymentTraces

Version      SupportExpiryDate   TargetPackageLocation
-------      -----------------   ---------------------
6.1.456.9494 15/07/2018 00:00:00 https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.1.456.9494/Micro...
6.1.467.9494 15/07/2018 00:00:00 https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.1.467.9494/Micro...
6.1.472.9494 15/07/2018 00:00:00 https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.1.472.9494/Micro...
6.1.480.9494 15/07/2018 00:00:00 https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.1.480.9494/Micro...
6.2.274.9494                     https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.2.274.9494/Micro...
6.2.283.9494                     https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.2.283.9494/Micro...
6.2.301.9494                     https://download.microsoft.com/download/B/0/B/B0BCCAC5-65AA-4BE3-AB13-D5FF5890F4B5/6.2.301.9494/Micro...

Windows Version:

C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup> [System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      17134  0

This is a single machine running Windows 10 under VMware Workstation. It's running a single-node development cluster (though I see the same behaviour on a 5-node cluster).

The Dockerfile is using:

microsoft/dotnet:2.1-aspnetcore-runtime

ipconfig /all for the host:

C:\WINDOWS\system32> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : DESKTOP-J3RVK99
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : Home

Ethernet adapter Ethernet on Host:

   Connection-specific DNS Suffix  . : Home
   Description . . . . . . . . . . . : Intel(R) 82574L Gigabit Network Connection
   Physical Address. . . . . . . . . : 00-0C-29-AB-BB-9D
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 192.168.1.71(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : 13 July 2018 09:30:49
   Lease Expires . . . . . . . . . . : 14 July 2018 09:30:49
   Default Gateway . . . . . . . . . : 192.168.1.254
   DHCP Server . . . . . . . . . . . : 192.168.1.254
   DNS Servers . . . . . . . . . . . : 192.168.1.71
                                       192.168.1.254
   NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter vEthernet (Default Switch):

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter
   Physical Address. . . . . . . . . : C2-15-52-E1-EA-A9
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::d9e4:2839:f3cb:62b4%4(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.24.74.193(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.240
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 234886493
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-22-63-8A-44-00-0C-29-AB-BB-9D
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Disabled

Ethernet adapter vEthernet (nat):

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #2
   Physical Address. . . . . . . . . : 00-15-5D-67-B7-02
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::99dc:bf4b:db70:e0a2%34(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.18.0.1(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
   DHCPv6 IAID . . . . . . . . . . . : 570430813
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-22-63-8A-44-00-0C-29-AB-BB-9D
   DNS Servers . . . . . . . . . . . : fec0:0:0:ffff::1%1
                                       fec0:0:0:ffff::2%1
                                       fec0:0:0:ffff::3%1
   NetBIOS over Tcpip. . . . . . . . : Enabled

ipconfig /all for container 1:

C:\app>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : 6a817f5509c8
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : WebApplication2Application

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : Home
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   Physical Address. . . . . . . . . : 00-15-5D-67-BB-9E
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::752f:4719:404a:114c%4(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.18.13.85(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.18.0.1
   DHCPv6 IAID . . . . . . . . . . . : 67114333
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-22-D9-DB-82-00-15-5D-67-BB-9E
   DNS Servers . . . . . . . . . . . : 172.18.0.1
                                       192.168.1.71
                                       192.168.1.254
   NetBIOS over Tcpip. . . . . . . . : Disabled

ipconfig /all for container 2:

C:\app>ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : 22c49287aba5
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : WebApplication1Application

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : Home
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
   Physical Address. . . . . . . . . : 00-15-5D-67-BB-01
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::2c85:9dde:9816:d433%4(Preferred)
   IPv4 Address. . . . . . . . . . . : 172.18.10.88(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.18.0.1
   DHCPv6 IAID . . . . . . . . . . . : 67114333
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-22-D9-64-26-00-15-5D-67-BB-01
   DNS Servers . . . . . . . . . . . : 172.18.0.1
                                       192.168.1.71
                                       192.168.1.254
   NetBIOS over Tcpip. . . . . . . . : Disabled
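ninzavivek asked for ipconfig /all from both the host and the containers. When comparing several of these dumps, it can help to pull out only the DNS server list for each adapter. Below is a minimal Python sketch; dns_servers is a hypothetical helper, and it assumes the English-language ipconfig /all layout shown above (adapter headers start in column 0, fields are written as "Name . . . : value", and extra values sit on indented continuation lines without a " : " separator).

```python
import re


def dns_servers(ipconfig_text):
    """Map each adapter name to its DNS server list in `ipconfig /all` text.

    Assumes the English-language layout: adapter headers start in column 0,
    fields look like "   Name . . . : value", and extra values of a
    multi-valued field sit on indented lines with no " : " separator.
    """
    servers = {}
    adapter = None
    in_dns_field = False
    for raw in ipconfig_text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            in_dns_field = False  # a blank line ends any multi-line field
            continue
        if not line[0].isspace():
            # Section header, e.g. "Ethernet adapter vEthernet (nat):".
            # Anything else in column 0 (prompt, title) clears the adapter.
            match = re.match(r".* adapter (.+):$", line)
            adapter = match.group(1) if match else None
            in_dns_field = False
            continue
        if adapter is None:
            continue  # skip the global "Windows IP Configuration" block
        if " : " in line:
            key, _, value = line.partition(" : ")
            in_dns_field = key.strip(" .") == "DNS Servers"
            if in_dns_field:
                servers.setdefault(adapter, []).append(value.strip())
        elif in_dns_field:
            # Indented continuation line holding another DNS server address
            servers[adapter].append(line.strip())
    return servers
```

Run against the container 1 output above, it returns {'Ethernet': ['172.18.0.1', '192.168.1.71', '192.168.1.254']}. The first entry, 172.18.0.1, is the IPv4 address of the host's vEthernet (nat) adapter.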
ninzavivek commented 6 years ago

Just to be sure: your local cluster is on a VM, right? [This is a single machine running Windows 10 under VMWare Workstation.]

If so, you might be running into this issue:

Running Windows 10 in a virtual machine: DNS replies will not get back to the container. Resolution: disable UDP checksum offload for IPv4 on the virtual machine's NIC. Please note that this will degrade networking performance on the machine. https://github.com/Azure/service-fabric-issues/issues/1061

ninzavivek commented 6 years ago

I also tried running a container:

docker run -i --dns-search Application4 microsoft/dotnet:2.1-aspnetcore-runtime

[Inside the container]

C:\>powershell
'powershell' is not recognized as an internal or external command,
operable program or batch file.

C:\>resolve-dns
'resolve-dns' is not recognized as an internal or external command,
operable program or batch file.

C:\>resolve-dnsname
'resolve-dnsname' is not recognized as an internal or external command,
operable program or batch file.

chaholl commented 6 years ago

@ninzavivek Yes, this is Service Fabric on Windows 10 running inside a VMware Workstation VM.

I was aware of microsoft/service-fabric-issues#1061 and have made the UDP tweak it describes. It hasn't made any difference. I can resolve external DNS names, just not names within the Service Fabric cluster.

Regarding PowerShell in the container, you're right: the ASP.NET Core 2.1 image doesn't have PowerShell. I guess I did those tests using .NET Core 2.0. Apologies, I should have checked that.

You can replicate the problem using:

docker run -it --dns-search Application4 microsoft/aspnetcore:2.0-nanoserver-sac2016 Powershell.exe

ninzavivek commented 6 years ago

@chaholl

This is from inside a container [IP address info redacted].

Host OS version: Windows 10 1803 [not a VM; bare-metal machine]
Container image: microsoft/aspnetcore:2.0-nanoserver-sac2016

Windows PowerShell
Copyright (C) 2016 Microsoft Corporation. All rights reserved.

PS C:\> ping service1.testapp

Pinging service1.testapp [X.X.X.X] with 32 bytes of data:
Reply from X.X.X.X: bytes=32 time<1ms TTL=127
Reply from X.X.X.X: bytes=32 time=1ms TTL=127
Reply from X.X.X.X: bytes=32 time=27ms TTL=127
Reply from X.X.X.X: bytes=32 time=16ms TTL=127

Ping statistics for X.X.X.X:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 27ms, Average = 11ms
PS C:\> Resolve-DnsName -Name service1.testapp

Name                Type   TTL   Section    IPAddress
----                ----   ---   -------    ---------
service1.testapp    A      1     Answer     X.X.X.X

PS C:\> [System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      14393  0

I have a containerized app on a local cluster with the DNS name service1.testapp.

ninzavivek commented 6 years ago

Is it possible for you to try on a non-VM environment and see if it reproduces?

chaholl commented 6 years ago

I'm sure I can replicate your findings on a bare-metal machine, but that doesn't really fix the problem. Can you confirm whether you've been able to reproduce the issue in a VM?

ninzavivek commented 6 years ago

OK, I will give this a try inside a VM environment and update you with my findings.

ninzavivek commented 6 years ago

I am unable to reproduce it. To confirm that the workaround was applied correctly, could you share the output of the following cmdlet on the VM?

Get-NetAdapterChecksumOffload

chaholl commented 6 years ago

Sure, on the host VM the cmdlet output is:

Name                           IpIPv4Enabled   TcpIPv4Enabled  TcpIPv6Enabled  UdpIPv4Enabled  UdpIPv6Enabled
----                           -------------   --------------  --------------  --------------  --------------
Ethernet on Host               RxTxEnabled     RxTxEnabled     RxTxEnabled     Disabled        RxTxEnabled
vEthernet (nat)                RxTxEnabled     RxTxEnabled     RxTxEnabled     RxTxEnabled     RxTxEnabled
vEthernet (Default Switch)     RxTxEnabled     RxTxEnabled     RxTxEnabled     RxTxEnabled     RxTxEnabled
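The workaround quoted earlier calls for UDP checksum offload for IPv4 to be disabled on the virtual machine's NIC, and the UdpIPv4Enabled column above is where that shows up. As a minimal Python sketch for reading such a table, here are two hypothetical helpers (udp_ipv4_offload, adapters_still_offloading) that assume the default layout printed above, with columns separated by at least two spaces so adapter names such as "Ethernet on Host" stay intact:

```python
import re


def udp_ipv4_offload(table_text):
    """Map adapter name -> UdpIPv4Enabled value from Get-NetAdapterChecksumOffload output.

    Assumes columns are separated by two or more spaces, so adapter names that
    contain single spaces (e.g. "Ethernet on Host") are kept in one cell.
    """
    rows = [row.rstrip() for row in table_text.splitlines() if row.strip()]
    if not rows:
        return {}
    header = re.split(r"\s{2,}", rows[0].strip())
    if "UdpIPv4Enabled" not in header:
        return {}  # no UDP columns at all (as in the in-container output)
    col = header.index("UdpIPv4Enabled")
    result = {}
    for row in rows[1:]:
        if set(row.replace(" ", "")) == {"-"}:
            continue  # skip the "----" underline row
        cells = re.split(r"\s{2,}", row.strip())
        result[cells[0]] = cells[col]
    return result


def adapters_still_offloading(table_text):
    """Adapters whose UDP IPv4 checksum offload is anything other than Disabled."""
    return [name for name, state in udp_ipv4_offload(table_text).items()
            if state != "Disabled"]
```

On the host VM output above, this reports Ethernet on Host as Disabled and lists vEthernet (nat) and vEthernet (Default Switch) as still offloading. Whether those Hyper-V virtual adapters also need the tweak is not established in this thread.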

Not sure if it matters but if I run the same command inside the Docker container I get:

Name                           IpIPv4Enabled   TcpIPv4Enabled  TcpIPv6Enabled
----                           -------------   --------------  --------------
Ethernet                       RxTxEnabled     RxTxEnabled     RxTxEnabled

The adapter inside the container doesn't share a physical address with any of those on the host, so I guess it's a new Hyper-V adapter that's provisioned automatically?

btastic commented 6 years ago

I have the same exact issue. Any news?

PS C:\app> ping mssql.fabric                                                                              
Ping request could not find host mssql.fabric. Please check the name and try again.    

PS C:\app> resolve-dnsname mssql.fabric                                                                   
resolve-dnsname : mssql.fabric : DNS server failure                                                       
At line:1 char:1                                                                                          
+ resolve-dnsname mssql.fabric                                                                            
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                            
    + CategoryInfo          : ResourceUnavailable: (mssql.fabric:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : RCODE_SERVER_FAILURE,Microsoft.DnsClient.Commands.ResolveDnsName            

PS C:\app> ipconfig /all                                                                                  

Windows IP Configuration                                                                                  

   Host Name . . . . . . . . . . . . : d2d088437b2a                                                       
   Primary Dns Suffix  . . . . . . . :                                                                    
   Node Type . . . . . . . . . . . . : Hybrid                                                             
   IP Routing Enabled. . . . . . . . : No                                                                 
   WINS Proxy Enabled. . . . . . . . : No                                                                 
   DNS Suffix Search List. . . . . . : ServiceFabric                                                      

Ethernet adapter Ethernet:                                                                                

   Connection-specific DNS Suffix  . : fritz.box                                                          
   Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter                                  
   Physical Address. . . . . . . . . : 00-15-5D-48-57-30                                                  
   DHCP Enabled. . . . . . . . . . . : Yes                                                                
   Autoconfiguration Enabled . . . . : Yes                                                                
   Link-local IPv6 Address . . . . . : fe80::c56c:3c46:df70:5975%4(Preferred)                             
   IPv4 Address. . . . . . . . . . . : 172.22.172.31(Preferred)                                           
   Subnet Mask . . . . . . . . . . . : 255.255.240.0                                                      
   Default Gateway . . . . . . . . . : 172.22.160.1                                                       
   DHCPv6 IAID . . . . . . . . . . . : 67114333                                                           
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-23-49-30-6B-00-15-5D-48-57-30                          
   DNS Servers . . . . . . . . . . . : 172.22.160.1                                                       
                                       192.168.178.57                                                     
   NetBIOS over Tcpip. . . . . . . . : Disabled                                                           

I can reach and connect to mssql.fabric from my local machine using the DNS name, and Service Fabric Explorer shows no issues.

Windows 10 machine with the latest Docker CE and the latest Service Fabric SDK; single-node cluster.

moikot commented 5 years ago

Same issue for me. I'm more concerned about being able to resolve from the host, but I'm getting:

PS E:\ServiceFabric> Resolve-DnsName test.service
Resolve-DnsName : test.service : Not enough memory resources are available to complete this operation
At line:1 char:1
+ Resolve-DnsName test.service
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (test.service:String) [Resolve-DnsName], Win32Exception
    + FullyQualifiedErrorId : ERROR_OUTOFMEMORY,Microsoft.DnsClient.Commands.ResolveDnsName
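The container images used in this thread don't appear to include Python, so the following only helps on the host side (as in this comment) or in an image that does ship Python. It is a minimal sketch built on the standard library's socket.getaddrinfo, which goes through the OS resolver; resolve_ipv4 is a hypothetical helper name. Unlike Resolve-DnsName, it does not surface the specific failure (name error, server failure, out of memory): any resolution failure simply yields an empty list.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses the OS resolver gives
    for `name`, or an empty list if resolution fails for any reason.

    Note: this does not distinguish DNS error kinds the way Resolve-DnsName does.
    """
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port))
    return sorted({info[4][0] for info in infos})
```

For example, resolve_ipv4("test.service") on the host should return the service's address once the Service Fabric DNS name resolves there, and an empty list while it does not.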
RoggerFabri commented 5 years ago

Have any of you managed to solve this? I have an SF cluster with 3 nodes. One of the nodes cannot resolve DNS from inside the Docker containers, while the other two resolve normally, even though all three nodes are identical.

chaholl commented 5 years ago

Not very helpful, I know, but I gave up in the end and switched to Kubernetes. There's some short-term pain involved, but it's worth it in the long run.

RoggerFabri commented 5 years ago

@chaholl cannot disagree.

moikot commented 5 years ago

@chaholl We did the same.