moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Windows container unable to ping Ubuntu Container #2160

Open WillNye opened 7 years ago

WillNye commented 7 years ago

We are trying to create a Docker swarm in which an ASP.NET service running in a Windows container is served via an Nginx service running on an Ubuntu docker-machine. The problem is that the Windows container does not respond properly to other services in the swarm. Our Django apps use the same Nginx service and run fine, but the ASP.NET app times out.

From inside the Nginx container we could successfully reach the ASP.NET container. From inside the ASP.NET container, requests were sent to the correct IP, but all of them timed out.

Here is everything we know to this point:

- Docker version 17.04.0-ce, build 4845c56
- Running Windows Server 2016 - KB3150513
- Disabled firewall to confirm it isn't an issue with ports
- Windows container running on docker instance directly on the Windows Server
- Ubuntu containers are on Ubuntu Docker machines
- Ubuntu services can ping Windows services but Windows services are unable to reach Ubuntu services
  Ubuntu -> Windows
  Windows ->X Ubuntu

Our Stack:

ID                  NAME                    IMAGE                          NODE                DESIRED STATE       CURRENT STATE                    ERROR                       PORTS
jhqjiqg9dhsw        ourStack_DjangoProd.1   ourRepo/Django-prod:rc         ourStack-worker-5   Running             Running 38 seconds ago
qixjelib9bzj        ourStack_nginx.1        ourRepo/stealth-nginx:latest   ourStack-worker-5   Running             Running 34 seconds ago
k8u0p6mzwz54        ourStack_DjangoStage.1  ourRepo/Django-staging:latest  ourStack-worker-5   Running             Running 39 seconds ago
v1tzi1z7hxei        ourStack_DjangoDev.1    ourRepo/Django-staging:latest  manager-3           Running             Running 39 seconds ago
uxlvdlp6qwuu        ourStack_asp-app.1      ourRepo/asp-app:latest         CAMSTAT-SVR         Running             Running less than a second ago
omvwb4cjdrzd        ourStack_nginx.2        ourRepo/stealth-nginx:latest   manager-3           Running             Running 33 seconds ago
kc8tp6dpf6i1        ourStack_DjangoProd.2   ourRepo/Django-prod:rc         manager-3           Running             Running 37 seconds ago

Nginx ifconfig:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.255.0.13  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:0a:ff:00:0d  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.5  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:12:00:05  txqueuelen 0  (Ethernet)
        RX packets 229  bytes 263967 (257.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 107  bytes 7219 (7.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.0.0.6  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:0a:00:00:06  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1  (Local Loopback)
        RX packets 12  bytes 1456 (1.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12  bytes 1456 (1.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Nginx ping ASP:

root@f4dde99e6401:/var/www# ping ourStack_asp-app
PING ourStack_asp-app (10.0.0.2): 56 data bytes
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=0.401 ms
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.095 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.090 ms
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=0.105 ms
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.104 ms
64 bytes from 10.0.0.2: icmp_seq=5 ttl=64 time=0.097 ms
64 bytes from 10.0.0.2: icmp_seq=6 ttl=64 time=0.092 ms
64 bytes from 10.0.0.2: icmp_seq=7 ttl=64 time=0.094 ms
64 bytes from 10.0.0.2: icmp_seq=8 ttl=64 time=0.099 ms
64 bytes from 10.0.0.2: icmp_seq=9 ttl=64 time=0.095 ms
--- ourStack_asp-app ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.090/0.127/0.401/0.091 ms

Nginx service inspect:

[
    {
        "ID": "ualbse6gr74jehk66ec9ftekk",
        "Version": {
            "Index": 5388
        },
        "CreatedAt": "2017-05-02T13:24:12.436333625Z",
        "UpdatedAt": "2017-05-02T13:24:12.442605083Z",
        "Spec": {
            "Name": "ourStack_nginx",
            "Labels": {
                "com.docker.stack.namespace": "python"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "8675309/stealth-nginx:latest@sha256:ce9c2faab917ad2fe626843e96846eacb98dc8feb5fcd930ebe1d4da77bf0681",
                    "Labels": {
                        "com.docker.stack.namespace": "python"
                    }
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "on-failure",
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.platform.os == linux"
                    ]
                },
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 2
                }
            },
            "UpdateConfig": {
                "Parallelism": 2,
                "Delay": 10000000000,
                "FailureAction": "pause",
                "MaxFailureRatio": 0
            },
            "Networks": [
                {
                    "Target": "8mcvr2tcfgeacn16gulkkzkpb",
                    "Aliases": [
                        "nginx"
                    ]
                }
            ],
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 80,
                    "PublishedPort": 80,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "fbopakq2nzt436rq1b2zv5ol7",
                    "Addr": "10.255.0.10/16"
                },
                {
                    "NetworkID": "8mcvr2tcfgeacn16gulkkzkpb",
                    "Addr": "10.0.0.4/24"
                }
            ]
        }
    }
]

ASP Service:

CONTAINER ID        IMAGE                                                                                        COMMAND                   CREATED             STATUS              PORTS               NAMES
e986dfb90c79        ourRepo/asp-app@sha256:1580d526c06482d7ef92c1fdec4beb494ff340feb6f9ec7125fdf67ffa7eefab   "C:\\ServiceMonitor..."   6 minutes ago       Up 6 minutes        80/tcp, 6000/tcp    ourStack_asp-app.1.uxlvdlp6qwuu837kutibvhnac

ASP ipconfig:

Windows IP Configuration

Ethernet adapter vEthernet (Container NIC 67208761):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::f9c8:5b32:1e75:d7bb%64
   IPv4 Address. . . . . . . . . . . : 10.255.0.6
   Subnet Mask . . . . . . . . . . . : 255.255.0.0
   Default Gateway . . . . . . . . . :

Ethernet adapter vEthernet (Container NIC 6546affc):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::d53e:9d30:758a:29ee%69
   IPv4 Address. . . . . . . . . . . : 10.0.0.3
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter vEthernet (Container NIC 5e1156e0):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::d000:c63d:ff:b91a%74
   IPv4 Address. . . . . . . . . . . : 172.17.227.52        -- Can access .NET app from here at port 6000
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 172.17.224.1

ASP ping:

ping ourStack_nginx

Pinging ourStack_nginx [10.0.0.4] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 10.0.0.4:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
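
As a quick sanity check on the addresses above, the following Python sketch confirms that the VIP being pinged (10.0.0.4) lies in the same /24 as the ASP container's own overlay address (10.0.0.3, mask 255.255.255.0). If that holds, the empty default gateway on that adapter should not by itself explain the timeouts, since the target is on-link. The helper name `same_subnet` is illustrative only and not part of any Docker tooling.

```python
import ipaddress

def same_subnet(local_ip, netmask, remote_ip):
    """Return True if remote_ip falls inside the subnet of local_ip/netmask."""
    network = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return ipaddress.ip_address(remote_ip) in network

# Values copied from the ipconfig, ping, and service inspect output in this report.
print(same_subnet("10.0.0.3", "255.255.255.0", "10.0.0.4"))     # nginx VIP on the stack network: True
print(same_subnet("10.0.0.3", "255.255.255.0", "10.255.0.10"))  # nginx VIP on the ingress network: False
```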

ASP service inspect:

[
    {
        "ID": "kojjy62r48slhipzvyti56ewb",
        "Version": {
            "Index": 5381
        },
        "CreatedAt": "2017-05-02T13:24:11.78049809Z",
        "UpdatedAt": "2017-05-02T13:24:11.782822174Z",
        "Spec": {
            "Name": "ourStack_asp-app",
            "Labels": {
                "com.docker.stack.namespace": "ourStack"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "ourRepo/asp-app:latest@sha256:1580d526c06482d7ef92c1fdec4beb494ff340feb6f9ec7125fdf67ffa7eefab",
                    "Labels": {
                        "com.docker.stack.namespace": "ourStack"
                    }
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.platform.os == windows"
                    ]
                },
                "ForceUpdate": 0
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "Networks": [
                {
                    "Target": "8mcvr2tcfgeacn16gulkkzkpb",
                    "Aliases": [
                        "asp-app"
                    ]
                }
            ],
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 6000,
                        "PublishedPort": 6000,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 6000,
                        "PublishedPort": 6000,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 6000,
                    "PublishedPort": 6000,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "fbopakq2nzt436rq1b2zv5ol7",
                    "Addr": "10.255.0.3/16"
                },
                {
                    "NetworkID": "8mcvr2tcfgeacn16gulkkzkpb",
                    "Addr": "10.0.0.2/24"
                }
            ]
        }
    }
]

mavenugo commented 7 years ago

@WillNye swarm-mode on Windows doesn't support VIP-based load balancing. Can you try creating the service using --endpoint-mode=dnsrr? This will create the service with load balancing done using DNS round-robin (DNS-RR).

Also, swarm-mode on Windows doesn't support the routing mesh. You can instead publish the port using -p mode=host,target=x,published=y.

WillNye commented 7 years ago

@mavenugo Sorry for the delayed response. I tried what you suggested, as outlined at https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/swarm-mode, but with no success. I've also tried several variations mentioned in the comments on that page, such as:

docker service create --name test_asp --constraint node.platform.os==windows --detach=false --endpoint-mode=dnsrr --publish mode=host,target=80,published=7999 repo/aspnet-helloworld:latest

ThisFunctionalTom commented 6 years ago

Hi, may I ask if you found a solution to your problem? We have the same problem with our hybrid swarm. From Windows we can ping each Linux container, but we cannot ping the service. For example:

ping myservice --> does not work
ping myservice.1 --> does work

On the other hand both ping commands work from another Linux container.
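
One way to narrow this down is to compare what the service name resolves to with what the individual tasks resolve to. The Python sketch below assumes it is run inside a container attached to the same overlay network; it relies on swarm's built-in `tasks.<service>` DNS name, which returns the task IPs rather than the VIP. The function names and the service name `myservice` are placeholders.

```python
import socket

def resolve_ipv4(name):
    """Return the sorted IPv4 addresses that name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})

def compare_service_dns(service):
    """Resolve the service name and its per-task name side by side."""
    return {
        "service": resolve_ipv4(service),          # the VIP in vip mode, task IPs in dnsrr mode
        "tasks": resolve_ipv4("tasks." + service), # always the individual task IPs
    }
```

Calling `compare_service_dns("myservice")` from the Windows container and from a Linux container should show whether the two sides see the same addresses. If the service name returns a single address that is not in the task list, the service is most likely still in VIP mode.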

WillNye commented 6 years ago

@tenigma Sadly, we wound up abandoning efforts to dockerize our Windows apps. It really seems like swarm support for mixed Windows and Linux clusters just isn't there yet. I'm not sure what your situation is, but if you're trying to serve your .NET apps over Nginx, we just did a round robin within Nginx instead.

Something like:

upstream netApp {
        server .NetContainerOne;
        server .NetContainerTwo;
}

server {
        listen 80;
        server_name yoursite.com;

        location / {
                proxy_pass http://netApp;
                proxy_next_upstream error http_500 http_502 http_503 http_504;
        }
}

ThisFunctionalTom commented 6 years ago

@WillNye Thanks for your answer. We are trying to dockerize some backend services and wanted to use the rabbitmq Linux image. Our Windows app cannot talk to the service, but it can talk directly to a service container. I found some videos and this blog post (http://collabnix.com/building-hybrid-docker-swarm-mode-cluster-on-google-cloud-platform/) where they use "--endpoint-mode dnsrr". Tomorrow we will try this, and if it does not work we will try switching to a Windows-only cluster or to a nats queue instead of rabbitmq.

Thanks again for your help.

vadimkorr commented 6 years ago

Hi all, I'm facing the same issue with Windows Server with Containers on Azure (Docker version 17.06.2-ee-6, build e75fdb8) and Ubuntu Server on Azure (Docker version 18.01.0-ce, build 03596f5) in the same swarm. Containers hosted on Ubuntu can ping the Windows containers, but containers hosted on Windows unfortunately cannot see the Ubuntu-hosted containers.

Is there any news on this? Thank you in advance.

daschott commented 5 years ago

Does it work to curl the Linux container from inside the Windows container?
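
If curl is not available in the Windows image, a TCP-level check can answer the same question and keeps ICMP behaviour out of the picture. The following is a minimal Python sketch; it assumes Python is installed in the container, and the function name `tcp_reachable` is made up for illustration.

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For the stack in this issue, `tcp_reachable("ourStack_nginx", 80)` run inside the ASP container would show whether the Nginx service is reachable over TCP even though ping times out.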

raghunathchary commented 4 years ago

Hi, is there any update on this issue? I'm facing a similar issue when running a dockerized .NET Core application in a Windows container in swarm.

management:
    image: service-a
    ports:
      - target: 4200
        published: 4200
        protocol: tcp
        mode: host
    secrets:
      - cert_pass
    environment:
      - CERT_PASS=C:\ProgramData\Docker\secrets\cert_pass
    volumes:
      - .\servicea\certificates:c:\application\certificates
      - .\servicea\environment:c:\application\environment
    deploy:
      endpoint_mode: dnsrr

I can see that the container is running and the port is active; I verified with telnet that I can connect to port 4200.

When I browse to the application, the request times out. curl http://localhost:4200/ from the host machine also fails with a timeout.

But it works fine from inside the container: when I log into the container, curl http://localhost:4200/ is successful.

When I pinged the container IP address, the ping response was "TTL expired in transit". Does that indicate a routing issue between the host and the container network?