moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.
Apache License 2.0

Docker container from one service can't connect to a container from another swarm service spawned on a different node via the service name #1529

Closed niau closed 8 years ago

niau commented 8 years ago

My environment: a swarm cluster consisting of one manager and one worker node, both running Ubuntu 16.04 with the ufw firewall inactive.

Docker version

root@head1:/home/niau# docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Docker info

Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 2
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 22
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null overlay host
Swarm: active
 NodeID: bju9jvsnx23cf1hz5pdicg4zt
 Is Manager: true
 ClusterID: 6tfaldrfmcfucanjq985ikzy7
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.2.7
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 740.4 MiB
Name: head1
ID: TPCI:HI3A:YIOR:DX5N:UF2T:ZIMN:TTNP:GDYU:SWRM:5EDY:C4U6:TYD5
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

How to reproduce the issue:

Create three services, each with one replica and endpoint mode vip, so that one container lands on one node and two on the other. Attach them to the overlay network that swarm creates by default (ingress).

root@head1:/home/niau# for I in `seq 1 3`; do docker service create --name nms$I --network ingress --endpoint-mode vip -p 808$I:8080/tcp tomcat; done
dqmjcloapuxzly8cubh89i5n3
1o8okbe4h8efxe2kfea3c6r1a
dalcyd45ex0t1lv8rqi0v9ozk
root@head1:/home/niau# docker service ls
ID            NAME  REPLICAS  IMAGE   COMMAND
1o8okbe4h8ef  nms2  1/1       tomcat  
dalcyd45ex0t  nms3  1/1       tomcat  
dqmjcloapuxz  nms1  1/1       tomcat  
root@head1:/home/niau# docker service ps nms1
ID                         NAME    IMAGE   NODE   DESIRED STATE  CURRENT STATE           ERROR
dwz3rpi95guimoursmpfnmcz7  nms1.1  tomcat  head1  Running        Running 25 seconds ago  
root@head1:/home/niau# docker service ps nms2
ID                         NAME    IMAGE   NODE   DESIRED STATE  CURRENT STATE           ERROR
8vfp3wdt0v6dkv0wts0nvq5b8  nms2.1  tomcat  node1  Running        Running 26 seconds ago  
root@head1:/home/niau# docker service ps nms3
ID                         NAME    IMAGE   NODE   DESIRED STATE  CURRENT STATE           ERROR
9vg54l68wzmrxuj38bh06dhes  nms3.1  tomcat  node1  Running        Running 28 seconds ago 

Example service inspect

niau@head1:~$ sudo docker service inspect nms1 
[
    {
        "ID": "dqmjcloapuxzly8cubh89i5n3",
        "Version": {
            "Index": 112
        },
        "CreatedAt": "2016-09-13T08:26:14.624441946Z",
        "UpdatedAt": "2016-09-13T08:26:14.62679166Z",
        "Spec": {
            "Name": "nms1",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "tomcat"
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "47uaavbs6hy1xwzu6fald4dvq"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 8081
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8080,
                        "PublishedPort": 8081
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 8080,
                    "PublishedPort": 8081
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "47uaavbs6hy1xwzu6fald4dvq",
                    "Addr": "10.255.0.2/16"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]

Once that is done, try resolving the service names from inside one of the containers. This is what I found:

root@587b81820e4a:/usr/local/tomcat# nslookup nms1
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   nms1
Address: 10.255.0.2

root@587b81820e4a:/usr/local/tomcat# nslookup nms2
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   nms2
Address: 10.255.0.7

root@587b81820e4a:/usr/local/tomcat# nslookup nms3
Server:     127.0.0.11
Address:    127.0.0.11#53

Non-authoritative answer:
Name:   nms3
Address: 10.255.0.10
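The resolved addresses can be cross-checked against the VirtualIPs entry in the service inspect output. A minimal sketch, assuming POSIX sh with awk and the nslookup output layout shown above (the function names answer_addr, vip_addr and dns_matches_vip are made up for illustration):

```shell
#!/bin/sh
# Print the first address from the answer section of nslookup output read
# on stdin. Assumes the layout shown above, where the answer's "Address:"
# line follows a "Name:" line (the resolver's own address comes earlier).
answer_addr() {
    awk '/^Name:/ { seen = 1; next } seen && /^Address:/ { print $2; exit }'
}

# Strip the prefix length from a VIP such as 10.255.0.2/16.
vip_addr() {
    printf '%s\n' "${1%/*}"
}

# Return 0 when the resolved address equals the service VIP.
# $1: address from DNS, $2: VIP in CIDR form from `docker service inspect`.
dns_matches_vip() {
    [ "$1" = "$(vip_addr "$2")" ]
}

# Hypothetical usage (nslookup runs inside a task container, docker on a manager):
#   resolved=$(nslookup nms1 | answer_addr)
#   vip=$(docker service inspect \
#       --format '{{range .Endpoint.VirtualIPs}}{{.Addr}}{{end}}' nms1)
#   dns_matches_vip "$resolved" "$vip" && echo "nms1 resolves to its VIP"
```

For nms1 the two values agree (10.255.0.2 from DNS, 10.255.0.2/16 from inspect), so name resolution itself looks fine and the failure is at the connection stage.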

-> From a container of service nms2 on host node1, trying to reach service nms1, whose container runs on a different node (head1):

root@65142c85f77f:/usr/local/tomcat# telnet nms1 8081
Trying 10.255.0.2...
**telnet: Unable to connect to remote host: No route to host**
root@65142c85f77f:/usr/local/tomcat# telnet nms1 8080
Trying 10.255.0.2...
**telnet: Unable to connect to remote host: No route to host**

When I check the routing table inside the container:

root@65142c85f77f:/usr/local/tomcat# ip route
default via 172.18.0.1 dev eth1 
10.255.0.0/16 dev eth0  proto kernel  scope link  src 10.255.0.12 
172.18.0.0/16 dev eth1  proto kernel  scope link  src 172.18.0.4 

->>>> This looks like a bug to me.
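The routing table above can be checked against the VIP by hand. A minimal sketch, assuming POSIX sh and IPv4 only (the helper names ip_to_int and ip_in_cidr are made up for illustration), that tests whether 10.255.0.2 falls inside each connected route:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 falls inside CIDR block $2 (e.g. 10.255.0.0/16).
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Check the nms1 VIP against the two connected routes from `ip route`.
vip=10.255.0.2
for route in 10.255.0.0/16 172.18.0.0/16; do
    if ip_in_cidr "$vip" "$route"; then
        echo "$vip matches $route"
    else
        echo "$vip does not match $route"
    fi
done
```

The VIP matches 10.255.0.0/16, which is on-link via eth0, so the kernel does have a route for it. In that situation "No route to host" usually points at a failed neighbour (ARP) lookup or an ICMP unreachable reply rather than at the route table itself. That is an assumption to verify, not a diagnosis.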

niau commented 8 years ago

One more comment: I have observed exactly the same behavior in an environment built from the head of the docker repository's master branch.

niau@docker-swarm-head-1:~$ docker info
Containers: 4
 Running: 1
 Paused: 0
 Stopped: 3
Images: 6
Server Version: 1.13.0-dev
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 26
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local azurefile
 Network: bridge host null overlay overlay
Swarm: active
 NodeID: 7p1ab5vhjax085liyxlrrkfl6
 Is Manager: true
 ClusterID: b9sxgfffc4pk274xhyylvyeqg
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.0.57
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 27.48 GiB
Name: docker-swarm-head-1
ID: CZFA:OXKQ:K6OD:4TMK:BRT6:5FXD:6NUE:JG5I:XKNV:J6AJ:5DLA:QK3R
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
niau@docker-swarm-head-1:~$ docker version
Client:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   ebae43e
 Built:        Sat Sep 10 10:38:38 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   ebae43e
 Built:        Sat Sep 10 10:38:38 2016
 OS/Arch:      linux/amd64

mrjana commented 8 years ago

@niau Since this is an issue found by using docker directly, would you mind re-creating this issue in docker/docker?

mrjana commented 8 years ago

I am closing the issue here since it needs to be created in docker/docker

niau commented 8 years ago

@mrjana this issue has been created and found by using swarm.

Docker was used directly only for verification purposes. Please confirm that I still need to open it against docker/docker.

mrjana commented 8 years ago

@niau Yes, all issues found by using docker directly should be filed against docker/docker. This gives us a single funnel point to triage all the docker issues.

mrjana commented 8 years ago

Please file the issue there and ping me there. We can start the conversation there.