moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker 1.12 swarm mode load balancing not consistently working #25325

Closed mschirrmeister closed 8 years ago

mschirrmeister commented 8 years ago

Hi,

I have a problem with Docker 1.12 swarm mode load balancing. The setup has 3 hosts running Docker 1.12 on CentOS 7 in Azure. Nothing really special about the hosts: a plain CentOS 7 setup, Docker 1.12 from the Docker yum repo, and btrfs on a data disk for /var/lib/docker.

If I create 2 services, scale them to 3, and then try to access them from a client, access occasionally does not work. That is, if you access a service via the Docker host IP address(es) and published ports, some containers do not respond.

Output of docker version:

Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.12.0
Storage Driver: btrfs
 Build Version: Btrfs v3.19.1
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge overlay null host
Swarm: active
 NodeID: d7oq3rjt5llc47hr9wt19tood
 Is Manager: true
 ClusterID: 51zzdq5p2xe8otuwmbalyfy2t
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot interval: 10000
  Heartbeat tick: 1
  Election tick: 3
 Dispatcher:
  Heartbeat period: 5 seconds
 CA configuration:
  Expiry duration: 3 months
 Node Address: 10.218.3.5
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.806 GiB
Name: azeausdockerapps301t.azr.omg.wpp
ID: LWMY:RHUH:JJ5O:OP6G:5LV5:7P7B:WI3W:2JMI:B7HY:EP6J:A7SW:DUX2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.): Current test environment is running on Microsoft Azure

Steps to reproduce the issue: Create an overlay network

docker network create --driver overlay whoami-net

docker network ls | grep whoami-net
7bmymhp028ov        whoami-net          overlay             swarm

docker network inspect whoami-net
[
    {
        "Name": "whoami-net",
        "Id": "7bmymhp028ov19ia47xpdao7r",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": []
        },
        "Internal": false,
        "Containers": null,
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "257"
        },
        "Labels": null
    }
]

Create services and scale them

docker service create --name service1 --network whoami-net -p 8000 jwilder/whoami
docker service scale service1=3

docker service create --name service2 --network whoami-net -p 8000 jwilder/whoami
docker service scale service2=3

docker service ls

ID            NAME      REPLICAS  IMAGE           COMMAND
0u2d76899t30  service2  3/3       jwilder/whoami
3ecardus67vd  service1  3/3       jwilder/whoami

docker service ps service1

ID                         NAME        IMAGE           NODE                              DESIRED STATE  CURRENT STATE          ERROR
48kab5vtpwbiimn1ilsbakh0j  service1.1  jwilder/whoami  azeausdockerapps303t.marco.lan  Running        Running 3 minutes ago
800eov5dgg4hf1rgjwn2vb17d  service1.2  jwilder/whoami  azeausdockerapps302t.marco.lan  Running        Running 2 minutes ago
2klc639jzqhgy1ejyvqard46t  service1.3  jwilder/whoami  azeausdockerapps301t.marco.lan  Running        Running 2 minutes ago

docker service ps service2

ID                         NAME        IMAGE           NODE                              DESIRED STATE  CURRENT STATE           ERROR
1iyvqd2eskzdr78k86i4bjxc7  service2.1  jwilder/whoami  azeausdockerapps302t.marco.lan  Running        Running 52 seconds ago
b4ntijm8lc99oqq2af5dyh6u9  service2.2  jwilder/whoami  azeausdockerapps303t.marco.lan  Running        Running 48 seconds ago
e3i956f4fxgq847jwsqsstcbq  service2.3  jwilder/whoami  azeausdockerapps301t.marco.lan  Running        Running 48 seconds ago

docker service inspect service1

[
    {
        "ID": "3ecardus67vdjb552xf01hn3f",
        "Version": {
            "Index": 275
        },
        "CreatedAt": "2016-08-02T09:55:30.35862447Z",
        "UpdatedAt": "2016-08-02T09:56:53.477137303Z",
        "Spec": {
            "Name": "service1",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "jwilder/whoami"
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 3
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "Networks": [
                {
                    "Target": "7bmymhp028ov19ia47xpdao7r"
                }
            ],
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8000
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 8000
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 8000,
                    "PublishedPort": 30000
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "dpac4u1zv98g9eayoql72jvhq",
                    "Addr": "10.255.0.6/16"
                },
                {
                    "NetworkID": "7bmymhp028ov19ia47xpdao7r",
                    "Addr": "10.0.0.2/24"
                }
            ]
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]

Access service1 from a client against docker host 1

➜  ~ time curl http://10.218.3.5:30000
I'm 272dd0310a95
curl http://10.218.3.5:30000  0.01s user 0.01s system 6% cpu 0.217 total
➜  ~ time curl http://10.218.3.5:30000
curl: (7) Failed to connect to 10.218.3.5 port 30000: Operation timed out
curl http://10.218.3.5:30000  0.01s user 0.01s system 0% cpu 1:15.71 total
➜  ~ time curl http://10.218.3.5:30000
curl: (7) Failed to connect to 10.218.3.5 port 30000: Operation timed out
curl http://10.218.3.5:30000  0.01s user 0.01s system 0% cpu 1:16.82 total
➜  ~

Access service2 from a client against docker host 1

➜  ~ time curl http://10.218.3.5:30001
curl: (7) Failed to connect to 10.218.3.5 port 30001: Operation timed out
curl http://10.218.3.5:30001  0.01s user 0.01s system 0% cpu 1:17.69 total
➜  ~ time curl http://10.218.3.5:30001
I'm 8519ed607de5
curl http://10.218.3.5:30001  0.01s user 0.01s system 6% cpu 0.227 total
➜  ~ time curl http://10.218.3.5:30001
curl: (7) Failed to connect to 10.218.3.5 port 30001: Operation timed out
curl http://10.218.3.5:30001  0.01s user 0.01s system 0% cpu 1:15.79 total
➜  ~

Access service1 from a client against docker host 2

➜  ~ time curl http://10.218.3.6:30000
I'm 272dd0310a95
curl http://10.218.3.6:30000  0.01s user 0.01s system 5% cpu 0.232 total
➜  ~ time curl http://10.218.3.6:30000
curl: (7) Failed to connect to 10.218.3.6 port 30000: Operation timed out
curl http://10.218.3.6:30000  0.01s user 0.01s system 0% cpu 1:12.34 total
➜  ~ time curl http://10.218.3.6:30000
I'm 71f6aa01fad4
curl http://10.218.3.6:30000  0.01s user 0.01s system 7% cpu 0.267 total
➜  ~

Access service2 from a client against docker host 2

➜  ~ time curl http://10.218.3.6:30001
I'm 8519ed607de5
curl http://10.218.3.6:30001  0.01s user 0.01s system 6% cpu 0.241 total
➜  ~ time curl http://10.218.3.6:30001
I'm 24dbf906923a
curl http://10.218.3.6:30001  0.01s user 0.01s system 7% cpu 0.246 total
➜  ~ time curl http://10.218.3.6:30001
curl: (7) Failed to connect to 10.218.3.6 port 30001: Operation timed out
curl http://10.218.3.6:30001  0.01s user 0.01s system 0% cpu 1:15.87 total
➜  ~

Access service1 from a client against docker host 3

➜  ~ time curl http://10.218.3.7:30000
I'm 272dd0310a95
curl http://10.218.3.7:30000  0.01s user 0.01s system 4% cpu 0.353 total
➜  ~ time curl http://10.218.3.7:30000
I'm e6289ebe82da
curl http://10.218.3.7:30000  0.01s user 0.01s system 2% cpu 0.513 total
➜  ~ time curl http://10.218.3.7:30000
curl: (7) Failed to connect to 10.218.3.7 port 30000: Operation timed out
curl http://10.218.3.7:30000  0.01s user 0.01s system 0% cpu 1:16.79 total
➜  ~

Access service2 from a client against docker host 3

➜  ~ time curl http://10.218.3.7:30001
I'm 24dbf906923a
curl http://10.218.3.7:30001  0.01s user 0.01s system 7% cpu 0.234 total
➜  ~ time curl http://10.218.3.7:30001
I'm 8519ed607de5
curl http://10.218.3.7:30001  0.01s user 0.01s system 6% cpu 0.216 total
➜  ~ time curl http://10.218.3.7:30001
I'm da18d8e4b307
curl http://10.218.3.7:30001  0.01s user 0.01s system 6% cpu 0.214 total
➜  ~

Describe the results you received: Not all containers respond when accessing the service via the Docker host IP addresses and published ports.

Describe the results you expected: All containers of a service should respond, no matter through which Docker host the service is accessed.

Additional information you deem important (e.g. issue happens only occasionally): The issue is intermittent. If you delete and re-create the service, sometimes all containers respond, or containers on a different host stop responding.

It is at least consistent once a service is created. Let's say containers on host 2 and host 3 do not respond when accessed via Docker host 1; then it stays like this for the lifetime of that service.

mschirrmeister commented 8 years ago

I am a little bit baffled at the moment. If you have any tips on where I can look to provide more internal information, please let me know and I will provide it. Right now it looks to me like some internal load-balancing/IPVS machinery is choking.

ushuz commented 8 years ago

I ran into the same problem with a 3-node setup. I brought up a service with 5 replicas using the following command:

docker service create --name helloworld --replicas 5 --publish 8888:80 dockercloud/hello-world

When I curl one-node:8888, it sometimes hangs. Since the dockercloud/hello-world image returns the container ID, I compared all container IDs and found that one container was never reached. Then I killed that container, swarm brought up a new one, and curl no longer got stuck.
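
For what it's worth, a loop like the following makes a never-reached backend easy to spot (a sketch; host, port, and iteration count are illustrative, and it assumes the image answers with a short body containing the container ID):

for i in $(seq 1 50); do curl -s --max-time 2 http://one-node:8888/; done | sort | uniq -c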

My three nodes are located at AWS Tokyo, Vultr Tokyo and DigitalOcean SGP1.

thaJeztah commented 8 years ago

I think this may be a duplicate of https://github.com/docker/docker/issues/25219 or https://github.com/docker/docker/issues/25130 could you have a look at those?

mrjana commented 8 years ago

@mschirrmeister Is the problem still there if you start a service with 3 replicas directly, instead of starting the service with one replica and then scaling up?

mschirrmeister commented 8 years ago

@mrjana Yes, the problem is still there, even if I start the service with --replicas 3.

mschirrmeister commented 8 years ago

While creating/deleting and querying services today, I watched syslog on the hosts for errors and saw the following. Not sure how bad that is, or whether it is helpful.
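
For reference, the daemon logs can be followed while reproducing with something like this on CentOS 7 (the unit name and log path may vary by distro):

journalctl -u docker.service -f
tail -f /var/log/messages | grep dockerd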

querying a service with curl

Aug  3 08:33:52 azeausdockerapps301t dockerd: time="2016-08-03T08:33:52.810065000Z" level=error msg="could not resolve peer \"10.255.0.3\": could not resolve peer: serf instance not initialized"

adding a service

Aug  3 08:25:57 azeausdockerapps302t dockerd: time="2016-08-03T08:25:57.731040585Z" level=error msg="Failed to create real server 10.255.0.11 for vip 10.255.0.10 fwmark 289 in sb d554fb6136ecda3acba06c2b936d235e17cc1273c21d655e1d2d13448fec2825: no such process"

deleting a service

Aug  3 08:36:35 azeausdockerapps303t dockerd: time="2016-08-03T08:36:35.122386868Z" level=info msg="Failed to delete real server 10.255.0.12 for vip 10.255.0.10 fwmark 294: no such file or directory"
rogaha commented 8 years ago

I wasn't able to reproduce it using Boot2Docker 1.12.0 VMs. So it does seem that the issue happens only occasionally.

rogaha@Robertos-MacBook-Pro:~$ docker network create --driver overlay whoami-net
cke02aohtbpspx5so5gc6a76x
rogaha@Robertos-MacBook-Pro:~$ docker network ls | grep whoami-net
cke02aohtbps        whoami-net          overlay             swarm
rogaha@Robertos-MacBook-Pro:~$ docker service create --name service1 --network whoami-net -p 8000 jwilder/whoami
cojb16cncgj76z0sslxt015bc
rogaha@Robertos-MacBook-Pro:~$ docker service scale service1=3
service1 scaled to 3
rogaha@Robertos-MacBook-Pro:~$ docker-machine ssh node3
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.12.0, build HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016
Docker version 1.12.0, build 8eab29e
docker@node3:~$ time curl http://192.168.99.100:30000;time curl http://192.168.99.102:30000;time curl http://192.168.99.104:30000
I'm a2531712ae05
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm b0d6d239f694
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm af339bc5e1bf
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@node3:~$ exit
rogaha@Robertos-MacBook-Pro:~$ docker-machine ssh master2
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.12.0, build HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016
Docker version 1.12.0, build 8eab29e
docker@master2:~$ time curl http://192.168.99.100:30000;time curl http://192.168.99.102:30000;time curl http://192.168.99.104:30000
I'm a2531712ae05
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm b0d6d239f694
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm af339bc5e1bf
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@master2:~$ exit
rogaha@Robertos-MacBook-Pro:~$ docker-machine ssh master1
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.12.0, build HEAD : e030bab - Fri Jul 29 00:29:14 UTC 2016
Docker version 1.12.0, build 8eab29e
docker@master1:~$ time curl http://192.168.99.100:30000;time curl http://192.168.99.102:30000;time curl http://192.168.99.104:30000
I'm a2531712ae05
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm b0d6d239f694
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
I'm af339bc5e1bf
real    0m 0.00s
user    0m 0.00s
sys 0m 0.00s
docker@master1:~$ docker service ps service1
ID                         NAME        IMAGE           NODE     DESIRED STATE  CURRENT STATE           ERROR
c83vrhg87a0hxpfo0te4weat6  service1.1  jwilder/whoami  master1  Running        Running 17 minutes ago
1vmmzhup16d5t5ounw0z06qqv  service1.2  jwilder/whoami  master2  Running        Running 16 minutes ago
5uhwu39j5wjslsm9gz5b4242n  service1.3  jwilder/whoami  node3    Running        Running 16 minutes ago
docker@master1:~$
erikrs commented 8 years ago

Hi all,

I too have issues with the load balancer suddenly stopping work. My setup is a single CentOS server with Docker 1.12 installed. After a while, the following simple play-around actions caused it to stop working:

docker swarm init
docker service create --name web --publish 80:80 --replicas 2 nginxdemos/hello
...
docker service scale web=0
docker service scale web=20
docker service scale web=0
docker service scale web=5
docker service scale web=10
docker service scale web=15
docker stop 2a66b345100c
docker rm 2a66b345100c
docker stop 7c79846edf41 085ee0c5596f 165d88d0029b c68f1202d8bc ab78b5649915 debb3f7f5673 76347454844e

I tested it from 2 external servers with curl (watch -n1 "curl -s 10.3.x.x | grep -e 'My hostname|My address'"). The first symptom was that the load balancer stopped round-robining the containers; each curl stayed on the same container, and this happened in all curl tests on all servers, including curl on the server itself. Then the load balancer stopped altogether, with all curl tests resulting in timeouts.

syslog at the time it stopped: https://gist.github.com/erikrs/7c05940f9c1e98c15a41f367686aa517 (docker info output included as well)

Some time after this, I scaled the service to 0 again, and then back to 2. It then worked again.

fitz123 commented 8 years ago
  1. Create a 3-node cluster with 2 workers and 1 manager
manager0:~$ docker swarm init
Swarm initialized: current node (8bigilkl82ilyhxroro2rigm5) is now a manager.

To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-1896eccg5umy8f8uyq60nb4qanp93u56j99uyejpfhqybr1lya-bhl6g7n5ket0zpo4ml6apax5q \
    10.99.10.176:2377

To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-1896eccg5umy8f8uyq60nb4qanp93u56j99uyejpfhqybr1lya-2cxbei2xxdrzb87i4lxxi3rhd \
    10.99.10.176:2377
ninja@manager0:~$ docker node update --availability drain `hostname`
manager0
ninja@node0:~$     docker swarm join \
>     --token SWMTKN-1-1896eccg5umy8f8uyq60nb4qanp93u56j99uyejpfhqybr1lya-bhl6g7n5ket0zpo4ml6apax5q \
>     10.99.10.176:2377
This node joined a swarm as a worker.
ninja@node1:~$     docker swarm join \
>     --token SWMTKN-1-1896eccg5umy8f8uyq60nb4qanp93u56j99uyejpfhqybr1lya-bhl6g7n5ket0zpo4ml6apax5q \
>     10.99.10.176:2377
This node joined a swarm as a worker.
  2. Create the service
ninja@manager0:~$ docker service create --name frontend --replicas 2 -p 80:8000/tcp jwilder/whoami
e1oezurvdk9fcvw594xnwn25b
ninja@manager0:~$ docker service ps frontend 
ID                         NAME        IMAGE           NODE   DESIRED STATE  CURRENT STATE                    ERROR
1vdzpmsf5dvz74675cyz1mac5  frontend.1  jwilder/whoami  node0  Running        Starting less than a second ago  
2lan6ycd2im5p5pl0oqq4vfoi  frontend.2  jwilder/whoami  node1  Running        Starting less than a second ago  
  3. Check it works as expected
fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl node0; done
I'm 64fd8496e5f7
I'm ecb291129b53
I'm 64fd8496e5f7
I'm ecb291129b53
fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl node1; done
I'm ecb291129b53
I'm 64fd8496e5f7
I'm ecb291129b53
I'm 64fd8496e5f7
  4. Reboot one of the nodes
ninja@node0:~$ sudo reboot
  5. Check the cluster after the reboot
ninja@manager0:~$ docker service ps frontend 
ID                         NAME            IMAGE           NODE   DESIRED STATE  CURRENT STATE               ERROR
bvsmq3pa5l1ifhctuset8dh5s  frontend.1      jwilder/whoami  node1  Running        Running 16 seconds ago      
1vdzpmsf5dvz74675cyz1mac5   \_ frontend.1  jwilder/whoami  node0  Shutdown       Complete 18 seconds ago     
2lan6ycd2im5p5pl0oqq4vfoi  frontend.2      jwilder/whoami  node1  Running        Running about a minute ago  

ninja@manager0:~$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
6ytfk6umjiwxzk4su8qkyfcbn    node0     Ready   Active        
8bigilkl82ilyhxroro2rigm5 *  manager0  Ready   Drain         Leader
bl2a1ur0i9fhi3l7giia2fl2j    node1     Ready   Active   

Result after node restart:

fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl node1; done
I'm 0425a0bf58f3
I'm 64fd8496e5f7
I'm 0425a0bf58f3
I'm 64fd8496e5f7
fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl --connect-timeout 2 node0; done
curl: (28) Connection timed out after 2000 milliseconds
curl: (28) Connection timed out after 2000 milliseconds
curl: (28) Connection timed out after 2001 milliseconds
curl: (28) Connection timed out after 2001 milliseconds

If I add a 3rd container:

ninja@manager0:~$ docker service scale frontend=3
frontend scaled to 3
ninja@manager0:~$ docker service ps frontend 
ID                         NAME            IMAGE           NODE   DESIRED STATE  CURRENT STATE               ERROR
bvsmq3pa5l1ifhctuset8dh5s  frontend.1      jwilder/whoami  node1  Running        Running about an hour ago   
1vdzpmsf5dvz74675cyz1mac5   \_ frontend.1  jwilder/whoami  node0  Shutdown       Complete about an hour ago  
2lan6ycd2im5p5pl0oqq4vfoi  frontend.2      jwilder/whoami  node1  Running        Running about an hour ago   
9k12yf0gcug7amd7564ookkfj  frontend.3      jwilder/whoami  node0  Running        Preparing 2 seconds ago    

The result after adding the 3rd container looks like this:

fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl node1; done
I'm 3fcc3b2b0e41
I'm 0425a0bf58f3
I'm 64fd8496e5f7
I'm 3fcc3b2b0e41
fitz123@fitz123-laptop:~$ for i in `seq 4`; do curl --connect-timeout 2 node0; done
curl: (28) Connection timed out after 2000 milliseconds
I'm 3fcc3b2b0e41
curl: (28) Connection timed out after 2001 milliseconds
curl: (28) Connection timed out after 2001 milliseconds
thaJeztah commented 8 years ago

@mrjana @mavenugo is this issue resolved by https://github.com/docker/docker/pull/25603 ?

mavenugo commented 8 years ago

@thaJeztah this is one of the issues that is potentially solved via #25603. @mschirrmeister can you please confirm?

asmialoski commented 8 years ago

I have the same issue! I tested with CentOS and Ubuntu nodes; same issue. Usually the issue occurs only on the node that was restarted. Running "systemctl restart docker" after the reboot apparently resolves the issue for a moment, but it returns after some minutes.

mschirrmeister commented 8 years ago

I updated to 1.12.1-rc1. The package upgrade also restarted the service. The timeout when accessing the service via HTTP was gone, but not all 3 backends answered when accessing via the Docker host IP address. One host always went to the same backend; another host was load balancing between 2 backends.

I then did a full restart of the hosts and re-created the services. Access via curl works at the moment, and all 3 backends on all 3 hosts respond. I will monitor the situation a little longer to see if it breaks again.

When it was not working after the upgrade, I connected to a container on the Docker host that was always load balancing to the same backend and did a DNS lookup for tasks.<service>; it showed only 1 IP address.
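
A way to double-check this (a sketch; the container ID is a placeholder and it assumes the image ships nslookup): tasks.<service-name> should resolve to one IP per running task, while the bare service name resolves to the VIP.

docker exec -it <container-id> nslookup tasks.service1
docker exec -it <container-id> nslookup service1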

somejfn commented 8 years ago

@mschirrmeister I have seen that same symptom on several occasions, and I traced it to the IPVS table not being populated correctly (why, I don't know). To confirm the issue, cat /proc/net/ip_vs in the ingress-sbox namespace (the network namespace doing load balancing for requests coming from the outside). I.e.:

cd /var/run/docker/netns/ ; nsenter --net=5683f2b6e546 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM  000001C2 rr
  -> 0AFF0009:0000      Masq    1      0          0
  -> 0AFF0008:0000      Masq    1      0          0
  -> 0AFF0007:0000      Masq    1      0          0

Those last 3 lines are the hex-encoded IPs of the containers being load balanced for the service. On several occasions that list of target containers was either incomplete or outdated.
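
For reference, the hex entries decode by hand like this (values taken from the output above):

printf '%d.%d.%d.%d\n' 0x0A 0xFF 0x00 0x09   # 0AFF0009 -> 10.255.0.9
printf '%d\n' 0x01C2                         # fwmark 0x01C2 -> 450 decimal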

Also, if you have multiple services defined, you'll have several of these entries. This one was for the service marked with FWM 0x01C2. You can find what traffic it was originally for with:

cd /var/run/docker/netns/ ; nsenter --net=5683f2b6e546 iptables -t mangle -L -n -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source     destination
    0     0 MARK   tcp  --  *  *  0.0.0.0/0   0.0.0.0/0    tcp dpt:8080 MARK set 0x1c2

So this service is the one with port 8080 published to the outside world.
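
For anyone unsure which file under /var/run/docker/netns is the right one: on 1.12 the per-node ingress load balancer appears to live in a namespace file named ingress_sbox (observed behavior, not documented), so the same check can be run without guessing the hash-named files:

ls /var/run/docker/netns/
nsenter --net=/var/run/docker/netns/ingress_sbox cat /proc/net/ip_vs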

mrjana commented 8 years ago

@somejfn This is exactly the problem that was fixed in 1.12.1-rc1, i.e. incorrect backend information in IPVS. Are you using 1.12.1-rc1?

somejfn commented 8 years ago

@mrjana I was on 1.12.0 with the pre-built binaries at https://get.docker.com/builds/Linux/x86_64/docker-latest.tgz. I'm on CoreOS (hence no package manager), so I guess I'd need to build from source to get 1.12.1-rc1 until 1.12.1 is GA?

mrjana commented 8 years ago

@somejfn Yes, that's right.

mschirrmeister commented 8 years ago

I am definitely running 1.12.1-rc1.

# docker info | grep "Server Version"
Server Version: 1.12.1-rc1

I can confirm I still see the issue. Today I did another reboot of all 3 Docker hosts, then started the Docker daemon on all 3 hosts, and the swarm cluster was back up and running.

# docker node ls
ID                           HOSTNAME                          STATUS  AVAILABILITY  MANAGER STATUS
4oprp4u607bo98mxqlwvus881    azeausdockerapps303t.abc.foo.int  Ready   Active        Reachable
5c8mpecymotn9rdvxb8hkh0tf    azeausdockerapps302t.abc.foo.int  Ready   Active        Leader
d7oq3rjt5llc47hr9wt19tood *  azeausdockerapps301t.abc.foo.int  Ready   Active        Reachable

I then created my service again with 1 replica, scaled it to 3, and accessed it from my client. One host always goes to a single backend; the other 2 hosts each go to 2 backends.

Host1

# nsenter --net=06bd65103713 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM  00000104 rr
  -> 0AFF0007:0000      Masq    1      0          0

Host 2 and host 3 look like this:

# nsenter --net=9de86f1b7f9a cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM  00000104 rr
  -> 0AFF0008:0000      Masq    1      0          0
  -> 0AFF0006:0000      Masq    1      0          0

When I do a service remove, the namespace entry in /var/run/docker/netns stays there, but of course it has no FWM entries. If I re-create the service, it gets filled with backends, but again with the wrong (too few) backends, as above.

asmialoski commented 8 years ago

I have the same issue with 1.12.1-rc1.

Environment:

# docker version
Client:
 Version:      1.12.1-rc1
# cat /etc/issue
Ubuntu 16.04.1 LTS \n \l
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
2mzcl28jislbiafxll8lcwpnb    cn07      Ready   Active
7mhcf1199f48vm5vxfah7t7w2 *  cn06      Ready   Active        Leader

Steps to reproduce:

  1. Create a service: docker service create --replicas 1 --publish 8080:80 --name vote instavote/vote
  2. Scale the service up: docker service scale vote=10
  3. Scale the service down: docker service scale vote=1

At this point, I can access the service through just one host. If I check ipvsadm, I can see the problem. MASTER node (working):

# nsenter --net=241278c9f76b sh
# iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 767 packets, 265K bytes)
pkts bytes target     prot opt in     out     source               destination
 685 84104 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9090 MARK set 0x100
 242 29765 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 MARK set 0x101

Chain INPUT (policy ACCEPT 61 packets, 3820 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 706 packets, 262K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 61 packets, 3680 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.255.0.2           MARK set 0x100
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.255.0.25          MARK set 0x101

Chain POSTROUTING (policy ACCEPT 767 packets, 265K bytes)
 pkts bytes target     prot opt in     out     source               destination
# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  256 rr
  -> 10.255.0.12:0                Masq    1      0          0
FWM  257 rr
  -> 10.255.0.32:0                Masq    1      0          0

ipvsadm is forwarding correctly to the IPs 10.255.0.12 and 10.255.0.32.

NODE 2 (not working):

# nsenter --net=3feca1a6e851 sh
# iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 1426 packets, 359K bytes)
 pkts bytes target     prot opt in     out     source               destination
  690 70326 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:9090 MARK set 0x100
  505 49345 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8080 MARK set 0x101

Chain INPUT (policy ACCEPT 152 packets, 10397 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 1274 packets, 349K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 152 packets, 9277 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.255.0.2           MARK set 0x100
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.255.0.25          MARK set 0x101

Chain POSTROUTING (policy ACCEPT 1426 packets, 358K bytes)
 pkts bytes target     prot opt in     out     source               destination
# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  256 rr
  -> 10.255.0.5:0                 Masq    1      0          0
  -> 10.255.0.8:0                 Masq    1      0          0
  -> 10.255.0.9:0                 Masq    1      0          0
  -> 10.255.0.12:0                Masq    1      0          1
  -> 10.255.0.13:0                Masq    1      0          0
  -> 10.255.0.16:0                Masq    1      0          0
  -> 10.255.0.17:0                Masq    1      0          0
  -> 10.255.0.20:0                Masq    1      0          0
  -> 10.255.0.21:0                Masq    1      0          0
  -> 10.255.0.24:0                Masq    1      0          0
FWM  257 rr
  -> 10.255.0.26:0                Masq    1      0          0
  -> 10.255.0.29:0                Masq    1      0          0
  -> 10.255.0.30:0                Masq    1      0          1
  -> 10.255.0.32:0                Masq    1      0          0
  -> 10.255.0.33:0                Masq    1      0          0
  -> 10.255.0.34:0                Masq    1      0          0

ipvsadm on node 2 continues forwarding connections to old IPs (10.255.0.26, 10.255.0.29, 10.255.0.30, etc.). In other words, the scale-down did not update IPVS.

mrjana commented 8 years ago

@mschirrmeister @asmialoski I've run these scale up/down tests many times and I haven't seen any issues. Can you please post the daemon logs from the nodes where you are having issues?

asmialoski commented 8 years ago

@mrjana Please, see logs in attachments.

Let me explain steps that I performed:

I have two nodes: 1 MASTER and 1 WORKER.

  1. Create 2 services: docker service create --replicas 1 --publish 9090:80 --name vote2 instavote/vote and docker service create --replicas 1 --publish 8080:80 --name vote instavote/vote
  2. Scale both services up to 3 replicas: docker service scale vote=3 and docker service scale vote2=3
  3. Reboot the WORKER node
  4. All containers are migrated to MASTER and, after the reboot, external access through both nodes continues to work
  5. Scale the services down to 1 replica: docker service scale vote=1 and docker service scale vote2=1
  6. At this point, ipvsadm on the WORKER node is not updated and external access through the WORKER node stops working

Thanks. logs.zip

shenshouer commented 8 years ago
worker-10-50-1-106 net # journalctl -fu docker
-- Logs begin at Tue 2016-06-28 20:43:18 CST. --
Aug 17 20:02:45 worker-10-50-1-106 bash[23256]: time="2016-08-17T20:02:45+08:00" level=info msg="Firewalld running: false"
Aug 17 20:03:44 worker-10-50-1-106 bash[23256]: time="2016-08-17T20:03:44.904706893+08:00" level=error msg="could not resolve peer \"10.255.0.9\": could not resolve peer: serf instance not initialized"
Aug 18 10:27:38 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:27:38.049230783+08:00" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=4p13wqeythnxsyedi9l9xcdpc
Aug 18 10:27:39 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:27:39.162733023+08:00" level=info msg="Failed to delete real server 10.255.0.15 for vip 10.255.0.12 fwmark 276: no such file or directory"
Aug 18 10:27:39 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:27:39+08:00" level=info msg="Firewalld running: false"
Aug 18 10:30:38 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:30:38+08:00" level=info msg="Firewalld running: false"
Aug 18 10:30:39 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:30:39+08:00" level=info msg="Firewalld running: false"
Aug 18 10:30:39 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:30:39+08:00" level=info msg="Firewalld running: false"
Aug 18 10:30:39 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:30:39+08:00" level=info msg="Firewalld running: false"
Aug 18 10:40:33 worker-10-50-1-106 bash[23256]: time="2016-08-18T10:40:33.031375514+08:00" level=error msg="could not resolve peer \"10.255.0.9\": could not resolve peer: serf instance not initialized"
^C
worker-10-50-1-106 net # cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1068.6.0
VERSION_ID=1068.6.0
BUILD_ID=2016-07-12-1826
PRETTY_NAME="CoreOS 1068.6.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
worker-10-50-1-106 net # exit
exit
core@worker-10-50-1-106 ~ $ docker version
Client:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 23:54:00 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.0
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   8eab29e
 Built:        Thu Jul 28 23:54:00 2016
 OS/Arch:      linux/amd64
core@worker-10-50-1-106 ~ $ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
core@worker-10-50-1-106 ~ $ docker service ls
ID            NAME              REPLICAS  IMAGE                           COMMAND
5qld98b1o02z  echo              1/1       dhub.yunpro.cn/shenshouer/echo
btgpv1p1tu5k  z7a7e4ec7386b738  5/5       dhub.yunpro.cn/shenshouer/echo
core@worker-10-50-1-106 ~ $ docker service ps z7a7e4ec7386b738
ID                         NAME                IMAGE                           NODE                DESIRED STATE  CURRENT STATE           ERROR
e6c2bq2zbt7u7mz5po06vw8jg  z7a7e4ec7386b738.1  dhub.yunpro.cn/shenshouer/echo  worker-10-50-1-107  Running        Running 25 minutes ago
0o9ecgb61ej6kw7cx4lrxm0ad  z7a7e4ec7386b738.2  dhub.yunpro.cn/shenshouer/echo  worker-10-50-1-104  Running        Running 25 minutes ago
0tdh8ltcslyhq13yaijvc0y7e  z7a7e4ec7386b738.3  dhub.yunpro.cn/shenshouer/echo  worker-10-50-1-106  Running        Running 25 minutes ago
e0pqh01d6shkgwina0xf39ego  z7a7e4ec7386b738.4  dhub.yunpro.cn/shenshouer/echo  worker-10-50-1-105  Running        Running 25 minutes ago
bcfmpm5xxrn6dw5mh43irf928  z7a7e4ec7386b738.5  dhub.yunpro.cn/shenshouer/echo  worker-10-50-1-103  Running        Running 23 minutes ago
core@worker-10-50-1-106 ~ $ curl 10.50.1.106:30001
curl: (7) Failed to connect to 10.50.1.106 port 30001: Connection timed out
core@worker-10-50-1-106 ~ $ curl 10.50.1.105:30001
{"clientAddr":"10.255.0.8:35938","url":{"Scheme":"","Opaque":"","User":null,"Host":"","Path":"/","RawPath":"","RawQuery":"","Fragment":""}}core@worker-10-50-1-106 ~ $ curl 10.50.1.108:30001
curl: (7) Failed to connect to 10.50.1.108 port 30001: Connection refused
core@worker-10-50-1-106 ~ $ docker node ls
ID                           HOSTNAME            STATUS  AVAILABILITY  MANAGER STATUS
46w7emjaro9n5wqsdvlb2axeo    worker-10-50-1-102  Ready   Active        Reachable
5bvwh7kzlxxe2srss0samjsyt *  worker-10-50-1-106  Ready   Active        Reachable
5xig0dyp26ebrxp1g4043fk9g    worker-10-50-1-101  Ready   Active        Leader
6ftk92pv02y85vpfg70yeqan6    worker-10-50-1-107  Ready   Active        Reachable
7l65yv0zi3qbtojm01jqad6b9    worker-10-50-1-103  Ready   Active        Reachable
b2v8dld2zuqjgfqx3p1p6oxy1    worker-10-50-1-104  Ready   Active        Reachable
ezqa3ggqrg5e2w5d3mogmelxo    worker-10-50-1-105  Ready   Active        Reachable
mschirrmeister commented 8 years ago

My logs are available here. https://gist.github.com/mschirrmeister/e1b86b93b4524066de7a06aee5bb80ef

What I did was again:

msvticket commented 8 years ago

I have similar experiences. When starting with a fresh swarm and a freshly deployed stack (using "docker stack deploy") it works.

I don't do scaling but regularly redeploy services (using "docker stack deploy").

After each deploy of the same stack (with updated images) I get more problems accessing the containers. Sometimes I get connection refused but mostly I get "Connection timed out".

It might be of interest that I regularly restart docker on the node where the deploy command is issued. (Swarm-related commands start to give "Error response from daemon: rpc error: code = 4 desc = context deadline exceeded"; restarting docker is the only way I've found to recover from this.)

matti commented 8 years ago

Same here with 1.12.1. I had a cluster of 1 manager + 2 workers.

I managed to get an nginx service into a state (by doing nothing special, just scaling a bit and testing how the VIP works) where it would only respond consistently from the node where it was actually running. The rest of the nodes usually did not respond (the connection did not establish), but if I hit Ctrl+C on curl and ran the command again, it usually responded right away.

I couldn't see anything interesting in the logs. I first restarted docker on all machines, and that did not help. Then I rebooted all machines at the same time, and that did solve the issue.

The service was created (and recreated a couple of times) with docker service create. To me it looks like the iptables/VIP layer was somehow out of sync.

After the reboot I haven't been able to recreate the problem.

RRAlex commented 8 years ago

I'm not exactly sure where to get the VIP's network namespace, but interestingly, no matter which namespace under /var/run/docker/netns/* I enter with nsenter and run ipvsadm in, I never get FWM 257 rr and FWM 256 rr listed together; I only ever get the last one (256), once (just running a single service with 6 replicas on 2 nodes with 1.12.1).

Would that indicate it's not even trying to forward traffic to the second node, then?

Bosee commented 8 years ago

I got the same issue with 1.12.1. The setup is 3 hosts with 1 manager, exposing port 8090: docker service create -p 8090:8090 --name monitor 10.21.49.64:5000/monitor

When the service starts, port 8090 is accessible on one node only. If I remove the service and create it again, sometimes it is accessible on two nodes. docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 20
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 22
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay bridge null host
Swarm: active
 NodeID: 9s4xcf48hykhly55y1g0py930
 Is Manager: true
 ClusterID: 3khrw4i372jm2fcsylpnl3wve
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.21.49.64
Runtimes: runc
Default Runtime: runc
Security Options: apparmor
Kernel Version: 3.19.0-66-generic
Operating System: Ubuntu 14.04 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.774 GiB
Name: SZX1000108768
ID: U63Q:22L2:Q2WJ:V7CN:ANGP:QT7I:SQOH:LZEW:J35R:DNTQ:U26H:R3SN
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 10.21.49.64:5000
 127.0.0.0/8

I guessed it might be caused by the OS kernel version being too old. After upgrading to 4.2.0-42-generic, the problem disappeared!

asmialoski commented 8 years ago

I am using the 4.4.0-34-generic kernel and I still have this issue.

jmzwcn commented 8 years ago

I think this issue is critical, and I already see the fix in the reference above. Release 1.12.2? laugh...

mschirrmeister commented 8 years ago

I updated my test nodes to 1.12.1 final and can confirm the issue still exists. I can reproduce it.

bitsofinfo commented 8 years ago

Experiencing the same +1

mavenugo commented 8 years ago

@mschirrmeister @bitsofinfo @jmzwcn could you give https://github.com/docker/docker/pull/25962 a try? As per one of the commits (the libnetwork vendoring), it indicates this issue is resolved.

stylixboom commented 8 years ago

I have almost the same problem, which can be solved by restarting the node. The difference is that I cannot actually access the service through the manager IP address (https://github.com/docker/swarmkit/issues/1439); I mean, none of the running containers.

However, when I go to that node directly and try to access it by its docker0 IP address, all the containers respond quite well.

dongluochen commented 8 years ago

In my test the problem is not fully resolved. I tried creating/removing services with the same published port. The published port doesn't get removed cleanly on docker service rm. I think there is a race condition between removing the iptables entries and the next service creation.

Running the following script in a 3-node cluster may result in such an error.

for i in `seq 1 4` 
do 
    docker service create --name ftest -p 8021:80 dongluochen/nctest 
    sleep 20 
    curl 127.0.0.1:8021
    docker service rm ftest
    docker service create --name gtest -p 8021:80 dongluochen/nctest 
    sleep 20 
    curl 127.0.0.1:8021
    docker service rm gtest
done

In a node's ingress sbox, I can find multiple entries for dpt:8021 when it fails.

root@ip-172-19-241-144:/#  iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 22 packets, 1374 bytes)
 pkts bytes target     prot opt in     out     source               destination
   23  1426 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8021 MARK set 0x16e
   22  1366 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8021 MARK set 0x18c
   16   960 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8021 MARK set 0x1a2
   10   692 MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8022 MARK set 0x1a3

root@ip-172-19-241-144:/# ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  418 rr
  -> 10.255.0.5:0                 Masq    1      0          0
FWM  419 rr
  -> 10.255.0.7:0                 Masq    1      0          0

ubuntu@ip-172-19-241-144:~$ docker version
Client:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   bf0df06
 Built:        Fri Aug 26 00:14:08 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   bf0df06
 Built:        Fri Aug 26 00:14:08 2016
 OS/Arch:      linux/amd64

If I add a sleep between the service remove and the next creation, I do not see the failure.
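
A sketch of that workaround, with the same image and an arbitrary pause after each removal:

for i in `seq 1 4`
do
    docker service create --name ftest -p 8021:80 dongluochen/nctest
    sleep 20
    curl 127.0.0.1:8021
    docker service rm ftest
    sleep 10   # give the iptables/IPVS cleanup time to finish before the next create
done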

mschirrmeister commented 8 years ago

@mavenugo Is there already a prebuilt binary for #25962? I have no time this week to build something from source.

kouhin commented 8 years ago

I ran into the same problem. I recreated the service, but the dead entries still remain in the IPVS table, even after I forced the node to leave the swarm or restarted the Docker daemon. It may be possible to remove them one by one using ipvsadm, but that's not feasible in production.

I'm not familiar with IPVS; I wonder whether expire_nodest_conn should be set to 1 to drop dead entries automatically?
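
A sketch of that manual cleanup (untested; the fwmark and backend address are illustrative values from the outputs above, and whether the IPVS sysctls apply per network namespace depends on the kernel):

nsenter --net=/var/run/docker/netns/ingress_sbox sh -c '
  sysctl -w net.ipv4.vs.expire_nodest_conn=1   # drop connections to removed backends
  ipvsadm -d -f 257 -r 10.255.0.26:0           # delete one stale real server by hand
'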

mrjana commented 8 years ago

I am trying to do a long-running test (in a 3-node cluster) with the fixes in #25962, which scales a service up and down, to see if there are any issues. After that I can post an experimental binary to whoever wants it.

dongluochen commented 8 years ago

@mrjana I can still reproduce the problem with service create/rm. If I run the following script several times, I can get a failure.

PORT=8088
SERVICE=test
for i in `seq 1 4` 
do 
    docker service create --name $SERVICE -p $PORT:80 dongluochen/nctest 
    sleep 15
    echo "deleting service"
    docker service rm $SERVICE
done
docker service create --name $SERVICE -p $PORT:80 dongluochen/nctest 

Service availability is validated with the following script on any node in the cluster.

while true; do curl -s --show-error -I http://127.0.0.1:8088 | head -n 1; sleep 0.1; done

I'm running mrjana/docker@3ff7123.

ubuntu@ip-172-19-241-144:~$ docker version
Client:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   3ff7123
 Built:        Wed Aug 31 00:33:18 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   3ff7123
 Built:        Wed Aug 31 00:33:18 2016
 OS/Arch:      linux/amd64
mrjana commented 8 years ago

@dongluochen I don't think this is the right validation. If you create and remove a service while also running curl against said service periodically in parallel, the curl is bound to fail sometimes, because the service is not available during the fraction of time when it is removed and being re-added. What you did previously is the right way to validate service create/rm.

dongluochen commented 8 years ago

@mrjana I'm validating after the last service create, not in between. I reran my test from a fresh cluster (starting docker after removing /var/run/docker and /var/lib/docker on all nodes) with the following script. After the test run the service is up, but the load balancer is not passing traffic properly.

PORT=8089
SERVICE=test
for i in `seq 1 40` 
do 
    docker service create --name $SERVICE -p $PORT:80 dongluochen/nctest 
    docker service rm $SERVICE
done
docker service create --name $SERVICE -p $PORT:80 dongluochen/nctest 

Here is the mangle table from the ingress sandbox on the node with the container running. Traffic to port 8089 is marked 261 (0x105), but there is no such mark in ipvsadm.

root@ip-172-19-241-144:/# iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 1837 packets, 110K bytes)
 pkts bytes target     prot opt in     out     source               destination
 4168  272K MARK       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8089 MARK set 0x105

Chain INPUT (policy ACCEPT 1837 packets, 110K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 1837 packets, 73480 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MARK       all  --  *      *       0.0.0.0/0            10.255.0.7           MARK set 0x106

Chain POSTROUTING (policy ACCEPT 1837 packets, 73480 bytes)
 pkts bytes target     prot opt in     out     source               destination
root@ip-172-19-241-144:/# ipvsadm -l -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  262 rr
  -> 10.255.0.8:0                 Masq    1      0          0
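
For cross-checking the two tables: iptables prints the mark in hex while ipvsadm lists fwmark services in decimal, so the missing 0x105 entry would have shown up as FWM 261. A quick conversion:

printf '%d\n' 0x105   # -> 261
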
mrjana commented 8 years ago

@dongluochen Thanks for the clarification. In your previous script you had a 15-second sleep between the create and remove; in the new one you don't. Does that matter for reproducibility? I will try your latest script to see if I am able to repro.

dongluochen commented 8 years ago

I got a similar problem with the previous script. The 15 seconds were there to validate that each service create was successful. The latest script just reproduces it faster.

mschirrmeister commented 8 years ago

@mavenugo @mrjana I cloned the docker master repo and applied the patches/commits from #25962 (hope I did everything right). The build produced docker 1.13.0-dev.

With that version, my problem still exists. Reproduced with,

I then ran curl against all 3 Docker hosts. Two nodes balanced between 2 containers, and the 3rd node always went to 1 container.

docker info

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 4
Server Version: 1.13.0-dev
Storage Driver: btrfs
 Build Version: Btrfs v3.17
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay overlay
Swarm: active
 NodeID: d7oq3rjt5llc47hr9wt19tood
 Is Manager: true
 ClusterID: 51zzdq5p2xe8otuwmbalyfy2t
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.218.3.5
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 6.805 GiB
Name: azeausdockerapps301t.azr.omg.wpp
ID: LWMY:RHUH:JJ5O:OP6G:5LV5:7P7B:WI3W:2JMI:B7HY:EP6J:A7SW:DUX2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

docker version

# docker version
Client:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   800d5f8
 Built:        Mon Sep  5 13:17:20 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.0-dev
 API version:  1.25
 Go version:   go1.7
 Git commit:   800d5f8
 Built:        Mon Sep  5 13:17:20 2016
 OS/Arch:      linux/amd64
mrjana commented 8 years ago

@mschirrmeister You seem to be having a basic issue with load balancing. There seems to be something unique in your environment. I would have to take a look at your hosts to see what's different. I know you offered to provide access to your machines. Is that still possible?

ghost commented 8 years ago

I have the same problem on 6 Raspberry Pi nodes on 1.12.1...

root@swarm00:/var/run/docker/netns# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 3
Server Version: 1.12.1
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host null overlay
Swarm: active
 NodeID: 0p0khtm1r1o3qk6wn11ida254
 Is Manager: true
 ClusterID: f2jppvtyvx5nz5r46ljejh8cx
 Managers: 1
 Nodes: 6
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.144.80
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.11-v7+
Operating System: Raspbian GNU/Linux 8 (jessie)
OSType: linux
Architecture: armv7l
CPUs: 4
Total Memory: 925.5 MiB
Name: swarm00.dev
ID: K7WM:MPI7:FHJT:PDYF:BDRK:77MC:AJKA:DMTB:YH22:T6OO:7GQR:OXUH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpuset support
Insecure Registries:
 127.0.0.0/8
root@swarm00:/var/run/docker/netns# docker node ls
ID                           HOSTNAME     STATUS  AVAILABILITY  MANAGER STATUS
0p0khtm1r1o3qk6wn11ida254 *  swarm00.dev  Ready   Active        Leader
17xld3ibyq36eo1g76gdbfw0v    swarm02.dev  Ready   Active
6losgi6e2m62dam1ms5trn0tw    swarm04.dev  Ready   Active
7bxv7npaiikjt4q3vsle09kmq    swarm03.dev  Ready   Active
d8itn0shhggeys0dah0hubnjg    swarm05.dev  Ready   Active
eg59x4humv78j3mnqm9aaa64z    swarm01.dev  Ready   Active

This looks like it's trying to do round-robin but it can't find its way to the other nodes...

root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
{"guid":"f432266e-23b7-46b3-bc18-29f7bd2deaf3","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
curl: (7) Failed to connect to 172.17.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.17.0.1:30000/guid
{"guid":"7c64cb9f-88b1-4a1c-ae75-392bfb1593a2","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
{"guid":"55c10ab5-bbc7-491c-aaed-def520e8a2c2","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
curl: (7) Failed to connect to 172.18.0.1 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 172.18.0.1:30000/guid
{"guid":"1f06d5ee-b892-4771-abd8-4ba4d5808850","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
{"guid":"e3179120-7bc7-4d33-a69c-cdb225cbe24d","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
curl: (7) Failed to connect to 192.168.144.80 port 30000: No route to host
root@swarm00:/var/run/docker/netns# curl 192.168.144.80:30000/guid
{"guid":"413fad44-b749-4084-8d12-383846b46ad7","container":"3f6e68a79782"}root@swarm00:/var/run/docker/netns#

Rebooting all nodes didn't help...

root@swarm00:/var/run/docker/netns# docker service ps service1
ID                         NAME            IMAGE                              NODE         DESIRED STATE  CURRENT STATE            ERROR
6ghzcz968ii8xn6sh5sccxdnv  service1.1      alexellis2/guid-generator-arm:0.1  swarm03.dev  Running        Running 15 minutes ago
cpwn5oqdaii4kejakossruc05   \_ service1.1  alexellis2/guid-generator-arm:0.1  swarm03.dev  Shutdown       Complete 15 minutes ago
exz9hb8svag53la929ntbnsd5  service1.2      alexellis2/guid-generator-arm:0.1  swarm00.dev  Running        Running 15 minutes ago
146ik9at2wplzdjmbnilfbzqc   \_ service1.2  alexellis2/guid-generator-arm:0.1  swarm00.dev  Shutdown       Complete 16 minutes ago
dww7em3z7whkou2y2i7bnhzjl  service1.3      alexellis2/guid-generator-arm:0.1  swarm01.dev  Running        Running 15 minutes ago
ehvwqets4xuwwwyf04h16zxsa   \_ service1.3  alexellis2/guid-generator-arm:0.1  swarm01.dev  Shutdown       Complete 16 minutes ago
8m998xf3zp09xkdord6x7drz8  service1.4      alexellis2/guid-generator-arm:0.1  swarm03.dev  Running        Running 15 minutes ago
amexws0gndc0cy1ovsljvfyih   \_ service1.4  alexellis2/guid-generator-arm:0.1  swarm05.dev  Shutdown       Complete 15 minutes ago
ejp5ewej89udslvv5hpaqu7pv  service1.5      alexellis2/guid-generator-arm:0.1  swarm02.dev  Running        Running 15 minutes ago
e4y7xfhi5i9sr9geem8r9ctbu   \_ service1.5  alexellis2/guid-generator-arm:0.1  swarm04.dev  Shutdown       Complete 15 minutes ago
4rqkq8jocg5283hljqq81jxqr  service1.6      alexellis2/guid-generator-arm:0.1  swarm02.dev  Running        Running 15 minutes ago
7vlmvgo4kkd9doqrbya7pp9tc   \_ service1.6  alexellis2/guid-generator-arm:0.1  swarm02.dev  Shutdown       Complete 15 minutes ago

To me this looks like none of the load-balancing network plumbing is being set up...

root@swarm00:/var/run/docker/netns# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
fc3149d14599        bridge              bridge              local
d09f7618ef12        docker_gwbridge     bridge              local
433c7dd36190        host                host                local
9sg8v29i00yi        ingress             overlay             swarm
55e822266712        none                null                local
root@swarm00:/var/run/docker/netns# ls -l
total 0
-r--r--r-- 1 root root 0 Sep  8 09:32 09e75a3493ee
-r--r--r-- 1 root root 0 Sep  8 09:32 1-9sg8v29i00
-r--r--r-- 1 root root 0 Sep  8 09:32 11f554d1254e
root@swarm00:/var/run/docker/netns# nsenter --net=1-9sg8v29i00 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
root@swarm00:/var/run/docker/netns# nsenter --net=1-9sg8v29i00 iptables -nvL -t mangle
Chain PREROUTING (policy ACCEPT 72 packets, 7056 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 2 packets, 736 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 71 packets, 6688 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 71 packets, 6688 bytes)
 pkts bytes target     prot opt in     out     source               destination
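
For comparison, my understanding is that on a node where the routing mesh works, these same two commands show a MARK rule for the published port in the mangle table and a matching fwmark virtual server in IPVS, roughly like this (illustrative only; the mark value and backend addresses are assumptions, not taken from this cluster):

nsenter --net=1-9sg8v29i00 iptables -nvL -t mangle | grep 30000
# expected: a PREROUTING rule such as  MARK  tcp dpt:30000 MARK set 0x100
nsenter --net=1-9sg8v29i00 cat /proc/net/ip_vs
# expected: a virtual server such as  FWM  256 rr
#   -> <task IP on the ingress network>:0  Masq  1  0  0

Here both tables are empty, so traffic arriving on port 30000 has nowhere to be forwarded.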

Please let me know if I can provide more information.

As an aside, why the difference between the network ID 9sg8v29i00yi and the namespace name 1-9sg8v29i00?

ghost commented 8 years ago

I would like to add that I created the service with 6 replicas from the outset and the networking never worked. This isn't a case of scaling up or down after the service launch.
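
Concretely, the service was created with all replicas in one go, along these lines (reconstructed from the task list above; the container port shown here, 9000, is a placeholder, since only the auto-assigned published port 30000 is visible):

# Hypothetical reconstruction: image, name, and replica count come from the
# docker service ps output above; the internal port 9000 is assumed.
docker service create --name service1 --replicas 6 \
  --publish 9000 alexellis2/guid-generator-arm:0.1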

mrjana commented 8 years ago

@darkermatter You most probably have a different problem. Since you are on an RPi, and if you are on the Raspbian distro, I would check whether the vxlan module is present in your kernel by running lsmod. If it isn't, that is likely your problem; on Raspbian you can get it by running rpi-update, I believe.
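
In shell terms the check is simply the following (a sketch; run as root on each node):

# List loaded kernel modules and look for vxlan; swarm overlay networking needs it.
lsmod | grep vxlan
# If that prints nothing, try loading the module explicitly:
modprobe vxlan && echo "vxlan is available"
# If modprobe fails on Raspbian, rpi-update should pull a kernel that includes it.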

bitsofinfo commented 8 years ago

How is this closed? The OP @mschirrmeister never confirmed it is fixed.

rogaha commented 8 years ago

@mrjana is this fixed by https://github.com/docker/docker/commit/99c39680984018b345881a29d77a89f87958a57b?

mrjana commented 8 years ago

I will reopen the issue. This thread has become a kitchen sink for several different problems. For example, the issue reported by @asmialoski in this thread is definitely fixed by https://github.com/docker/docker/commit/99c39680984018b345881a29d77a89f87958a57b, which is why I mentioned this issue in the commit log of my PR; that automatically closed it. But the original issue as reported by @mschirrmeister is probably not resolved yet. We can keep it open until that is resolved.

@asmialoski If you want, you can use a docker/docker master build to verify that your issue is resolved now.