Ingress overlay network not resolving requests from host node to task on another node (always same combination of nodes, everything else works)

bloo commented 6 years ago

Description

A Docker networking failure occurs under a very specific combination of circumstances. We have Swarm clusters that, over time (we don't yet know the trigger: a CoreOS upgrade, a random reboot, the moon phase?), lose the ability to route requests from one node (an EC2 host) to a container on another specific node. When multiple containers are running across multiple hosts, requests for that same specific node combination always fail, while all other combinations work.

Steps to reproduce the issue:

Have 3 Swarm nodes: nodeA, nodeB, nodeC
Create a Docker service named hello
- replicas=1
- publish port 3333 (bringing the Swarm ingress network to the node that runs the container)
SSH into each Swarm node EC2 instance
- verify that a DOCKER-INGRESS rule exists for 3333 using sudo iptables -L
- curl localhost:3333
Run docker service update --force hello to move the task/container around between nodes
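
The steps above can be sketched as a small script. This is only an illustration: the image name is a hypothetical placeholder (the report does not say which image backs hello), the published port is assumed to map 3333 to 3333, and the 5 second curl timeout is an arbitrary choice. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps. IMAGE is a hypothetical placeholder;
# the original report does not name the image behind the hello service.
IMAGE="${IMAGE:-example/hello:latest}"
PORT=3333   # assumption: published port and target port are both 3333

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repro() {
  # One replica, published through the ingress routing mesh.
  run docker service create --name hello --replicas 1 \
    --publish "${PORT}:${PORT}" "$IMAGE"
  # On each node: look for the ingress rule and probe the published port.
  run sudo iptables -L
  run curl --max-time 5 "localhost:${PORT}"
  # Reschedule the task so it may land on a different node.
  run docker service update --force hello
}

repro
```

The iptables and curl steps are meant to be repeated on every node after each forced update, since the failure depends on which node the task lands on.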

Describe the results you received:

If the only hello task is running on nodeA or nodeC
- curl localhost:3333 works
If the only hello task is running on nodeB
- curl localhost:3333 from nodeA times out 100% of the time
- curl localhost:3333 from nodeB or nodeC works
If we bump to replicas=2
- and hello is running on nodeA and nodeC
  - everything works everywhere
- and hello is running on nodeA|C and nodeB
  - curl localhost:3333 from nodeA works 50% of the time
  - curl localhost:3333 from nodeB or nodeC works
Rebooting nodeA doesn't help
Terminating nodeA and having our AWS ASG recreate it fixes the issue (but we won't do that just yet)
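
A rough way to put numbers on the results above is to repeat the request and count successes. The helper below is a sketch, not something from the original report: success_rate is a made-up name, and the 3 second timeout and 20 attempts in the usage comment are arbitrary. The 50% figure would be consistent with the ingress load balancer alternating between two tasks when one of them is unreachable from nodeA, but that is an assumption, not something verified here.

```shell
#!/usr/bin/env bash
# success_rate N CMD...: run CMD N times and print the percentage of runs
# that exited with status 0 (integer division, so the result is rounded down).
success_rate() {
  local n="$1" ok=0 i
  shift
  for ((i = 0; i < n; i++)); do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  echo $((100 * ok / n))
}

# Example, to be run on each node in turn:
#   success_rate 20 curl --silent --max-time 3 localhost:3333
```

Running this on nodeA, nodeB and nodeC after each docker service update --force hello gives one number per node and task placement, which makes it easier to compare runs.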

Describe the results you expected:

Requests to nodeA's listening port 3333, when routed to that service's container/task on nodeB, should always work.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

All 3 nodes:

Client:
 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:10:31 2018
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:    Tue Feb 27 22:10:31 2018
  OS/Arch:  linux/amd64
  Experimental: true

Output of docker info:

nodeA

core@ip-10-255-2-125 ~ $ docker info
Containers: 14
 Running: 7
 Paused: 0
 Stopped: 7
Images: 11
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: z9yytapt4r8tbu48epze2z22r
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.2.125
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-2-125
ID: GVNR:74L4:JMGJ:UNPB:RB55:7OTB:HSGS:G3PR:YHEU:QC3T:2PSR:6O74
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.type=m4.large
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeB

core@ip-10-255-1-242 ~ $ docker info
Containers: 8
 Running: 8
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 7vlbs4n5s3tm3b0qvld2t3exr
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.1.242
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-1-242
ID: KAQS:KOWT:IOII:GUTQ:BLU7:SNLK:4VLH:JRM2:PMGG:RZZM:R6YV:AS6P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeC

core@ip-10-255-3-162 ~ $ docker info
Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: yrjmuu83zhqc1b95kf3s2fx8s
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.3.162
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-3-162
ID: JZX3:DCZT:S7W6:E43Y:4MRZ:NOTU:Y3XB:ZX7C:EZ3J:OYM7:WZIU:GCX6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS across 3 AZs using CoreOS Container Linux AMIs and identical Launch Configurations.

At Swarm initialization, we're re-creating the ingress network to work around an already used subnet on our network:

  yes | docker network rm ingress
  docker network create --driver overlay --ingress \
    --subnet 10.100.0.0/16 \
    --gateway 10.100.0.1 \
    ingress-hello

bloo commented 6 years ago

I'm going to keep my cluster as-is (partially broken) to assist in triage and inspection. Thanks!

bloo commented 6 years ago

Running containers (with network diagnostic utilities) on each node, on the same overlay network as the hello service, and then exec'ing into each of those containers does not show any issue. I suspect the problem lies somewhere between dockerd on nodeA and the ingress overlay network.

bloo commented 6 years ago

At Swarm initialization, we're re-creating the ingress network to work around an already used subnet on our network:

  yes | docker network rm ingress
  docker network create --driver overlay --ingress \
    --subnet 10.100.0.0/16 \
    --gateway 10.100.0.1 \
    ingress-hello

When running docker inspect ingress-hello on each node, our ingress-hello-sbox containers show these settings, which don't look right (nodeA and nodeC report the same MAC and IPv4 address):

nodeA

            "ingress-hello-sbox": {
                "Name": "ingress-hello-endpoint",
                "EndpointID": "fce5334d6dcedbb50a66d99b70595b5c93ca4cb81674703214ba1fb365051dd3",
                "MacAddress": "02:42:0a:64:00:0b",
                "IPv4Address": "10.100.0.11/16",
                "IPv6Address": ""
            }

nodeB

            "ingress-hello-sbox": {
                "Name": "ingress-hello-endpoint",
                "EndpointID": "2fdae6cb1934f9a4a9eb130f9b13bff5501668d625bb201646cf2abb4b20063d",
                "MacAddress": "02:42:0a:64:00:02",
                "IPv4Address": "10.100.0.2/16",
                "IPv6Address": ""
            }

nodeC

            "ingress-hello-sbox": {
                "Name": "ingress-hello-endpoint",
                "EndpointID": "f3fdfb3bc19c1257d3549cbdb262953357003cfda0af7372a6ce722cf75ff44a",
                "MacAddress": "02:42:0a:64:00:0b",
                "IPv4Address": "10.100.0.11/16",
                "IPv6Address": ""
            }
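
The overlap in the listings above can be checked mechanically. The following sketch takes "node address" pairs on stdin and reports any address claimed by more than one node. The function name dup_addrs is made up for this example, and the pairs are copied by hand from the output above rather than queried from Docker. Whether a duplicated sandbox address is actually the cause of the routing failure is not established here.

```shell
#!/usr/bin/env bash
# dup_addrs: read "node address" pairs on stdin and print every address
# reported by more than one node, followed by the nodes that report it.
dup_addrs() {
  awk '
    { nodes[$2] = (nodes[$2] == "" ? $1 : nodes[$2] " " $1); count[$2]++ }
    END { for (a in count) if (count[a] > 1) print a ": " nodes[a] }
  ' | sort
}

# Addresses copied from the ingress-hello-sbox entries above.
dup_addrs <<'EOF'
nodeA 10.100.0.11/16
nodeB 10.100.0.2/16
nodeC 10.100.0.11/16
EOF
```

With the three entries from this thread it prints 10.100.0.11/16: nodeA nodeC.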

Similarly, when we docker inspect docker_gwbridge on each node:

nodeA

            "ingress-hello-sbox": {
                "Name": "gateway_ingress-hell",
                "EndpointID": "8e58e338c0a04840ad2607c57f73b6b76a529a5c11da03efe6c1392e24f92d02",
                "MacAddress": "02:42:ac:12:00:04",
                "IPv4Address": "172.18.0.4/16",
                "IPv6Address": ""
            }

nodeB

            "ingress-hello-sbox": {
                "Name": "gateway_ingress-hell",
                "EndpointID": "23f2377320af58680fb29e4b73789d4e6add7068fba70b135a5056acf8296c6f",
                "MacAddress": "02:42:ac:12:00:04",
                "IPv4Address": "172.18.0.4/16",
                "IPv6Address": ""
            }

... and this is the container entry for our hello service task on docker_gwbridge:

            "cc073a84ce16d81645ff3255b19ee0d1cff09bc7ced92c596e53467f4e77c732": {
                "Name": "gateway_cc073a84ce16",
                "EndpointID": "af4730ae76f7c54b5c59bb33ae7672b84b7b8b2a95881007f28ef7554a17e1ae",
                "MacAddress": "02:42:ac:12:00:07",
                "IPv4Address": "172.18.0.7/16",
                "IPv6Address": ""
            },

nodeC

            "ingress-hello-sbox": {
                "Name": "gateway_ingress-hell",
                "EndpointID": "a1c937fb8c2e0ce83d90fec9d3c1a4f6ea49e3769b8f6f59d44cb52c691c37ff",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
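
One detail that helps when reading these listings: every MAC address shown in this thread appears to be the prefix 02:42 followed by the four IPv4 octets in hex (for example 02:42:ac:12:00:04 for 172.18.0.4). If that holds, a repeated MAC address is just another view of a repeated IP address and not a separate symptom. A minimal sketch of the mapping, with ip_to_mac as a made-up helper name:

```shell
#!/usr/bin/env bash
# ip_to_mac ADDR: print the MAC address that matches the pattern seen in the
# inspect output above, i.e. 02:42 followed by the IPv4 octets in hex.
# An optional /prefix suffix on ADDR is ignored.
ip_to_mac() {
  local ip="${1%%/*}" a b c d
  IFS=. read -r a b c d <<<"$ip"
  printf '02:42:%02x:%02x:%02x:%02x\n' "$a" "$b" "$c" "$d"
}

ip_to_mac 172.18.0.4/16
ip_to_mac 10.100.0.11/16
```

The two calls reproduce the MacAddress values listed for 172.18.0.4 and 10.100.0.11 in the output above.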

scotdalton commented 6 years ago

Any word on what could be causing these issues?

moby / moby

Ingress overlay network not resolving requests from host node to task on another node (always same combination of nodes, everything else works) #36871