moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

ingress-sbox containers with the same ip addresses blocking ingress traffic #36949

Open bloo opened 6 years ago

bloo commented 6 years ago

Over time, ingress requests on our Swarm cluster start timing out when one host node tries to route traffic to a container on another host node. We've found that the ingress-sbox containers on the ingress network on those two hosts have the same private IP address.
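
For anyone who wants to check their own cluster for this condition, here is a minimal sketch of how the comparison could be automated. It assumes you have saved the JSON printed by `docker network inspect ingress` on each node, and that the output contains a "Containers" map with an "ingress-sbox" entry carrying an "IPv4Address" field; that layout is an assumption about the output, not something shown in this report. The function name and the sample addresses are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_ingress_ips(inspect_by_node, endpoint_key="ingress-sbox"):
    """Return {ip: [nodes]} for every ingress-sbox IP reported by 2+ nodes.

    inspect_by_node maps a node name to the parsed JSON array that
    `docker network inspect ingress` printed on that node (assumed layout).
    """
    nodes_by_ip = defaultdict(list)
    for node, networks in inspect_by_node.items():
        for network in networks:
            # "Containers" may be missing or null; treat both as empty.
            endpoint = (network.get("Containers") or {}).get(endpoint_key)
            if not endpoint:
                continue
            # Strip the prefix length, e.g. "10.0.0.2/24" -> "10.0.0.2".
            ip = endpoint.get("IPv4Address", "").split("/")[0]
            if ip:
                nodes_by_ip[ip].append(node)
    return {ip: sorted(nodes) for ip, nodes in nodes_by_ip.items() if len(nodes) > 1}


# Hypothetical data: the addresses below are invented for illustration.
sample = {
    "nodeA": [{"Name": "ingress", "Containers": {"ingress-sbox": {"IPv4Address": "10.0.0.2/24"}}}],
    "nodeB": [{"Name": "ingress", "Containers": {"ingress-sbox": {"IPv4Address": "10.0.0.2/24"}}}],
    "nodeC": [{"Name": "ingress", "Containers": {"ingress-sbox": {"IPv4Address": "10.0.0.4/24"}}}],
}
print(find_duplicate_ingress_ips(sample))
```

An empty result means no two nodes reported the same ingress-sbox address in the data supplied.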

Steps to reproduce the issue:

  1. Run a swarm with multiple managers on a self-updating, self-rebooting OS (Container Linux)
  2. Wait
  3. Observe intermittent timeouts

Describe the results you received:

If the container that's supposed to handle ingress traffic is in global mode, for example, and constrained to only the manager nodes (i.e. there are 3 containers spread across 3 host nodes), 1 out of 3 ingress requests to the external address of one of the manager host nodes times out.
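
The 1-in-3 figure is what you would expect if requests are rotated evenly across the three containers and exactly one of them cannot be reached from the node receiving the request. The sketch below only illustrates that arithmetic. Round-robin distribution is an assumption here, not something confirmed in this report, and the function and node names are hypothetical.

```python
from itertools import cycle


def timeout_fraction(backends, unreachable, requests):
    """Fraction of requests that land on an unreachable backend.

    Assumes strict round-robin rotation over `backends` (an assumption,
    not a statement about how the routing mesh behaves in every setup).
    """
    rotation = cycle(backends)
    timed_out = sum(1 for _ in range(requests) if next(rotation) in unreachable)
    return timed_out / requests


# Three containers, one of them unreachable from the receiving node.
print(timeout_fraction(["nodeA", "nodeB", "nodeC"], {"nodeB"}, 300))
```

With one of three backends unreachable, a third of the requests in the simulation land on it, which matches the ratio observed above.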

Describe the results you expected:

Perfect routing.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:   17.12.1-ce
 API version:   1.35
 Go version:    go1.9.4
 Git commit:    7390fc6
 Built: Tue Feb 27 22:10:31 2018
 OS/Arch:   linux/amd64

Server:
 Engine:
  Version:  17.12.1-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   7390fc6
  Built:    Tue Feb 27 22:10:31 2018
  OS/Arch:  linux/amd64
  Experimental: true

Output of docker info:

nodeA

core@ip-10-255-2-125 ~ $ docker info
Containers: 14
 Running: 7
 Paused: 0
 Stopped: 7
Images: 11
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: z9yytapt4r8tbu48epze2z22r
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.2.125
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-2-125
ID: GVNR:74L4:JMGJ:UNPB:RB55:7OTB:HSGS:G3PR:YHEU:QC3T:2PSR:6O74
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.type=m4.large
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeB

core@ip-10-255-1-242 ~ $ docker info
Containers: 8
 Running: 8
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: 7vlbs4n5s3tm3b0qvld2t3exr
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.1.242
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-1-242
ID: KAQS:KOWT:IOII:GUTQ:BLU7:SNLK:4VLH:JRM2:PMGG:RZZM:R6YV:AS6P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

nodeC

core@ip-10-255-3-162 ~ $ docker info
Containers: 9
 Running: 7
 Paused: 0
 Stopped: 2
Images: 8
Server Version: 17.12.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: yrjmuu83zhqc1b95kf3s2fx8s
 Is Manager: true
 ClusterID: 9ydjpwkzcqadjachlc42w5yz0
 Managers: 3
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 12 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.255.3.162
 Manager Addresses:
  10.255.1.242:2377
  10.255.2.125:2377
  10.255.3.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.32-coreos
Operating System: Container Linux by CoreOS 1688.5.3 (Rhyolite)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.791GiB
Name: ip-10-255-3-162
ID: JZX3:DCZT:S7W6:E43Y:4MRZ:NOTU:Y3XB:ZX7C:EZ3J:OYM7:WZIU:GCX6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
 instance.region=us-east-1
 instance.role=manager
 instance.role.type=manager
 instance.type=m4.large
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS across 3 AZs using CoreOS Container Linux AMIs and identical Launch Configurations.

This is a duplicate of https://github.com/moby/moby/issues/36871, with a simplified explanation.

thaJeztah commented 6 years ago

ping @ctelfer could you have a look if this is one of the things fixed in 18.03.x?

bloo commented 6 years ago

@thaJeztah @ctelfer any luck? Our clusters have since upgraded to 18.03.1-ce and it would be nice to close out our internal issue. Thanks!

ctelfer commented 6 years ago

I haven't seen a particular signature of duplicate IP addresses on the ingress networks. However, there were definitely general duplicate IP address issues fixed in the 18.03 CE release. See https://github.com/docker/libnetwork/pull/2105 in particular.