moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker not releasing IP for stopped/exited containers in overlay network #30986

Open bgrissin opened 7 years ago

bgrissin commented 7 years ago

With a default overlay driver configuration, when a container attached to an overlay network stops, its IP is not released back to the available pool. This causes problems once we approach the number of available IPs for this CIDR (256 addresses in the /24): we can no longer deploy new services.


Steps to reproduce the issue:

  1. Create an overlay network using -d overlay
  2. docker run
  3. docker stop

Describe the results you received:

1) ~ $ docker network create -d overlay backend

[root@ip-10-12-3-23 log]# docker network inspect backend
[
    {
        "Name": "backend",
        "Id": "536t207lrcexq4wreemqmwx0k",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.1.0/24",
                    "Gateway": "10.0.1.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Containers": {
            "309aebcf4ec64d0d2fe7339606b5c251ad1d812c7b1f461f72be344395a5d635": {
                "Name": "testapi.7.4w84ahgtaoq97284lvsyuwtvx",
                "EndpointID": "a9517bdd16b882494bc15402a287a43297c3d0457eb3a6c91d2372310fbbb273",
                "MacAddress": "02:42:0a:00:01:ef",
                "IPv4Address": "10.0.1.239/24",
                "IPv6Address": ""
            },
            "b98ca92d61e8183e4556bd378e7ebd7f2240f090d460ca227b4e5152e2cac35f": {
                "Name": "testapi.5.8r7z1d5mfrembp3d2d583egq9",
                "EndpointID": "9dd3e6f6a35f8ed022e38833409b5f1c6370acb809bcb63a675856e87340be2d",
                "MacAddress": "02:42:0a:00:01:ee",
                "IPv4Address": "10.0.1.238/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "257"
        },
        "Labels": {
            "com.docker.ucp.access.owner": "jdoe"
        }
    }
]
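To put the pool size in perspective, here is a minimal Python sketch that reads one entry of `docker network inspect` output and estimates how many addresses the subnet can hand out. The function name `overlay_capacity` and the trimmed `SAMPLE_NETWORK` dict are illustrative assumptions; only the `IPAM.Config[].Subnet`, `Gateway`, and `Containers[].IPv4Address` fields come from the output above. The result is only an upper bound on free addresses, because the `Containers` map may not list every allocation the swarm has made on the network.

```python
import ipaddress

# Trimmed copy of the inspect output quoted in this issue (assumption: only
# the fields the sketch reads are kept).
SAMPLE_NETWORK = {
    "Name": "backend",
    "IPAM": {"Config": [{"Subnet": "10.0.1.0/24", "Gateway": "10.0.1.1"}]},
    "Containers": {
        "309aebcf4ec6": {"IPv4Address": "10.0.1.239/24"},
        "b98ca92d61e8": {"IPv4Address": "10.0.1.238/24"},
    },
}


def overlay_capacity(network: dict) -> dict:
    """Summarise address usage for one `docker network inspect` entry."""
    config = network["IPAM"]["Config"][0]
    subnet = ipaddress.ip_network(config["Subnet"])

    # hosts() skips the network and broadcast addresses of an IPv4 subnet.
    usable = {str(ip) for ip in subnet.hosts()}
    # The gateway address is not available to containers either.
    usable.discard(config.get("Gateway"))

    # "Containers" can be null/absent; treat that as an empty map.
    containers = network.get("Containers") or {}
    in_use = {
        entry["IPv4Address"].split("/")[0]
        for entry in containers.values()
        if entry.get("IPv4Address")
    }
    return {
        "usable": len(usable),
        "in_use": sorted(in_use),
        "free_upper_bound": len(usable - in_use),
    }


if __name__ == "__main__":
    print(overlay_capacity(SAMPLE_NETWORK))
```

For the 10.0.1.0/24 subnet above, this counts 253 usable addresses once the network, broadcast, and gateway addresses are excluded, so leaked addresses can exhaust the pool well before 256 containers are running.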

2)

docker run --network backend ===> creates af9e

3) then

~ $ docker stop af9e

[ec2-user@ip-1.1.1.1 ~]$ docker inspect af9e [ { "Id": "af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c", "Created": "2017-02-01T22:56:21.818626388Z", "Path": "./start.sh", "Args": [], "State": { "Status": "exited", "Running": false, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 0, "ExitCode": 0, "Error": "", "StartedAt": "2017-02-01T22:56:22.701044968Z", "FinishedAt": "2017-02-02T02:34:01.964797741Z" }, "Image": "sha256:f5e33b83bc19875a3640bc57c2fc7c5a9c6e03882efb05d3b8e796af0800d1f1", "ResolvConfPath": "/var/lib/docker/containers/af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c/resolv.conf", "HostnamePath": "/var/lib/docker/containers/af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c/hostname", "HostsPath": "/var/lib/docker/containers/af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c/hosts", "LogPath": "/var/lib/docker/containers/af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c/af9e76d8a095fa3c6f12ed1ab248ca2e88b7fe50141a6756013cc13c6694112c-json.log", "Name": "/authprocessing.1.aqdkgmhjsvsx1ygj0s1u01n7u", "RestartCount": 0, "Driver": "devicemapper", "MountLabel": "", "ProcessLabel": "", "AppArmorProfile": "", "ExecIDs": null, "HostConfig": { "Binds": null, "ContainerIDFile": "", "LogConfig": { "Type": "json-file", "Config": {} }, "NetworkMode": "default", "PortBindings": null, "RestartPolicy": { "Name": "", "MaximumRetryCount": 0 }, "AutoRemove": false, "VolumeDriver": "", "VolumesFrom": null, "CapAdd": null, "CapDrop": null, "Dns": [], "DnsOptions": [], "DnsSearch": [], "ExtraHosts": null, "GroupAdd": null, "IpcMode": "", "Cgroup": "", "Links": null, "OomScoreAdj": 0, "PidMode": "", "Privileged": false, "PublishAllPorts": false, "ReadonlyRootfs": false, "SecurityOpt": null, "UTSMode": "", "UsernsMode": "", "ShmSize": 67108864, "Runtime": "runc", "ConsoleSize": [ 0, 0 ], "Isolation": "", "CpuShares": 0, "Memory": 0, "CgroupParent": "", "BlkioWeight": 
0, "BlkioWeightDevice": null, "BlkioDeviceReadBps": null, "BlkioDeviceWriteBps": null, "BlkioDeviceReadIOps": null, "BlkioDeviceWriteIOps": null, "CpuPeriod": 0, "CpuQuota": 0, "CpusetCpus": "", "CpusetMems": "", "Devices": null, "DiskQuota": 0, "KernelMemory": 0, "MemoryReservation": 0, "MemorySwap": 0, "MemorySwappiness": -1, "OomKillDisable": false, "PidsLimit": 0, "Ulimits": null, "CpuCount": 0, "CpuPercent": 0, "IOMaximumIOps": 0, "IOMaximumBandwidth": 0 }, "GraphDriver": { "Name": "devicemapper", "Data": { "DeviceId": "781", "DeviceName": "docker-202:2-897581196-03ed443282cf52134d13c6a2a9aa270fa868f4ab4ec64e8bb03d1aecf2447131", "DeviceSize": "10737418240" } }, "Mounts": [ { "Name": "c6666fe9c05cb72a127481218ad2a10dd04cfbca09cf1e70c6639de2c0648e48", "Source": "/var/lib/docker/volumes/c6666fe9c05cb72a127481218ad2a10dd04cfbca09cf1e70c6639de2c0648e48/_data", "Destination": "/tmp", "Driver": "local", "Mode": "", "RW": true, "Propagation": "" } ], "Config": { "Hostname": "af9e76d8a095", "Domainname": "", "User": "appuser", "AttachStdin": false, "AttachStdout": false, "AttachStderr": false, "ExposedPorts": { "8080/tcp": {} }, "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "spring.profiles.active=dev", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "LANG=C.UTF-8", "JAVA_VERSION=8", "JAVA_UPDATE=121", "JAVA_BUILD=13", "JAVA_PATH=e9e7ea248e2c4826b92b3f075a80e441", "JAVAHOME=/usr/lib/jvm/default-jvm" ], "Cmd": [ "./start.sh" ], "Image": "xxx.xxx.com:8080/authprocessing:latest", "Volumes": { "/tmp": {} }, "WorkingDir": "", "Entrypoint": null, "OnBuild": null, "Labels": { "com.docker.swarm.node.id": "18t955l87te8ntf3rrefcuv0c", "com.docker.swarm.service.id": "2zf8tn5i8nu8bplmrtjfuihpp", "com.docker.swarm.service.name": "authprocessing", "com.docker.swarm.task": "", "com.docker.swarm.task.id": "aqdkgmhjsvsx1ygj0s1u01n7u", "com.docker.swarm.task.name": "authprocessing.1" } }, "NetworkSettings": { "Bridge": "", "SandboxID": 
"b03beab0861404ce1c1962d14262b8b68d36ea368fbbe358637feef2337811da", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": null, "SandboxKey": "/var/run/docker/netns/b03beab08614", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "", **"Networks": { "backend": { "IPAMConfig": { "IPv4Address": "10.0.1.13" }, "Links": null, "Aliases": [ "af9e76d8a095" ], "NetworkID": "536t207lrcexq4wreemqmwx0k", "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "" },_** "ingress": { "IPAMConfig": { "IPv4Address": "10.255.0.22" }, "Links": null, "Aliases": [ "af9e76d8a095" ], "NetworkID": "7xy77e846qybtxok93z1lbntj", "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "" }, "proxy": { "IPAMConfig": { "IPv4Address": "10.0.0.15" }, "Links": null, "Aliases": [ "af9e76d8a095" ], "NetworkID": "8q2heepmq3nfvr8wamqwu0fsk", "EndpointID": "", "Gateway": "", "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "MacAddress": "" } } } } ]

Describe the results you expected:

I would expect the IP to be released; in this case it is not.
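The symptom visible in the `docker inspect` output above (a stopped container whose networks still carry an `IPAMConfig.IPv4Address` while `IPAddress` is empty) can be checked for programmatically. The sketch below is a hedged illustration: `retained_ips` and `SAMPLE_CONTAINER` are hypothetical names, and the sample is trimmed to the fields it reads. It only surfaces the symptom; it does not prove that the allocator still holds the address.

```python
# Trimmed from the `docker inspect af9e` output in this issue (assumption:
# all other fields are irrelevant to the check).
SAMPLE_CONTAINER = {
    "State": {"Status": "exited", "Running": False},
    "NetworkSettings": {
        "Networks": {
            "backend": {"IPAMConfig": {"IPv4Address": "10.0.1.13"}, "IPAddress": ""},
            "ingress": {"IPAMConfig": {"IPv4Address": "10.255.0.22"}, "IPAddress": ""},
            "proxy": {"IPAMConfig": {"IPv4Address": "10.0.0.15"}, "IPAddress": ""},
        }
    },
}


def retained_ips(container: dict) -> dict:
    """Return {network name: IPAM address} for a container that is not running."""
    if container["State"]["Running"]:
        return {}

    retained = {}
    networks = container["NetworkSettings"].get("Networks") or {}
    for name, settings in networks.items():
        ipam = settings.get("IPAMConfig") or {}
        # Address still recorded in IPAMConfig, but no live endpoint address.
        if ipam.get("IPv4Address") and not settings.get("IPAddress"):
            retained[name] = ipam["IPv4Address"]
    return retained


if __name__ == "__main__":
    for network, address in retained_ips(SAMPLE_CONTAINER).items():
        print(f"{network}: {address}")
```

Running this over the parsed inspect output of every exited container on a node would give a rough list of addresses worth cross-checking against the network's pool.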

~ $ docker version 1.12.3

Additional environment details (AWS, VirtualBox, physical, etc.): AWS env using stock rhel base 7.2 images

bgrissin commented 7 years ago

[ec2-user@ip-10-12-3-81 ~]$ docker info

Containers: 17
 Running: 15
 Paused: 0
 Stopped: 2
Images: 134
Server Version: 1.12.6-cs7
Storage Driver: devicemapper
 Pool Name: docker-202:2-897581196-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 37.21 GB
 Data Space Total: 107.4 GB
 Data Space Available: 70.16 GB
 Metadata Space Used: 51.17 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.096 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use --storage-opt dm.thinpooldev to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.135-RHEL7 (2016-09-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: active
 NodeID: 18t955l87te8ntf3rrefcuv0c
 Is Manager: false
 Node Address: 10.12.3.81
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.3 (Maipo)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.26 GiB
Name: ip-10-12-3-81.corp.gxicloud.com
ID: GFQE:TR7I:7V2W:LU2L:KYAI:3WJV:UFY3:V2GG:CQ6Q:C3S6:YQML:DHR7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

[ec2-user@ip-10-12-3-81 ~]$

cruzzan commented 5 years ago

Any updates or news on this issue? I am currently investigating a problem that seems very similar to this one (and to a few other open issues, #37338 for example).

We have nowhere near enough services running in our swarm to use up all ~250 IPs in the pool for the user-created overlay network (10.0.0.0/24). However, we still see newly deployed stacks with services that can't get past the New state, and the journalctl log on the manager shows errors like this: failed to allocate network IP for task sj673d4avl2s40hq8s8s5208w network p353610zri45zo12dnluxhvo8: could not find an available IP.
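When many of these errors pile up, it helps to know which networks are affected. The following Python sketch counts failures per network ID from log lines in the format quoted above. The name `exhausted_networks` is hypothetical, and the pattern is an assumption based solely on the single message shown in this comment; other Docker versions may word it differently.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern derived from the one log message quoted in this thread.
ALLOC_FAILURE = re.compile(
    r"failed to allocate network IP for task (?P<task>\w+) "
    r"network (?P<network>\w+): could not find an available IP"
)


def exhausted_networks(lines: Iterable[str]) -> Counter:
    """Count allocation failures per network ID across log lines."""
    failures = Counter()
    for line in lines:
        match = ALLOC_FAILURE.search(line)
        if match:
            failures[match.group("network")] += 1
    return failures


if __name__ == "__main__":
    sample = [
        "failed to allocate network IP for task sj673d4avl2s40hq8s8s5208w "
        "network p353610zri45zo12dnluxhvo8: could not find an available IP",
        "some unrelated log line",
    ]
    for network_id, count in exhausted_networks(sample).most_common():
        print(network_id, count)
```

The function takes any iterable of strings, so the lines could come from a saved log file or from the manager's journal output piped in on standard input.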

niazhussain commented 5 years ago

Facing a similar issue when allocating a VIP. Docker version:

Client: Docker Engine - Community
 Version: 19.03.1
 API version: 1.39 (downgraded from 1.40)
 Go version: go1.12.5
 Git commit: 74b1e89
 Built: Thu Jul 25 21:21:05 2019
 OS/Arch: linux/amd64
 Experimental: false

Server: Docker Engine - Community
 Engine:
  Version: 18.09.0
  API version: 1.39 (minimum version 1.12)
  Go version: go1.10.4
  Git commit: 4d60db4
  Built: Wed Nov 7 00:16:44 2018
  OS/Arch: linux/amd64
  Experimental: false

D0wn3r commented 3 years ago

Hello,

When I run this scenario:

  1. Create stack1
  2. Remove stack1
  3. Create stack1
  4. Remove stack1
  ....

At some point swarm stops allocating IPs for the services, and the services are stuck in the "New" status. Is this bug similar to yours?

EDIT: In this scenario I have an external network, and that seems to be what is causing the problem.

Warriorgiroro commented 2 years ago

Any update on this?

D0wn3r commented 2 years ago

Do you have this problem too?

cruzzan commented 2 years ago

I thought I might update with my situation since this thread is still alive. To date I have not found a solution to the problem I faced back when I wrote my comment, nor really figured out why it was happening.

However, we mitigated the problem by changing the networking strategy in our swarm: we created much smaller networks for each service, and then peered those networks with the ones that needed a connection. We had a hunch that the issue was related to quite high turnover in containers (we had a lot of redeployments running, and so on). That seems to have worked for us. I hope this helps someone else as well.
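As a rough illustration of the "smaller network per service" idea, the Python sketch below carves one address pool into equally sized subnets and assigns one to each service. Everything specific here is an assumption for illustration: the function name `plan_service_subnets`, the pool 10.10.0.0/24, and the /28 prefix are hypothetical, and the commenter's actual layout is not described in this thread. The service names are taken from earlier output in the issue.

```python
import ipaddress


def plan_service_subnets(pool: str, services: list, new_prefix: int = 28) -> dict:
    """Assign one subnet of size /new_prefix from `pool` to each service."""
    candidates = ipaddress.ip_network(pool).subnets(new_prefix=new_prefix)
    # zip() stops at the shorter input, so a too-small pool yields fewer entries.
    plan = dict(zip(services, (str(subnet) for subnet in candidates)))
    if len(plan) < len(services):
        raise ValueError(f"{pool} cannot provide {len(services)} distinct /{new_prefix} subnets")
    return plan


if __name__ == "__main__":
    # Hypothetical pool; service names appear earlier in this issue.
    plan = plan_service_subnets("10.10.0.0/24", ["authprocessing", "testapi"])
    for service, subnet in plan.items():
        print(f"{service}: {subnet}")
```

Each planned subnet could then be passed as the `--subnet` value to `docker network create -d overlay`. A /28 leaves only 14 host addresses per network, so the prefix would need to be chosen with the expected replica count and redeployment churn in mind.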