moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Unable to remove a stopped container: `device or resource busy` #22260

Closed pheuter closed 6 years ago

pheuter commented 8 years ago

Apologies if this is a duplicate issue; there seem to be several outstanding issues around a very similar error message but under different conditions. I initially added a comment on #21969 and was told to open a separate ticket, so here it is!


BUG REPORT INFORMATION

Output of docker version:

Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 51
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 81
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-74-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.676 GiB
Name: ip-10-1-49-110
ID: 5GAP:SPRQ:UZS2:L5FP:Y4EL:RR54:R43L:JSST:ZGKB:6PBH:RQPO:PMQ5
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on Ubuntu 14.04.3 LTS HVM in AWS on an m3.medium instance with an EBS root volume.

Steps to reproduce the issue:

  1. $ docker run --restart on-failure --log-driver syslog --log-opt syslog-address=udp://localhost:514 -d -p 80:80 -e SOME_APP_ENV_VAR myimage
  2. The container repeatedly exits with an error and restarts, due to a bug in the application
  3. Manually run docker stop container
  4. The container stops successfully
  5. Trying to rm container then throws the error: Error response from daemon: Driver aufs failed to remove root filesystem 88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e: rename /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0 /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0-removing: device or resource busy
  6. Restart docker engine: $ sudo service docker restart
  7. $ docker ps -a shows that the container no longer exists.
LaurentDumont commented 7 years ago

Is there anything that can be done in the meantime? I'm running into the same issue with the docker_container module from Ansible. That is on Debian Jessie with Docker 17.05.0.

lievendp commented 7 years ago

@cpuguy83 yet it's the only kernel "version" you're going to see in enterprise-level Linux like RHEL 7. I guess they backport a lot of fixes into their kernel version. Is there anything specific in the newer kernels we're looking for to fix this issue? I could check whether it's in the version I have.

From what I see here, there are supported docker versions on RHEL 7, so the kernel shouldn't be the problem then, no?

Anyhow, it appears that fixing one thing leads to another problem in a newer version. It makes me think that maybe docker (the way I use it, which is quite basic) isn't as production-ready (stable) as it could be, since it's still a young technology. That won't keep me from using it, considering the advantages it can bring.

The "leaking" of mounts in namespaces: is there any method to pinpoint that this is the problem? At the moment it's just a description from my end leading to your conclusion, but is it possible for me to actually test and observe the leaking problem? Seeing it might help in solving or working around it.

ghost commented 7 years ago

Docker 17.03.1-ce on a new CentOS 7.3 install running kernel 3.10.0-514.21.1.el7.x86_64

Problem remains.

Commands:

docker stop 95aa09d90aaf
docker rm 95aa09d90aaf

Result in:

Error response from daemon: Driver overlay failed to remove root filesystem 95aa09d90aaf870301163e19bf9bb73eff055e7a2c3e3d22d09604fb41361608: remove /var/lib/docker/overlay/8479d2fd0c0e7ec06c17af0b00bb004baeb0c6fbe92ed1b858b741c9458bb499/merged: device or resource busy

Followed by:

Message from syslogd@ik1-331-25960 at Jun  5 10:24:47 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

Problem solved by:

systemctl restart docker

Here is my docker info:

Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 1
Server Version: 17.03.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 1.954 GiB
Name: ik1-331-25960.vs.sakura.ne.jp
ID: LFE3:D55E:EXWU:JFGN:GKZ4:QLKI:3CX7:7YG4:U2OQ:LLSI:LNRE:D5UU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
cpuguy83 commented 7 years ago

@lievendp The particular kernel feature is being able to do a detached mount while the mount exists in another namespace. I believe RH is planning on including this in RHEL 7.4, btw.

Generally speaking, the only time you would really encounter this issue is if you've mounted /var/lib/docker, or one of its parents, into a container.

One potential workaround for this is to set MountFlags=slave in the docker systemd unit file. The reason this isn't the default is that it can cause problems for certain use cases.
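A minimal sketch of that workaround as a systemd drop-in rather than an edit of the shipped unit file (the drop-in path, file name, and `install_docker_mountflags_dropin` helper are all illustrative, and `DROPIN_ROOT` exists only so the function can be exercised against a scratch directory):

```shell
# Sketch: install a drop-in that sets MountFlags=slave for docker.service.
# DROPIN_ROOT defaults to the real systemd override directory.
install_docker_mountflags_dropin() {
  local root="${DROPIN_ROOT:-/etc/systemd/system}"
  local dir="$root/docker.service.d"
  mkdir -p "$dir"
  printf '[Service]\nMountFlags=slave\n' > "$dir/mountflags.conf"
  echo "$dir/mountflags.conf"   # report where the drop-in landed
}

# After installing (as root):
#   systemctl daemon-reload && systemctl restart docker
```

As cpuguy83 notes above, this flag can cause problems for certain use cases, so treat it as a targeted mitigation rather than a general fix.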

Vanuan commented 7 years ago

Yeah, I'm using CentOS 7.3 too. Happens once every 20-40 times.

Generally speaking, the only time you would really encounter this issue is if you've mounted /var/lib/docker, or one of its parents, into a container.

Looks similar to what I did. I'm using docker-compose inside ssh (jenkins slave) container with mounted sock file.

andrask commented 7 years ago

Generally speaking, the only time you would really encounter this issue is if you've mounted /var/lib/docker, or one of its parents, into a container.

In my tests I created and deleted containers, about 50/min. I mounted nothing and only started and removed the container running cat. The result was device busy errors on /*/shm paths and leaked meta and data storage. So the above statement may not cover all use cases.

danktec commented 7 years ago

Had this issue for the first time; not sure of the root cause. Docker 17.03.1, Ubuntu 14.04.5 LTS. Restarted the docker service to resolve it.

This happened again on another system. I suspect it was due to a container being taken down in an unclean way. Waited a while, and eventually the 'Dead' container went away on its own.

lievendp commented 7 years ago

@cpuguy83 Regarding the improvements in RHEL 7.4, I'm guessing we're talking about this: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/7.4_Release_Notes/technology_previews_virtualization.html Will it impact the way docker can be used?

User namespace This feature provides additional security to servers running Linux containers by providing better isolation between the host and the containers. Administrators of a container are no longer able to perform administrative operations on the host, which increases security. (BZ#1138782)

cpuguy83 commented 7 years ago

@lievendp No, I doubt it would be listed on the release notes.

ghost commented 7 years ago

I don't know anything about /var/lib/docker, and I'm not using Compose. It happens as often as 1 in 5 times that I stop and remove a container during development.

gad0lin commented 7 years ago

I experience the same on CentOS 7. It somehow started appearing recently. I guess it is affected by the load.

Containers: 29
 Running: 26
 Paused: 0
 Stopped: 3
Images: 20
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-202:3-390377-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/vg-docker/data
 Metadata file: /dev/vg-docker/metadata
 Data Space Used: 7.502 GB
 Data Space Total: 273.8 GB
 Data Space Available: 266.3 GB
 Metadata Space Used: 13.37 MB
 Metadata Space Total: 17.05 GB
 Metadata Space Available: 17.03 GB
 Thin Pool Minimum Free Space: 27.38 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: syslog
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-514.6.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.26 GiB
Name: production-rh2
ID: 7MS3:2QDM:45GK:7COB:YY7C:WEIK:IQ3S:APQJ:ZAFQ:6JGJ:X6LT:7Q5B
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Vanuan commented 7 years ago

@cpuguy83

The particular kernel feature is being able to do a detached mount while the mount exists in another namespace. I believe RH is planning on including this in RHEL 7.4, btw.

Do you happen to know which kernel this feature was mainlined in?

Is it related to device mapper or overlay? Or is it completely independent from the storage driver?

Vanuan commented 7 years ago

Will installing newer kernel from http://elrepo.org/tiki/tiki-index.php help? The newest one is 4.11: kernel-ml-4.11.6-1.el7.elrepo.x86_64.rpm

cpuguy83 commented 7 years ago

@Vanuan It would, though I'd stick with the LTS kernel, which would be 4.9. The version it was mainlined is 3.15 (IIRC). It's more of a bug fix than a feature. It lets you unmount resources that are mounted in other namespaces.

It's independent of storage driver.
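Since the fix cpuguy83 cites was mainlined around 3.15, the running kernel can be compared against that floor with plain string parsing (a sketch; `kernel_at_least` is a made-up helper, and the optional third argument exists only so it can be checked without `uname`):

```shell
# Sketch: return success if a kernel release string is at least major.minor.
kernel_at_least() {
  local want_major="$1" want_minor="$2"
  local release="${3:-$(uname -r)}"   # e.g. 3.10.0-514.21.1.el7.x86_64
  local major="${release%%.*}"        # text before the first dot
  local rest="${release#*.}"
  local minor="${rest%%.*}"           # text between the first two dots
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# e.g. kernel_at_least 3 15 && echo "detached-unmount fix likely present"
```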

itsafire commented 7 years ago

Is anyone with this sort of error using a healthcheck in their docker-compose.yml? I had no problems with deployment until I added a healthcheck to the yml file. I'm on docker 17.03.1-ce.

cpuguy83 commented 7 years ago

@itsafire That's interesting... I'll take a look at that code path.

itsafire commented 7 years ago

@cpuguy83 As soon as I got rid of the healthcheck definition in my compose file, the problem went away. I'll investigate later and try to reproduce the issue.

The production system is a dual-server setup with current Debian jessie. Both behaved identically, leaving dead containers behind. A docker daemon restart resolved the issue.

lievendp commented 7 years ago

I have no health check in the docker-compose.

tsirolnik commented 7 years ago

Still have this issue using Compose.

It starts with a mkdir "file exists" error; then, after restarting/killing/etc. (which didn't work), I get this error message.

sebastiansterk commented 7 years ago

I'm having the same issue. After upgrading my Docker Mastodon containers (Update instructions) I got the following for each container:

for web  driver "overlay" failed to remove root filesystem for xxxx: remove /var/lib/docker/overlay/xxxx/merged: device or resource busy
ERROR: Encountered errors while bringing up the project.

Docker version: 17.06.0-ce

Vanuan commented 7 years ago

@owhen Which kernel version?

xmj commented 7 years ago

Seeing the same issue on CentOS 7, using Docker-CE 17.06. kitchen/docker isn't happy.

Vanuan commented 7 years ago

@xmj As already mentioned, this is a kernel issue. Workaround is to use mainline: https://gist.github.com/pgporada/bee21b339b6ca750f1de

sebastiansterk commented 7 years ago

@Vanuan 3.10.0-514.2.2.el7.x86_64. Will this kernel issue be fixed by CentOS? I'm not very happy with this workaround...

cpuguy83 commented 7 years ago

Should be fixed in the CentOS 7.4 kernel... but I haven't tested it.

cpuguy83 commented 7 years ago

Ok folks, I've found where these mounts are leaking. It's the systemd-udevd service (at least in my testing). If you restart systemd-udevd you should find that the container is removable without issue.

cpuguy83 commented 7 years ago

You can also see a related issue in rkt: https://github.com/rkt/rkt/issues/1922

sebastiansterk commented 7 years ago

systemctl restart systemd-udevd

rm -r -f 8d2bce5c13b882ea16ce012b639e646b31c95d722486bab056d0a39e974ad746
rm: cannot remove ‘8d2bce5c13b882ea16ce012b639e646b31c95d722486bab056d0a39e974ad746/merged’: Device or resource busy
cpuguy83 commented 7 years ago

Like I tried to hint, mileage may vary. The (almost?) root cause is that the usage of MountFlags in systemd units causes systemd to create a new mount namespace for the service being started. What is set in MountFlags determines how mounts are propagated from the host (or back to the host).

In the case of systemd-udevd, it is using MountFlags=slave, which means that any changes to mounts on the host will propagate to the systemd-udevd mount ns (where the service is running).

What should be happening is when an unmount occurs, that should propagate to systemd-udevd's mount ns... however for some reason either this propagation isn't happening or something in the mount ns is keeping the mount active, preventing removal even if the mount appears to be gone in the host's mount ns.

I'm using systemd-udevd as an example here as I can reproduce it 100% of the time specifically with systemd-udevd, and can mitigate it either by stopping the service or disabling the MountFlags in the unit file for that service (and thus it will live in the host mount ns).

There could be a myriad of things causing the resource to remain busy on your system, including how other containers are running and what they are mounting in. For instance if you are mounting /var/lib/docker into a container on one of these old kernels it is likely going to cause the same problem.
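The leak described above can be observed directly by searching every process's mountinfo for the container's layer ID, which identifies the mount namespace still holding the mount (a sketch; `find_mount_holder` is a made-up helper, and `PROC_ROOT` exists only so the grep can be pointed at test data instead of the live /proc):

```shell
# Sketch: print the mountinfo files (and therefore the PIDs) whose mount
# namespace still references the given container or layer ID.
find_mount_holder() {
  local id="$1"
  local proc_root="${PROC_ROOT:-/proc}"
  # Each PID's mountinfo lists the mounts visible in that PID's mount ns;
  # a hit means that ns still pins the container's filesystem.
  grep -l "$id" "$proc_root"/[0-9]*/mountinfo 2>/dev/null
}

# e.g. find_mount_holder 88189a16be60
# then: nsenter -m -t <pid> umount <leaked mountpoint>
```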

ghost commented 7 years ago

Blaming this problem on the kernel of the most up-to-date release of a current operating system is rather disingenuous. Kernels are not written and tested to meet the specs of any particular app or service (or shouldn't be). Rather, it might be better to specify which operating systems, kernels, and configurations are necessary to have this working correctly. Such as the environment where docker was developed and tested without error.

Incidentally, I nearly solved this problem by inserting delays in my scripts between docker commands. It's ugly, but I haven't seen the problem in a while.
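The delay trick can be wrapped in a small retry helper, since the busy mount often clears after a moment (a sketch; `retry` is a made-up function and the one-second pause is arbitrary):

```shell
# Sketch: run a command up to N times, pausing between attempts.
retry() {
  local tries="$1"; shift
  local i=1
  while true; do
    "$@" && return 0                    # success: stop retrying
    [ "$i" -ge "$tries" ] && return 1   # out of attempts
    sleep 1
    i=$((i + 1))
  done
}

# e.g. retry 5 docker rm stuck-container
```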

Vanuan commented 7 years ago

@wdch-nseeley To be fair, calling CentOS "current" is misleading. It's designed to be old. Red Hat does not update the kernel version, but instead backports new features to the same kernel version. The last kernel version released with CentOS 7 is 3.10, which was released in 2013 and reaches end of life in October 2017.

It's natural that Red Hat messes up sometimes.

Docker is a product that relies on a particular set of kernel features. Not the other way around.

cpuguy83 commented 7 years ago

@wdch-nseeley No one is blaming the kernel, other than that we can't take advantage of newer kernel abilities (which could be considered a bug fix) on centos/rhel until 7.4 comes out, like we do on other distros with newer kernels. There are just a lot of moving parts to sort out, plus one can get this error from different places and for different reasons.

Now... The real issue here seems to be the usage of MountFlags (in the systemd unit config) on system services to run in their own mount namespace and eat up mounts from Docker. What is strange is that with MountFlags=slave, the changes to mounts on the root mount namespace (where systemd and docker are running by default) are supposed to propagate to the service's mount namespace... it's getting the new mount, but it's not getting the unmount request. I can actually even nsenter into the service's mount namespace and manually unmount the affected mountpoint with no issue and then call docker rm (or whatever) and it removes successfully... this issue with the unmount not propagating feels like a kernel bug, but I need to do some more tests and see if it's actually even really fixed in newer kernels at all or if we're working around it with our usage of MNT_DETACH on unmount. I found that not using MountFlags at all seems to clear this issue up.

I really ran into this recently because while testing the new metrics plugins on rhel/centos I have found 100% of the time I would get a device or resource busy error on remove, even with absolutely nothing else running, and bare system services. The interesting bit about metrics plugins is it creates a unix socket at /run/docker/metrics.sock on the host which is then bind-mounted into the plugin container's rootfs whereas other plugins don't really get any special mounts like this. The solution for this was to mount --make-private /var/lib/docker/plugins which we merged yesterday. It's obviously not a perfect solution since we've been doing this for a long time for container filesystem mounts, and yet somehow these mounts are still leaked on occasion... but in any case it fixed the immediate issue for metrics plugins failing 100% of the time on remove on these old kernels.

Vanuan commented 7 years ago

So, the workaround is to remove MountFlags from udevd?

sudo vim /usr/lib/systemd/system/systemd-udevd.service # remove MountFlags
sudo systemctl daemon-reload
sudo systemctl restart systemd-udevd

docker.service doesn't have MountFlags.

cpuguy83 commented 7 years ago

@Vanuan This fixed the issue on my very minimal installation, it may not be the only thing.

sebastiansterk commented 7 years ago

@Vanuan This did not fix the issue on my system.

archenroot commented 7 years ago

I am on CentOS 7.3 with Docker 17.06.0-ce and faced this issue with the GitLab container.

I tried following:

docker rm $(docker ps -a -q)
Error response from daemon: driver "overlay" failed to remove root filesystem for 3f42a98df9a22c37cf18db35eb353f0ff90e0430aec6d6419706e3dd90a91c2d: remove /opt/docker-data/overlay/92b81e6f8c4dbfbedc1f99d349c1b9c7209be7f9d8a3602a00a5bb30707da638/merged: device or resource busy

So no luck. I then did what @cognifloyd suggested 👍

grep docker /proc/*/mountinfo
nsenter -m -t ${PROC_ID} /bin/bash
mount
umount <mountpoint from the output of the previous mount command>
exit

solved my issue...

MartinTerp commented 7 years ago

Hi

Had the same issue with one of my containers, and I remembered that I have this in my Splunk container:

volumes:
 - /apps/splunk-data/etc:/opt/splunk/etc
 - /apps/splunk-data/var:/opt/splunk/var
 - /var/lib/docker/containers:/host/containers:ro
 - /var/run/docker.sock:/var/run/docker.sock:ro

As you can see, I mount /var/lib/docker/containers. So I stopped the Splunk container, ran "rm -f" on the faulty container (with success), and started the Splunk one again.

antoinetran commented 7 years ago

We have the same issue with docker 17.06-ce. Had to umount shm, but I don't know why this appeared in the first place.

[trana@integ-storage-002 ~]$ sudo docker ps -a|grep ntp
111ff029e499        proto-pi-cm.ts-l2pf.cloud-omc.org:5000/thales/ntpserver:origin-feature-0.4   "/bin/sh -c /var/n..."    4 weeks ago         Dead                                                                                         integ_ntpclient_8

[trana@integ-storage-002 ~]$ sudo docker rm integ_ntpclient_8
ERROR: for integ_ntpclient_8  Error response from daemon: unable to remove filesystem for ...: remove /var/.../shm: device or resource busy

[trana@integ-storage-002 ~]$ mount |grep 111ff029e499
shm on /var/lib/docker/containers/111ff029e499984233a5c8c43f6c8e21471240be7721a35b8e15d8c6da7a757d/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
shm on /var/lib/docker/containers/111ff029e499984233a5c8c43f6c8e21471240be7721a35b8e15d8c6da7a757d/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)

# Twice, because the path was mounted twice.
sudo umount /var/lib/docker/containers/111ff029e499984233a5c8c43f6c8e21471240be7721a35b8e15d8c6da7a757d/shm
sudo umount /var/lib/docker/containers/111ff029e499984233a5c8c43f6c8e21471240be7721a35b8e15d8c6da7a757d/shm

[trana@integ-storage-002 ~]$ docker rm integ_ntpclient_8
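The two umount calls above can be generalized by first listing every leftover shm mount for the container (a sketch; `shm_mounts_for` is a made-up helper that parses `mount` output from stdin):

```shell
# Sketch: print the mountpoint of every shm mount whose path contains the
# given container-ID prefix. Duplicates are kept deliberately, since the
# same path can be mounted more than once (as in the transcript above).
shm_mounts_for() {
  # `mount` lines look like: shm on /path type tmpfs (flags)
  awk -v id="$1" '$1 == "shm" && index($3, id) { print $3 }'
}

# On a real host:
#   mount | shm_mounts_for 111ff029e499 | while read -r m; do sudo umount "$m"; done
```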
Vanuan commented 7 years ago

@antoinetran which kernel/os?

antoinetran commented 7 years ago
CentOS Linux release 7.3.1611 (Core) 
Name        : kernel
Arch        : x86_64
Version     : 3.10.0
Release     : 327.13.1.el7
opera443399 commented 7 years ago

in my case, workaround by trying what @cognifloyd mentioned above:

  1. info:

[root@test_node_02 ~]# docker info
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
Kernel Version: 3.10.0-514.21.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.797GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false

  2. problem:

Error response from daemon: driver "overlay" failed to remove root filesystem for xxx: remove /var/lib/docker/overlay/xxx/merged: device or resource busy

  3. workaround:

  3.1. try to remove "dead" containers:

[root@test_node_02 ~]# docker rm -f $(docker ps -a --filter status=dead -q | head -n 1)
Error response from daemon: driver "overlay" failed to remove root filesystem for 808acab2716420275cdb135ab964071cfc33406a34481354127635d3a282fa31: remove /var/lib/docker/overlay/88440438ea95b47e7459049fd765b51282afee4ad974107b0bf96d08d9c7763e/merged: device or resource busy

  3.2. find the pid holding the mount in /proc/*/mountinfo:

[root@test_node_02 ~]# grep -l --color $(docker ps -a --filter status=dead -q | head -n 1) /proc/*/mountinfo

  3.3. whose pid is it:

[root@test_node_02 ~]# ps -f 7360
UID  PID   PPID  C  STIME  TTY  STAT  TIME   CMD
root 7360  7344  1  Aug16  ?    Ssl   73:57  /usr/bin/cadvisor -logtostderr

  3.4. also, we can determine that they are in different mount namespaces:

[root@test_node_02 ~]# ls -l /proc/$(cat /var/run/docker.pid)/ns/mnt /proc/7360/ns/mnt
lrwxrwxrwx 1 root root 0 Aug 21 15:55 /proc/11460/ns/mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Aug 21 15:55 /proc/7360/ns/mnt -> mnt:[4026532279]

  3.5. try to restart cadvisor:

[root@test_node_01 ~]# docker service ls | grep cadvisor
5f001c9293cf  cadvisor  global  3/3  google/cadvisor:latest
[root@test_node_01 ~]# docker service update --force cadvisor

  3.6. remove again:

[root@test_node_02 ~]# docker rm -f $(docker ps -a --filter status=dead -q | head -n 1)
808acab27164

conclusion: cadvisor, or any other container whose volumes contain '/var/lib/docker' or '/', will cause the problem.
workaround: find the container/service and restart it.
how to fix it: unknown.
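The pattern blamed in the conclusion (containers that bind-mount `/` or something under `/var/lib/docker`) can be scanned for. This is a sketch: `mount_suspects` is a made-up filter over lines of the form `name: source1 source2 ...`, such as could be produced with `docker inspect --format '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}'`:

```shell
# Sketch: keep only lines where some mount source is exactly "/" or lives
# under /var/lib/docker.
mount_suspects() {
  grep -E '(^| )(/|/var/lib/docker(/[^ ]*)?)( |$)'
}

# On a real host (format string assumed, see above):
#   docker ps -q | xargs -r docker inspect \
#     --format '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}' | mount_suspects
```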
dobryakov commented 7 years ago

Some information that could be useful: I have two hardware servers with the same CentOS 7, almost the same kernel versions, and the same storage drivers, but the bug still reproduces on one server and never on the other:

# reproduce here:
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.1GiB
Docker Root Dir: /home/_docker
Debug Mode (client): false
Debug Mode (server): false
Username: 6626070
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

but

# does not reproduce here
Server Version: 17.03.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.10.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.13 GiB
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
archenroot commented 7 years ago

Just to add to my previous comment: I still have the issue and am always forced to reboot or umount the way I described. My docker instance info:

Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 4
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 19.61GiB
Name: lgh-dev01
ID: OI4M:BVZK:YGCD:M7DS:TD7X:WO3Q:WFHQ:UECY:N5A6:NHSX:4THI:HE5T
Docker Root Dir: /opt/docker-data
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 31
 System Time: 2017-08-21T17:29:34.436543815+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
archenroot commented 7 years ago

@dobryakov If I see correctly, the only difference between those instances is the docker version?

dobryakov commented 7 years ago

@archenroot Right, the main difference is in the docker versions. There is a small difference in the minor kernel version numbers (10.2.el7 vs 26.2.el7), but I think it doesn't matter.

And this is the reason why I doubt it's a kernel issue...

cpuguy83 commented 7 years ago

@dobryakov if you are using docker rm -f you would not see the error on 17.03, but the error would still occur.

dobryakov commented 7 years ago

I tried to reinstall the older version of docker (17.03), but it failed on my servers due to dependency problems :( Does anybody know whether a manual kernel update would fix this issue?

archenroot commented 7 years ago

@dobryakov Read the message from @cpuguy83: he suggested that the error actually occurs on 17.03 as well, but is not reported...

@cpuguy83 Are you saying that the error cannot be seen via dmesg or /var/log/messages either? Or is it just not propagated by the docker daemon at the service level? Not sure here... if an error occurs but is not visible, then it is even worse :dagger:

dobryakov commented 7 years ago

@cpuguy83 How could I reproduce this issue on 17.03? Please help me find and see the error. My containers on 17.03 are stopping, exiting, restarting, and being removed (even without -f) without any problem.

Vanuan commented 7 years ago

@dobryakov I think the issue would manifest as the container not being deleted. So to reproduce it you'd have to run docker ps immediately after docker rm -f and grep for the container you've tried to delete.

Well, it would be good to have an "ignore shm error" flag. Until then, we can grep for the error and ignore it manually to mimic the 17.03 behavior.