moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Unable to remove a stopped container: `device or resource busy` #22260

Closed: pheuter closed this issue 6 years ago

pheuter commented 8 years ago

Apologies if this is a duplicate issue; there seem to be several outstanding issues around a very similar error message but under different conditions. I initially added a comment on #21969 and was told to open a separate ticket, so here it is!


BUG REPORT INFORMATION

Output of docker version:

Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:34:23 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 51
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 81
 Dirperm1 Supported: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.13.0-74-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.676 GiB
Name: ip-10-1-49-110
ID: 5GAP:SPRQ:UZS2:L5FP:Y4EL:RR54:R43L:JSST:ZGKB:6PBH:RQPO:PMQ5
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Running on Ubuntu 14.04.3 LTS HVM in AWS on an m3.medium instance with an EBS root volume.

Steps to reproduce the issue:

  1. $ docker run --restart on-failure --log-driver syslog --log-opt syslog-address=udp://localhost:514 -d -p 80:80 -e SOME_APP_ENV_VAR myimage
  2. The container keeps exiting with an error and restarting, due to a bug in the runtime
  3. Manually run docker stop on the container
  4. The container stops successfully
  5. Trying to rm container then throws the error: Error response from daemon: Driver aufs failed to remove root filesystem 88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e: rename /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0 /var/lib/docker/aufs/diff/a48629f102d282572bb5df964eeec7951057b50f21df7abe162f8de386e76dc0-removing: device or resource busy
  6. Restart docker engine: $ sudo service docker restart
  7. $ docker ps -a shows that the container no longer exists.
dominikschulz commented 8 years ago

Same here. Exact same OS, also running on AWS (different instance types) with aufs.

After stopping the container, retrying docker rm several times and/or waiting a few seconds usually leads to "container not found" eventually. The issue has existed in our stack since at least Docker 1.10.
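
Since retrying or waiting usually clears the error, a retry wrapper is one possible stopgap. A minimal sketch, assuming `docker rm` exits non-zero on failure; the function name `rm_with_retry` and its defaults (5 attempts, 2 seconds apart) are hypothetical:

```shell
#!/bin/sh
# rm_with_retry CONTAINER [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries `docker rm`, because the "device or resource
# busy" error reported here often clears after a short wait.
rm_with_retry() {
  local container="$1" attempts="${2:-5}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if docker rm "$container"; then
      return 0
    fi
    # `docker inspect` failing is taken to mean the container is already
    # gone; note it also fails when the daemon is unreachable.
    if ! docker inspect "$container" >/dev/null 2>&1; then
      return 0
    fi
    echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example: `rm_with_retry my-container 10 3`. This only papers over the symptom; it does nothing about whatever is holding the mount.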

allencloud commented 8 years ago

I have suffered from this issue for quite a long time.

danielfoss commented 8 years ago

Receiving this as well with Docker 1.10. I would very occasionally get something similar with 1.8 and 1.9, but it would clear up on its own after a short time. With 1.10 it seems to be permanent until I can restart the service or VM. I saw that it may be fixed in 1.11 and am anxiously awaiting the official update so I can find out.

cpuguy83 commented 8 years ago

"Device or resource busy" is a generic error message. Please read your error messages and make sure it's exactly the error message above (i.e., rename /var/lib/docker/aufs/diff/...).

"Me too!" comments do not help.

@danielfoss There are many fixes in 1.11.0 that would resolve some device or resource busy issues on multiple storage drivers when trying to remove the container. 1.11.1 fixes only a specific case (mounting /var/run into a container).

cezarsa commented 8 years ago

I'm also seeing this problem on some machines and by taking a look at the code I think the original error is being obscured in here: https://github.com/docker/docker/blob/master/daemon/graphdriver/aufs/aufs.go#L275-L278

My guess is that the Rename error is happening due to an unsuccessful call to unmount. However, as the error message in unmount is logged using Debugf, we won't see it unless the daemon is started in debug mode. I'll see if I can spin up some servers with debug mode enabled and catch this error.

genezys commented 8 years ago

I tried to set my docker daemon in debug mode and got the following logs when reproducing the error:

Aug 23 10:49:58 vincent dockerd[14083]: time="2016-08-23T10:49:58.191330085+02:00" level=debug msg="Calling DELETE /v1.21/containers/fa781466a8117d690077d85cc06af025da1c9c9b13302b1efed65c21788d5a75?link=False&force=False&v=False"
Aug 23 10:49:58 vincent dockerd[14083]: time="2016-08-23T10:49:58.191478608+02:00" level=error msg="Error removing mounted layer fa781466a8117d690077d85cc06af025da1c9c9b13302b1efed65c21788d5a75: rename /var/lib/docker/aufs/mnt/007c204b5aa1708f628d9518bb83d51176446e0c3743587f72b9f6cde3b9ce24 /var/lib/docker/aufs/mnt/007c204b5aa1708f628d9518bb83d51176446e0c3743587f72b9f6cde3b9ce24-removing: device or resource busy"
Aug 23 10:49:58 vincent dockerd[14083]: time="2016-08-23T10:49:58.191519719+02:00" level=error msg="Handler for DELETE /v1.21/containers/fa781466a8117d690077d85cc06af025da1c9c9b13302b1efed65c21788d5a75 returned error: Driver aufs failed to remove root filesystem fa781466a8117d690077d85cc06af025da1c9c9b13302b1efed65c21788d5a75: rename /var/lib/docker/aufs/mnt/007c204b5aa1708f628d9518bb83d51176446e0c3743587f72b9f6cde3b9ce24 /var/lib/docker/aufs/mnt/007c204b5aa1708f628d9518bb83d51176446e0c3743587f72b9f6cde3b9ce24-removing: device or resource busy"

I could find the message "Error removing mounted layer" in https://github.com/docker/docker/blob/f6ff9acc63a0e8203a36e2e357059089923c2a49/layer/layer_store.go#L527, but I do not know Docker well enough to tell whether it is really related.

Version info:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64
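
With the daemon in debug mode, as @cezarsa suggests, the unmount failure may show up in the log next to the "Error removing mounted layer" entry. A small filter for pulling out the relevant lines, assuming the `level=... msg=...` log format shown above; the function name `removal_log_lines` is hypothetical:

```shell
#!/bin/sh
# removal_log_lines ID
# Reads dockerd log lines on stdin and keeps the debug/error entries that
# mention ID (a container or layer ID, full or prefix).
removal_log_lines() {
  grep -E 'level=(debug|error)' | grep -F -- "$1"
}
```

On a systemd host, for example: `journalctl -u docker --no-pager | removal_log_lines fa781466a811`.
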
simkim commented 8 years ago

I had the same problem using docker-compose rm

Driver aufs failed to remove root filesystem 88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e

Here is what I did to fix the problem without restarting docker:

cat /sys/fs/cgroup/devices/docker/88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e/tasks

This gives you the PIDs of the processes running in the devices subsystem (what is mounted and busy), located in the hierarchy at /docker/<containerid>:

I was able to kill them: kill $(cat /sys/fs/cgroup/devices/docker/88189a16be60761a2c04a455206650048e784d750533ce2858bcabe2f528c92e/tasks)

Once they were dead, the container was gone (successfully removed).
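
The same steps as a small function. A sketch, assuming cgroup v1 with the cgroupfs driver, so that the layout matches /sys/fs/cgroup/devices/docker/<full-id>/tasks as above; the function names and the `CGROUP_ROOT` override (used only for testing) are hypothetical. It needs the full 64-character container ID, since the cgroup directory is named after it:

```shell
#!/bin/sh
# container_tasks ID: print the PIDs still listed in the container's
# devices cgroup (cgroup v1, cgroupfs driver layout assumed).
container_tasks() {
  local tasks="${CGROUP_ROOT:-/sys/fs/cgroup}/devices/docker/$1/tasks"
  if [ ! -r "$tasks" ]; then
    echo "no devices cgroup for $1" >&2
    return 1
  fi
  cat "$tasks"
}

# kill_container_tasks ID: send SIGTERM (kill's default signal) to
# every PID found by container_tasks.
kill_container_tasks() {
  local pids
  pids=$(container_tasks "$1") || return 1
  [ -n "$pids" ] || return 0
  # Unquoted on purpose: the file holds one PID per line.
  kill $pids
}
```

Run `container_tasks <id>` first to see which processes would be signalled. As the next comment shows, this does not help when the cgroup has already been removed; `container_tasks` then reports that and returns 1.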

Version

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:02:53 2016
 OS/Arch:      linux/amd64

genezys commented 8 years ago

There seem to be two different problems here, as I am unable to fix my issue using @simkim's solution.

# docker rm b1ed3bf7dd6e
Error response from daemon: Driver aufs failed to remove root filesystem b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353: rename /var/lib/docker/aufs/mnt/acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf /var/lib/docker/aufs/mnt/acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf-removing: device or resource busy
# ls /sys/fs/cgroup/devices/docker/b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353
ls: cannot access /sys/fs/cgroup/devices/docker/b1ed3bf7dd6e5d0298088682516ec8796d93227e4b21b769b36e720a4cfcb353: No such file or directory
# mount | grep acf9b10e85b8ad53e05849d641a32e646739d4cfa49c1752ba93468dee03b0cf

In my case, the cgroup associated with my container seems to be correctly deleted. The filesystem is also unmounted.

The only solution for me is still to restart the Docker daemon.

simkim commented 8 years ago

Today, same problem as @genezys.

cpuguy83 commented 8 years ago

This appears to have gotten worse in 1.12... I have (some) idea of what may have caused this, but not quite sure of the solution (short of a revert). One thing I have noticed is in kernel 3.16 and higher, we do not get the busy error from the kernel anymore.

simkim commented 8 years ago

Yes, I upgraded from 1.11 to 1.12 yesterday, and I have now hit this problem twice in 2 days; I had never had it before on this host.

simkim commented 8 years ago

@genezys and I are on Debian 8, kernel 3.16.7-ckt25-2+deb8u3.

simkim commented 8 years ago

Since Docker 1.12, the problem shows up when @genezys and I run "docker-compose stop && docker-compose rm -f --all && docker-compose up -d".

I tried running all the cron tasks during the day, in case something run during the night was responsible, but that doesn't trigger the bug.

simkim commented 8 years ago

Here is the same information in more detail. We can provide more information on request, since it happens every morning.

Stop and remove

Stopping tasappomatic_worker_1 ... done
Stopping tasappomatic_app_1 ... done
Stopping tasappomatic_redis_1 ... done
Stopping tasappomatic_db_1 ... done
WARNING: --all flag is obsolete. This is now the default behavior of `docker-compose rm`
Going to remove tasappomatic_worker_1, tasappomatic_app_1, tasappomatic_redis_1, tasappomatic_db_1
Removing tasappomatic_worker_1 ... error
Removing tasappomatic_app_1 ... error
Removing tasappomatic_redis_1 ... error
Removing tasappomatic_db_1 ... error

ERROR: for tasappomatic_app_1  Driver aufs failed to remove root filesystem a1aa9d42e425c16718def9e654dc700ff275d180434e32156230f4d1900cc417: rename /var/lib/docker/aufs/mnt/c243cc7329891de9584159b6ba8717850489b4010dfcc8b782c3c09b9f26f665 /var/lib/docker/aufs/mnt/c243cc7329891de9584159b6ba8717850489b4010dfcc8b782c3c09b9f26f665-removing: device or resource busy

ERROR: for tasappomatic_redis_1  Driver aufs failed to remove root filesystem b736349766266140e91780e3dbbcaf75edb9ad35902cbc7a6c8c5dcb2dfefe28: rename /var/lib/docker/aufs/mnt/b474a7c91ad77920dfb00dc3a0ab72bc22964ae3018e971d0d51e6ebe8566aeb /var/lib/docker/aufs/mnt/b474a7c91ad77920dfb00dc3a0ab72bc22964ae3018e971d0d51e6ebe8566aeb-removing: device or resource busy

ERROR: for tasappomatic_db_1  Driver aufs failed to remove root filesystem 1cc473718bd19d6df3239e84c74cd7322306486aa1d2252f30472216820fe96e: rename /var/lib/docker/aufs/mnt/d4162a6ef7a9e9e65bd460d13fcce8adf5f9552475b6366f14a19ebd3650952a /var/lib/docker/aufs/mnt/d4162a6ef7a9e9e65bd460d13fcce8adf5f9552475b6366f14a19ebd3650952a-removing: device or resource busy

ERROR: for tasappomatic_worker_1  Driver aufs failed to remove root filesystem eeadc938d6fb3857a02a990587a2dd791d0f0db62dc7a74e17d2c48c76bc2102: rename /var/lib/docker/aufs/mnt/adecfa9d22618665eba7aa4d92dd3ed1243f4287bd19c89617d297056f00453a /var/lib/docker/aufs/mnt/adecfa9d22618665eba7aa4d92dd3ed1243f4287bd19c89617d297056f00453a-removing: device or resource busy
Starting tasappomatic_db_1
Starting tasappomatic_redis_1

ERROR: for redis  Cannot start service redis: Container is marked for removal and cannot be started.

ERROR: for db  Cannot start service db: Container is marked for removal and cannot be started.
ERROR: Encountered errors while bringing up the project.

Inspecting mount

fuser -m /var/lib/docker/aufs/mnt/c243cc7329891de9584159b6ba8717850489b4010dfcc8b782c3c09b9f26f665
/var/lib/docker/aufs/mnt/c243cc7329891de9584159b6ba8717850489b4010dfcc8b782c3c09b9f26f665:  5620  5624  5658  6425  6434 14602m

Same set of processes for all 4 containers.

Inspecting process

5620, 5624, 6434: another postgresql container
5658: worker from another container
6425: django from another container
14602m: dockerd

systemd,1
  └─dockerd,14602 -H fd://
      └─docker-containe,14611 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
          └─docker-containe,5541 2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a /var/run/docker/libcontainerd/2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a docker-runc
              └─postgres,5565
                  └─postgres,5620
systemd,1
  └─dockerd,14602 -H fd://
      └─docker-containe,14611 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
          └─docker-containe,5541 2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a /var/run/docker/libcontainerd/2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a docker-runc
              └─postgres,5565
                  └─postgres,5624
systemd,1
  └─dockerd,14602 -H fd://
      └─docker-containe,14611 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
          └─docker-containe,5642 0364f4ace6e4d1746f8c3e31f872438a592ac07295dd232d92bf64cf729d7589 /var/run/docker/libcontainerd/0364f4ace6e4d1746f8c3e31f872438a592ac07295dd232d92bf64cf729d7589 docker-runc
              └─pootle,5658 /usr/local/bin/pootle rqworker
systemd,1
  └─dockerd,14602 -H fd://
      └─docker-containe,14611 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
          └─docker-containe,5700 bd3fb1c8c36ec408bcf53c8501f95871950683c024919047f5423640e377326d /var/run/docker/libcontainerd/bd3fb1c8c36ec408bcf53c8501f95871950683c024919047f5423640e377326d docker-runc
              └─run-app.sh,5716 /run-app.sh
                  └─pootle,6425 /usr/local/bin/pootle runserver --insecure --noreload 0.0.0.0:8000
systemd,1
  └─dockerd,14602 -H fd://
      └─docker-containe,14611 -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime docker-runc
          └─docker-containe,5541 2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a /var/run/docker/libcontainerd/2486fd7f494940619b54fa9b4cedc52c8175988c5ae3bb1dca382f0aaee4f72a docker-runc
              └─postgres,5565
                  └─postgres,6434
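
Mapping each PID from the fuser output to a container took a pstree walk above. A shorter route is to read /proc/<pid>/cgroup. A sketch, assuming the cgroupfs driver with cgroup paths of the form /docker/<64-hex-id> (the layout seen in @simkim's /sys/fs/cgroup/devices/docker/<id> path); with the systemd cgroup driver the paths look different and this prints nothing. The function name and the optional PROC_ROOT argument (for testing) are hypothetical:

```shell
#!/bin/sh
# container_of_pid PID [PROC_ROOT]
# Prints the container ID from the first /docker/<64-hex-id> entry in
# /proc/PID/cgroup; prints nothing for host processes or missing PIDs.
container_of_pid() {
  local proc="${2:-/proc}"
  sed -n 's#^[0-9]*:[^:]*:/docker/\([0-9a-f]\{64\}\).*#\1#p' \
    "$proc/$1/cgroup" 2>/dev/null | head -n 1
}
```

For example: `for p in 5620 5658 6425; do echo "$p $(container_of_pid "$p")"; done`.
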
scher200 commented 8 years ago

Does anyone have a better solution than restarting the docker service (version 1.12)?

genezys commented 8 years ago

A workaround was proposed in #25718 to set MountFlags=private in the docker.service configuration file of systemd. See https://github.com/docker/docker/issues/25718#issuecomment-250254918 and my following comment.

So far, this has solved the problem for me.

anusha-ragunathan commented 8 years ago

@genezys : Note the side effect of this workaround that I've explained in https://github.com/docker/docker/issues/25718#issuecomment-250356570
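
The MountFlags=private workaround can be applied as a systemd drop-in instead of editing docker.service itself. A minimal sketch, assuming systemd and the standard drop-in directory; the file name mountflags.conf and the function name are arbitrary choices, and the side effect linked above still applies:

```shell
#!/bin/sh
# write_mountflags_dropin [DIR]
# Writes a drop-in setting MountFlags=private for docker.service.
# DIR defaults to the usual drop-in directory for that unit.
write_mountflags_dropin() {
  local dir="${1:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/mountflags.conf" <<'EOF'
[Service]
MountFlags=private
EOF
}
```

Run it as root, then `systemctl daemon-reload` and `systemctl restart docker` so the new setting takes effect.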

gurpreetbajwa commented 8 years ago

I was getting something like this:

Error response from daemon: Driver aufs failed to remove root filesystem 6b583188bfa1bf7ecf2137b31478c1301e3ee2d5c98c9970e5811a3dd103016c: rename /var/lib/docker/aufs/mnt/6b583188bfa1bf7ecf2137b31478c1301e3ee2d5c98c9970e5811a3dd103016c /var/lib/docker/aufs/mnt/6b583188bfa1bf7ecf2137b31478c1301e3ee2d5c98c9970e5811a3dd103016c-removing: device or resource busy

I simply searched for "6b583188bfa1bf7ecf2137b31478c1301e3ee2d5c98c9970e5811a3dd103016c" and found it in multiple folders under docker/. I deleted all those files and tried removing the container again with sudo docker rm "containerId", and it worked.
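
The manual search above can be done with find. A sketch, assuming the default /var/lib/docker root; the function name is hypothetical, and it deliberately only lists matches, leaving the decision about what to delete to the reader:

```shell
#!/bin/sh
# list_paths_for_id ID [ROOT]
# Prints files and directories under ROOT (default /var/lib/docker) whose
# name contains ID. Matching directories are not descended into (-prune).
list_paths_for_id() {
  find "${2:-/var/lib/docker}" -name "*$1*" -prune -print 2>/dev/null
}
```

For example, from a root shell: `list_paths_for_id 6b583188bfa1`.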

Hope it helps!

k-bx commented 8 years ago

The thing is, I can't remove that file, and lsof doesn't show any user of it. I suspect this kernel bug, so I just ran sudo apt-get install linux-image-generic-lts-xenial on my 14.04 machine, hoping it'll help.

oopschen commented 8 years ago

I encountered the same problem and googled for a while. It seems the cAdvisor container locks the files. After removing the cAdvisor container, I can remove the files under [dockerroot]/containers/xxxxxx.

thaJeztah commented 8 years ago

@oopschen yes, that's a known issue; cAdvisor uses various bind-mounts, including /var/lib/docker, which causes mounts to leak, resulting in this problem.
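
Because a leaked mount lives in another process's mount namespace, running `mount` on the host can come back empty even while the layer is still busy, as in @genezys's check earlier. One way to find the holders, assuming Linux /proc, is to search every /proc/<pid>/mountinfo for the layer ID. The function name and the optional PROC_ROOT argument (for testing) are hypothetical:

```shell
#!/bin/sh
# find_mount_holders ID [PROC_ROOT]
# Prints the PID of every process whose mountinfo still mentions ID,
# i.e. whose mount namespace likely still contains that mount.
find_mount_holders() {
  local id="$1" proc="${2:-/proc}" f pid
  for f in "$proc"/[0-9]*/mountinfo; do
    [ -r "$f" ] || continue
    if grep -qF -- "$id" "$f" 2>/dev/null; then
      pid=${f#"$proc"/}
      echo "${pid%/mountinfo}"
    fi
  done
}
```

For example, as root: `find_mount_holders acf9b10e85b8`, then inspect each PID with `ps -fp <pid>`.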

oopschen commented 8 years ago

@thaJeztah Is there any solution or alternative for cadvisor? Thanks.

thaJeztah commented 8 years ago

@oopschen some hints are given in https://github.com/docker/docker.github.io/pull/412, but which alternatives make sense depends on what you need cAdvisor for. Discussing alternatives may be a good topic for forums.docker.com.

jeff-kilbride commented 7 years ago

Just got this error for the first time on OS X Sierra using docker-compose:

ERROR: for pay-local  Driver aufs failed to remove root filesystem 0f7a073e087e0a5458d28fd13d6fc840bfd2ccc28ff6fc2bd6a6bc7a2671a27f: rename /var/lib/docker/aufs/mnt/a3faba12b32403aaf055a26f123f5002c52f2afde1bca28e9a1c459a18a22835 /var/lib/docker/aufs/mnt/a3faba12b32403aaf055a26f123f5002c52f2afde1bca28e9a1c459a18a22835-removing: structure needs cleaning

I had never seen it before the latest update last night.

$ docker-compose version
docker-compose version 1.9.0, build 2585387
docker-py version: 1.10.6
CPython version: 2.7.12
OpenSSL version: OpenSSL 1.0.2j  26 Sep 2016

$ docker version
Client:
 Version:      1.13.0-rc3
 API version:  1.25
 Go version:   go1.7.3
 Git commit:   4d92237
 Built:        Tue Dec  6 01:15:44 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      1.13.0-rc3
 API version:  1.25 (minimum version 1.12)
 Go version:   go1.7.3
 Git commit:   4d92237
 Built:        Tue Dec  6 01:15:44 2016
 OS/Arch:      linux/amd64
 Experimental: true

I tried docker rm -fv a couple of times, but always received the same error.

$ docker ps -a
CONTAINER ID        IMAGE                              COMMAND             CREATED             STATUS              PORTS               NAMES
0f7a073e087e        pay-local                          "node app.js"       2 minutes ago       Dead                                    pay-local

In the amount of time it's taken me to type this out, the offending container is now gone.

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

I don't know if it's fixed itself, or if there's still a problem lurking...

EDIT: Just started and stopped the same set of containers using docker-compose several times with no errors, so... ?

thaJeztah commented 7 years ago

@jeff-kilbride structure needs cleaning is a different message, and may refer to the underlying filesystem; could be specific to Docker for Mac

yoyos commented 7 years ago

It just happened to me on every container, and went away after a few seconds (the containers got deleted).

$ docker version
Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb 8 06:36:34 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb 8 06:36:34 2017
 OS/Arch:      linux/amd64
 Experimental: false

$ docker-compose version
docker-compose version 1.9.0, build 2585387
docker-py version: 1.10.6
CPython version: 2.7.9
OpenSSL version: OpenSSL 1.0.1t 3 May 2016

On Debian 8, kernel 3.16.0-4-amd64.

carlwain74 commented 7 years ago

I've been having this issue on a few of my Docker servers running 1.12.5

Client:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:23:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:23:59 2016
 OS/Arch:      linux/amd64

Last night in particular, a developer tried to use docker-compose stop, rm and up -d (via a bash wrapper) and encountered the issue reported above. Before running docker-compose, the developer pulled an updated "latest" tagged image from our local registry. When I started to investigate, I could see the container was marked as Dead. I tried the 'docker rm' command and got the same results.

After 5-10 minutes of researching the issue on the web, I went back to check the status of the container and could see that it had already been removed. Following this observation, I tried to bring the container up with "docker-compose up -d" and succeeded.

bschwartz757 commented 7 years ago

Hello,

I was getting errors during some of these commands because I had removed Docker at one point; I reinstalled it, and now I can't seem to uninstall it. I'm still getting these errors as well:


rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy

docker -v
Docker version 1.12.6, build 78d1802

MAIN ISSUE:
I tried out Rancher over the past week and it doesn't look like it will be a good solution for me. I have a standard Ubuntu 16.04 server on Digital Ocean, and I'm trying to completely remove Rancher and Docker. It took some digging on the internet to figure out how to do this, and I've finally got it whittled down, but now I can't finish removing /var/lib/rancher and /var/lib/docker. Here are the outputs I get:

sudo rm -rf rancher
rm: cannot remove 'rancher/volumes': Device or resource busy

I read that using this command might help track down the running processes so they can be killed, but no dice:
lsof +D ./
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/c24324d8b667
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/default
      Output information may be incomplete.
COMMAND   PID       USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash    16667 blakers757  cwd    DIR  253,1     4096 267276 .
lsof    27938 blakers757  cwd    DIR  253,1     4096 267276 .
lsof    27939 blakers757  cwd    DIR  253,1     4096 267276 .

When I try to kill the processes by pid, it fails. 

docker ps shows no running containers:
docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

When I try to remove /var/lib/docker, I get the following:
sudo rm -rf /var/lib/docker
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy

Whatever is running inside this `overlay2` folder seems to be to blame.

Just wondering if you all have any ideas, thanks.
thaJeztah commented 7 years ago

Is the docker service stopped when you try to remove? Looks like there's still something running

bschwartz757 commented 7 years ago

I'm totally new to Docker, unfortunately; I thought this would be a good learning opportunity. I've tried every command I can find to kill and remove any remaining containers/services, but still no luck:


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker ps -a -f status=exited
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker stop $(docker ps -a -q)
"docker stop" requires at least 1 argument(s).
See 'docker stop --help'.

Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]

Stop one or more running containers
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker rm $(docker ps -a -q)
"docker rm" requires at least 1 argument(s).
See 'docker rm --help'.

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ cd ../../
blakers757@ubuntu-1gb-sfo1-01:/$ cd ~
blakers757@ubuntu-1gb-sfo1-01:~$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:~$ docker rm $(docker ps -a -q)
"docker rm" requires at least 1 argument(s).
See 'docker rm --help'.

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers
blakers757@ubuntu-1gb-sfo1-01:~$ docker volume ls
DRIVER              VOLUME NAME
blakers757@ubuntu-1gb-sfo1-01:~$ sudo rm -rf /var/lib/docker
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
blakers757@ubuntu-1gb-sfo1-01:~$

Any help would be appreciated, thanks!
thaJeztah commented 7 years ago

try stopping the service (systemctl stop docker), then remove /var/lib/docker

bschwartz757 commented 7 years ago

Thank you, unfortunately that's still not working though:


==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to stop 'docker.service'.
Authenticating as: Blake Schwartz,,, (blakers757)
Password: 
==== AUTHENTICATION COMPLETE ===
blakers757@ubuntu-1gb-sfo1-01:~$ sudo rm -rf /var/lib/docker
[sudo] password for blakers757: 
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
jeff-kilbride commented 7 years ago

Maybe:

docker volume prune -f

Do you still have images? What does docker images show? If so, try to remove them:

docker rmi -f [image id]

Finally:

docker rmi $(docker images --quiet --filter "dangling=true")

If none of those work, I can't help you... (reboot the server, if you are able?)

bschwartz757 commented 7 years ago

Thanks, there aren't any images but after rebooting my server I was able to remove /var/lib/rancher. Still unable to remove /var/lib/docker though:


CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ rm -rf docker
rm: cannot remove 'docker': Permission denied
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ sudo rm -rf docker
rm: cannot remove 'docker/overlay': Device or resource busy
blakers757@ubuntu-1gb-sfo1-01:/var/lib$ docker kill overlay
Error response from daemon: Cannot kill container overlay: No such container: overlay
blakers757@ubuntu-1gb-sfo1-01:/var/lib$

This output is a little different from before (it had been referring to docker/overlay2/some long sha number/merged). In any case, it still doesn't seem to want to remove docker entirely.
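
The directories rm refuses to delete are probably still mount points: removing a directory that is a mount point fails with EBUSY ("Device or resource busy"). A sketch, assuming the docker service is already stopped and Linux's /proc/self/mounts format (mount point in the second field), that lists every mount strictly below a prefix, deepest first, so each can be unmounted before deleting. The function name is hypothetical:

```shell
#!/bin/sh
# mounts_below PREFIX [MOUNTS_FILE]
# Prints mount points strictly below PREFIX, deepest (longest path) first,
# read from /proc/self/mounts, where field 2 is the mount point.
# Mount points with spaces appear octal-escaped (\040) and are printed as-is.
mounts_below() {
  local prefix="${1%/}"
  awk -v p="$prefix" 'index($2, p "/") == 1 { print length($2), $2 }' \
    "${2:-/proc/self/mounts}" | sort -rn | cut -d' ' -f2-
}
```

For example, as root: `mounts_below /var/lib/docker | while read -r m; do umount "$m"; done`, then retry `rm -rf /var/lib/docker`.
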
yunghoy commented 7 years ago

My team runs into this problem every time we shut down docker containers. We are running a service with more than 100 docker containers and cAdvisor containers through Swarm. The only solution I have found is to force the shutdown several times until a message saying the container no longer exists is shown. It happens for around 1 out of 5 containers. Even 10 percent would be really critical for a business service.

OS: Ubuntu Xenial
Docker: v1.13.1
cAdvisor: v0.24.1

We had to restart the docker service or, unfortunately, the Linux servers because of the combination of a network allocation bug and this cAdvisor bug. Luckily, the network allocation bug seems to be fixed in the latest docker binary.

cpuguy83 commented 7 years ago

@yunghoy What's the full error message that you see?

titpetric commented 7 years ago

I came across this one (I hope it's related):

root@docker2:/var/www# docker rm -f spiderweb
Error response from daemon: Unable to remove filesystem for 601d43bca2550c2916d2bf125f04b04b82423633fbed026393b99291d1ef0b08: remove /var/lib/docker/containers/601d43bca2550c2916d2bf125f04b04b82423633fbed026393b99291d1ef0b08/shm: device or resource busy
root@docker2:/var/www# docker rm -f spiderweb
Error response from daemon: No such container: spiderweb

From what I could see, however, the symptoms were a bit different. When I was running the container, it started the process, but then it was as if the process was stuck in an infinite loop (as soon as it started doing some I/O, actually; I write out some small files with it on a volume mount). The process didn't react to 'docker stop', and I managed to run pstree -a before killing it with docker rm -f and getting the above message; this was the last branch:

─docker run -i --rm --name spiderweb -v /var/www:/var/www -v /src/spiderweb/bin:/usr/src/app -w /usr/src/app node:4 node src/index.js docker2
   └─11*[{docker}]

I'm not exactly sure how 11 docker children come into play here. Comparing this with typical container process trees leads me to believe that the application had already stopped, but the docker engine somehow didn't notice.

This is pastebin output for that location, which is still there and can't be removed: full output. I'm going to restart the service to clean this up.

Edit: following @thaJeztah's advice above, even after restarting the docker service there were folders in /var/lib/docker/containers that couldn't be deleted. They didn't show up in lsof, and I was running as root, so it's a bit beyond me how this can happen apart from disk failure. A reboot solved this, and the files/folders containing only an empty "shm" folder could then be deleted. Here is my docker version/info output for extra information about my system:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:02:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:02:23 2017
 OS/Arch:      linux/amd64
 Experimental: false
Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 18
Server Version: 17.03.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-2-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 7.698 GiB
Name: docker2
ID: G2Z3:XLWE:P3V3:FTZR:U2Y6:2ABJ:6HTP:PIV2:KRHA:2ATV:ZMPQ:SHMJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
kopax commented 7 years ago

I always get this message if I try to remove the container too quickly. It's really annoying that I have to restart docker to get the port back:

ERROR: for nginx_nginx_1  Unable to remove filesystem for cfd48197bba6ee1ac91d7690b0567b56e61be03420768a5627936601b3ad6378: remove /var/lib/docker/containers/cfd48197bba6ee1ac91d7690b0567b56e61be03420768a5627936601b3ad6378/shm: device or resource busy

Docker version 1.12.5, build 7392c3b

Is there a way to avoid it?

tuannvm commented 7 years ago

Same issue on Docker 17.03.0-ce. There's no way to get it working again short of restarting the docker daemon...

Update: stop cAdvisor, then try to remove the dead container and re-create it. That works for me.

Avoid using the mount /var/lib/docker:/var/lib/docker:ro unless it is necessary. It seems that cAdvisor, with permission to access other containers' volumes, can lock them while it is running.

andrask commented 7 years ago

Docker version 1.12.6 with devicemapper on RHEL 7.3 gives the same result: docker rm -f <container> fails. No mounts, no special settings.

This happens under high load: I start several scripts, each of which loops over these steps:

1. run a container with `cat`
2. exec some commands
3. rm -f the container

Things go well for a while, and then removals start getting buffered and batched. When the error happens, the container's file system remains intact, but the container is "forgotten" by the server. The scenario finally fails with the tdata device totally fragmented and the tmeta device full.
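The loop above can be sketched in Python as a minimal reproduction harness. It assumes the docker CLI is on PATH; the image (busybox), the container-name prefix and the exec'd commands (`ls /`, `true`) are illustrative choices, not taken from the original scripts, and `churn`/`churn_once` are hypothetical names. The `run` parameter lets the docker calls be swapped for a stub.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command line and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_cli(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status (output is discarded)."""
    return subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def churn_once(name: str, image: str, exec_cmds: List[List[str]],
               run: Runner = run_cli) -> bool:
    """Start a container running `cat`, exec commands in it, then force-remove it.

    Returns True if the final `docker rm -f` exited with status 0.
    """
    # -i keeps stdin open so `cat` blocks instead of exiting at once; -d detaches.
    run(["docker", "run", "-d", "-i", "--name", name, image, "cat"])
    for cmd in exec_cmds:
        run(["docker", "exec", name, *cmd])
    return run(["docker", "rm", "-f", name]) == 0


def churn(iterations: int, prefix: str = "churn", image: str = "busybox",
          run: Runner = run_cli) -> List[int]:
    """Repeat churn_once and return the iteration numbers whose removal failed."""
    failures = []
    for i in range(iterations):
        if not churn_once(f"{prefix}-{i}", image, [["ls", "/"], ["true"]], run):
            failures.append(i)
    return failures


if __name__ == "__main__":
    print("failed removals:", churn(100))
```

To approximate the high-load case, several copies would be started in parallel, each with a distinct prefix.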

dylanrhysscott commented 7 years ago

Hi, I'm having this issue using docker-compose; I've opened an issue here: https://github.com/docker/compose/issues/4781. Could this be related? Restarting the Docker daemon didn't help, and my only solution was to force-remove the dead container. Even though that triggered the error, the container is still removed. Looking in /var/lib/docker/mnt/aufs, I suspect the dead folders are still there, but it does circumvent the issue... at least until the next time you need to recreate a container with a fresh image.

bignay2000 commented 7 years ago

I am having this issue with RHEL 7.3 and Docker version 17.05.0-ce, build 89658be

As a workaround, we run "systemctl restart docker" or reboot the RHEL 7.3 Docker host virtual machine.

This does appear to be more frequent in later releases of Docker.

df -Th shows that the /boot drive is ext4 and all the other drives are xfs (not sure why the boot drive is ext4; I'm going to ask my team).

cpuguy83 commented 7 years ago

@bignay2000 Typically you'd get this error if some mount leaked from the container state dir into a container. This can happen quite easily if you do something like -v /var/lib/docker:/var/lib/docker.
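To spot that kind of bind mount before starting a container, a small check could be run over a list of `-v` specs. This is a hypothetical helper, not part of Docker. It assumes the default data root /var/lib/docker, and it treats a parent directory (such as /var/lib or /) as risky too, on the assumption that such a bind carries the docker root along with it.

```python
from pathlib import PurePosixPath
from typing import List

# Assumption: the daemon uses the default data root.
DOCKER_ROOT = PurePosixPath("/var/lib/docker")


def exposes_docker_root(volume_spec: str,
                        docker_root: PurePosixPath = DOCKER_ROOT) -> bool:
    """Return True if the host side of a `-v SRC:DST[:OPTS]` bind overlaps docker_root.

    "Overlaps" means SRC is docker_root itself, lies inside it, or is one of
    its parent directories (e.g. /var/lib or /).
    """
    source = volume_spec.split(":", 1)[0]
    if not source.startswith("/"):
        return False  # a named volume, not a host path
    src = PurePosixPath(source)
    return src == docker_root or docker_root in src.parents or src in docker_root.parents


def risky_specs(specs: List[str]) -> List[str]:
    """Filter a list of -v specs down to the ones that expose the docker root."""
    return [s for s in specs if exposes_docker_root(s)]


if __name__ == "__main__":
    print(risky_specs(["/var/lib/docker:/var/lib/docker:ro",
                       "/var/run:/var/run:rw", "/:/rootfs:ro"]))
```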

Vanuan commented 7 years ago

I got this after I removed some dirs from /var/lib/docker/volumes to free up some space, because docker volume rm doesn't work when you're out of space.

Could it be related?

I've also added the --rmi local -v options to docker-compose down. I'll try removing these options to see whether they were the cause.

P.S. This is happening on Jenkins, with multiple parallel docker-compose runs.

yejw5 commented 7 years ago

I restarted the server, and then removed /var/lib/docker successfully.

cognifloyd commented 7 years ago

I just experienced this multiple times. I added MountFlags=private to the docker service to prevent further mount leaks, but I was sick of restarting the machine, so I went hunting for a way to get rid of the leaked mounts without restarting.

Looking for these leaked mounts, I noticed here that each pid has its own mountinfo at /proc/[pid]/mountinfo. So, to see where the leaks were I did:

$ grep docker /proc/*/mountinfo
/proc/13731/mountinfo:521 460 8:3 /var/lib/docker/overlay /var/lib/docker/overlay rw,relatime shared:309 - xfs /dev/sda3 rw,seclabel,attr2,inode64,noquota
/proc/13731/mountinfo:522 521 0:46 / /var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/merged rw,relatime shared:310 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/7cbf3db2f8b860ba964c88539402f35c464c36013efcb845bce2ee307348649f/root,upperdir=/var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/upper,workdir=/var/lib/docker/overlay/2a2dd584da9858fc9e5928d55ee47328712c43e52320b050ef64db87ef4d545a/work
/proc/13731/mountinfo:523 521 0:47 / /var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/merged rw,relatime shared:311 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/d607050a3f9cdf004c6d9dc9739a29a88c78356580db90a83c1d49720baa0e5d/root,upperdir=/var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/upper,workdir=/var/lib/docker/overlay/12f139bad50b1837a6eda1fe6ea5833853746825bd55ab0924d70cfefc057b54/work
/proc/13731/mountinfo:524 521 0:48 / /var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/merged rw,relatime shared:312 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/5e8f5833ef21c482df3d80629dd28fd11de187d1cbbfe8d00c0500470c4f4af2/root,upperdir=/var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/upper,workdir=/var/lib/docker/overlay/33fb78580b0525c97cde8f23c585b31a004c51becb0ceb191276985d6f2ba69f/work
/proc/13731/mountinfo:525 521 0:49 / /var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/merged rw,relatime shared:313 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/409a9e5c05600faa82d34e8b8e7b6d71bffe78f3e9eff30846200b7a568ecef0/root,upperdir=/var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/upper,workdir=/var/lib/docker/overlay/e6306bbab8a29f715a0d9f89f9105605565d26777fe0072f73d5b1eb0d39df26/work
/proc/13731/mountinfo:526 521 0:50 / /var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/merged rw,relatime shared:314 - overlay overlay rw,seclabel,lowerdir=/var/lib/docker/overlay/d601cf06e1682c4c30611d90b67db748472d399aec8c84487c96cfb118c060c5/root,upperdir=/var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/upper,workdir=/var/lib/docker/overlay/7b56a0220212d9785bbb3ca32a933647bac5bc8985520d6437a41bde06959740/work
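The same search can be scripted. The sketch below parses each /proc/<pid>/mountinfo using the field layout documented in proc(5) (field 5 is the mount point; the filesystem type follows the lone `-` separator) and reports processes that can see mounts under /var/lib/docker. The function names are hypothetical. Processes sharing the host's mount namespace will also show up; comparing their /proc/<pid>/ns/mnt links would narrow the list to the odd ones out.

```python
import glob
import os
from typing import Dict, List, NamedTuple


class Mount(NamedTuple):
    mount_point: str
    fstype: str


def parse_mountinfo_line(line: str) -> Mount:
    """Parse one /proc/<pid>/mountinfo line (layout as documented in proc(5)).

    Field 5 is the mount point. The optional fields (e.g. "shared:309") vary in
    number and end at a lone "-"; the filesystem type is the field right after it.
    Octal escapes such as \\040 (space) in paths are left undecoded here.
    """
    fields = line.split()
    sep = fields.index("-")
    return Mount(mount_point=fields[4], fstype=fields[sep + 1])


def mounts_under(prefix: str = "/var/lib/docker",
                 proc: str = "/proc") -> Dict[int, List[Mount]]:
    """Map each PID to the mounts at or below `prefix` that its mount namespace sees."""
    found: Dict[int, List[Mount]] = {}
    for path in glob.glob(os.path.join(proc, "[0-9]*", "mountinfo")):
        pid = int(os.path.basename(os.path.dirname(path)))
        try:
            with open(path) as f:
                lines = [line for line in f if line.strip()]
        except OSError:
            continue  # the process exited, or we lack permission
        hits = [m for m in map(parse_mountinfo_line, lines)
                if m.mount_point == prefix or m.mount_point.startswith(prefix + "/")]
        if hits:
            found[pid] = hits
    return found


if __name__ == "__main__":
    # Run as root to read every process's mountinfo.
    for pid, hits in sorted(mounts_under().items()):
        print(pid, [m.mount_point for m in hits])
```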

That told me that process 13731 still had references to /var/lib/docker/overlay, so I (as root) entered the mount namespace of that process and removed the mounts:

$ nsenter -m -t 13731 /bin/bash
$ mount
<snipped mount output that verifies that it does see those mount points>
$ umount /var/lib/docker/overlay/*/merged
$ umount /var/lib/docker/overlay
$ exit

At which point I could finally delete /var/lib/docker, restart the docker service (thus recreating everything in /var/lib/docker), and have no more issues.
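When several nested mounts have to go, the order matters: a mount that still has submounts cannot be unmounted (umount fails with EBUSY), which is why the children were unmounted before /var/lib/docker/overlay itself. A minimal sketch that builds those commands for a given PID, deepest mount point first, is below; `umount_commands` is a hypothetical name, and the commands are only printed (a dry run), since executing them needs root.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def umount_commands(pid: int, mount_points: Iterable[str]) -> List[List[str]]:
    """Build commands that unmount `mount_points` inside the mount namespace of `pid`.

    Deeper paths come first, so children such as .../overlay/<id>/merged are
    unmounted before their parent .../overlay (ties are broken alphabetically).
    """
    ordered = sorted(set(mount_points),
                     key=lambda p: (-len(PurePosixPath(p).parts), p))
    # `nsenter -t PID -m CMD` runs CMD in the mount namespace of process PID.
    return [["nsenter", "-t", str(pid), "-m", "umount", mp] for mp in ordered]


if __name__ == "__main__":
    # Dry run: print the commands; running them (e.g. subprocess.run(cmd, check=True)) needs root.
    for cmd in umount_commands(13731, [
        "/var/lib/docker/overlay",
        "/var/lib/docker/overlay/2a2dd584da98/merged",  # IDs shortened for illustration
        "/var/lib/docker/overlay/12f139bad50b/merged",
    ]):
        print(" ".join(cmd))
```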

Vanuan commented 7 years ago

After removing --rmi local -v, I no longer had this problem. Probably it was trying to remove shared images. I'll try with just the -v option.

lievendp commented 7 years ago

I encountered a similar issue, though in this case still on docker 1.10.3.

# docker info
Containers: 15
 Running: 15
 Paused: 0
 Stopped: 0
Images: 12
Server Version: 1.10.3
Storage Driver: devicemapper
 Pool Name: docker-thin-pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 6.501 GB
 Data Space Total: 21.47 GB
 Data Space Available: 14.97 GB
 Metadata Space Used: 1.982 MB
 Metadata Space Total: 4.194 MB
 Metadata Space Available: 2.212 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 5
 Library Version: 1.02.107-RHEL7 (2015-12-01)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: bridge null host
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.13.1.el7.x86_64
Operating System: Red Hat Enterprise Linux
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 0
CPUs: 4
Total Memory: 15.51 GiB

Is it possible that the container won't stop and goes dead in this situation because some other service is still trying to send data to it? (That sounds a bit far-fetched.)

cpuguy83 commented 7 years ago

@lievendp No. The container is stopping just fine, it just can't be removed because the mount has leaked into some other namespace, and you are running a super old kernel where this kind of thing just breaks down.

The reason you can remove it after waiting some amount of time is because the thing that was holding the mount has exited.

In this case, docker rm -f is not removing the container, it's only removing the metadata stored in the daemon about the container.
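Since the busy mount goes away once the process holding it exits, one workaround is to retry a plain `docker rm` with a growing delay instead of reaching for `-f`. The sketch below assumes the container is already stopped and that the daemon's error text contains "device or resource busy" (as in the messages quoted in this thread); `rm_with_retry` and its parameters are hypothetical. It gives up early on any other error.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes a command line and returns the completed process.
RmRunner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_docker(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a docker CLI command, capturing stdout/stderr as text."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def rm_with_retry(container: str, attempts: int = 5, delay: float = 1.0,
                  run: RmRunner = run_docker,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Remove a stopped container, retrying while removal fails as "busy".

    Uses a plain `docker rm` (no -f) and sleeps delay, 2*delay, 4*delay, ...
    between attempts. Returns True on success, False if it gives up.
    """
    for attempt in range(attempts):
        result = run(["docker", "rm", container])
        if result.returncode == 0:
            return True
        if "device or resource busy" not in (result.stderr or ""):
            return False  # some other error: waiting will not fix it
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)
    return False


if __name__ == "__main__":
    import sys
    print("removed" if rm_with_retry(sys.argv[1]) else "still busy (or another error)")
```

Retrying only helps if the holder eventually exits; a mount held by a long-running process (such as the cAdvisor case above) still has to be found and released.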

cpuguy83 commented 7 years ago

Btw, there are more fixes coming in 17.06 that should hopefully help alleviate (or completely resolve) this situation, especially if you are on a relatively recent kernel... though you don't have to be on a recent kernel for some of the changes.