Closed: mvording closed this issue 6 years ago.
Hell, I think this is an issue even outside of builds - I make a lot of transient containers and have resorted to nuking the system to deal with the huge volume of stuff left in /var/lib/docker
This is actually a serious resource leak on my systems. Here's an inode report for a box that has done little more than pull a bunch of images:
------------------------------------------
INODE USAGE SUMMARY
------------------------------------------
INODES | SIZE | DIRECTORY
------------------------------------------
92733 | 3.3G | aufs
1 | 4.0K | containers
2 | 8.0K | execdriver
365 | 1.5M | graph
2 | 14M | init
1 | 4.0K | tmp
1 | 4.0K | trust
1 | 4.0K | volumes
------------------------------------------
93109 | 3.3G | /var/lib/docker
------------------------------------------
And here's another box that's been running about 3 containers a minute (with only those very same images, and obviously removing the containers) for a few hours:
------------------------------------------
INODE USAGE SUMMARY
------------------------------------------
INODES | SIZE | DIRECTORY
------------------------------------------
148279 | 4.3G | aufs
15 | 60K | containers
8 | 32K | execdriver
365 | 1.5M | graph
2 | 14M | init
1 | 4.0K | tmp
1 | 4.0K | trust
1 | 4.0K | volumes
------------------------------------------
148675 | 4.3G | /var/lib/docker
------------------------------------------
It would be one thing if I just had to periodically wipe, but performance worsens the more containers I run. This is very serious to me, can anyone help?
Reports generated via https://github.com/tripflex/inodes, which is pretty dope.
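For reference, a report in this spirit can be sketched with plain POSIX tools; this is not the linked script, just a minimal stand-in (the function name and output layout are my own):

```shell
# inode_report: print inode and size usage for each immediate subdirectory.
# A minimal sketch, not the tripflex/inodes script itself.
inode_report() {
    target="${1:-.}"
    printf '%10s | %8s | %s\n' "INODES" "SIZE" "DIRECTORY"
    for dir in "$target"/*/; do
        [ -d "$dir" ] || continue
        inodes=$(find "$dir" | wc -l)            # each file and directory is one inode
        size=$(du -sh "$dir" 2>/dev/null | cut -f1)
        printf '%10s | %8s | %s\n' "$inodes" "$size" "$(basename "$dir")"
    done
}

# Example (needs root to read all of /var/lib/docker):
# inode_report /var/lib/docker
```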
We also just encountered this, nuking /var/lib/docker is the easiest solution. But it would be great if this was not needed of course.
Question: Is /var/lib/docker/aufs empty after removing all containers/images?
Given that the days of aufs appear numbered, we'll try the DeviceMapper backend, see if it suffers from the same problem, and report back here.
We didn't get very far with the DeviceMapper backend, because issue #4036 prevented us from building any new images. Reverting to aufs with a larger inode table (and nuking /var/lib/docker/aufs) for the time being.
We (@efuquen and I) are having this same issue. Even after a full system wipe, Docker quickly uses up all inodes after building even a few images.
This is my inode usage based off the helpful script @vincentwoo provided:
------------------------------------------
INODE USAGE SUMMARY
------------------------------------------
INODES | SIZE | DIRECTORY
------------------------------------------
67 | 544K | containers
13 | 52K | execdriver
440 | 1.8M | graph
2 | 7.5M | init
3878564 | 3.6G | overlay
1 | 4.0K | tmp
1 | 4.0K | trust
12 | 44K | vfs
13 | 52K | volumes
------------------------------------------
3957193 | 3.6G | /var/lib/docker
------------------------------------------
Quick update: using the btrfs backend is a successful workaround for us (on Debian Wheezy, @morgante).
@sebbacon Are you on CoreOS?
I just ran into this on an EC2 machine
Hi @vincentwoo ,
Just wondering if you could provide me with the helpful script that was given to @efuquen ?
@frankfuu it was just https://github.com/tripflex/inodes
Thanks @vincentwoo
I'm still having the same issues: after a few builds I need to stop docker, nuke the /var/lib/docker directory, restart docker, rebuild my apps on dokku, and restart mongod before things work again.
Have the same issue with overlay:
docker# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
1 init
1 linkgraph.db
1 repositories-overlay
1 trust
30 containers
116 graph
196 tmp
1445877 overlay
docker info
Containers: 5
Images: 56
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.18.0-031800-generic
Operating System: Ubuntu Wily Werewolf (development branch) (containerized)
CPUs: 4
Total Memory: 6.805 GiB
Name: Logstash1
ID: MYDD:REBI:AB6H:RLW3:YZD2:SUWT:CJED:ENVW:7W3U:RYUC:ZMM4:R2DI
WARNING: No swap limit support
docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64
DIND is being used for our CI workflow.
P.S. Found an interesting article describing this issue: http://blog.cloud66.com/docker-with-overlayfs-first-impression/
Almost a year since this issue was opened :(
Let's not mix overlay issues with aufs issues please. Overlay has known issues with excessive inode usage.
I have not personally had any issues with excessive inode usage on aufs, but I have seen it be flaky (because aufs can be flaky) when removing containers/images (i.e., not actually removing them; docker does report the error). Are you getting removal errors on your transient containers?
We are seeing this issue too, although not from builds but from an excessive create/delete script.
Anyone have any idea how to proceed? We are running out of inodes every couple of days, and it is painful to keep things running.
Just experienced this. Given enough deploys in production it seems unavoidable.
Is there an updated analysis of what's going wrong with this problem? The initial post said that running docker rmi didn't help as much as it should. Does this affect machines that build images, or also ones that just pull?
I'm using AWS Linux AMI machines. For my production machines, which only pull, I added a post-deploy script to at least run docker rmi $(docker images -q), but for my Jenkins build machine it seems I just have to ssh in and perform the more drastic cleanup steps when things go south.
I wonder how many volumes you have on the host and if this is the source of your inode usage.
I ran docker rmi $(docker images -q --filter "dangling=true") after each deploy, but it did not free up nearly enough inodes. I ended up wiping the entire aufs folder and rebuilding from scratch.
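The cleanup commands mentioned across this thread can be combined into one routine. A hedged sketch (the function name is mine; as the reports above show, on affected aufs/overlay hosts this recovers only part of the leaked inodes):

```shell
# docker_cleanup: remove exited containers (with their anonymous volumes)
# and dangling images. A sketch of the commands suggested in this thread;
# on affected hosts it only reclaims part of the leaked inodes.
docker_cleanup() {
    command -v docker >/dev/null 2>&1 || { echo "docker not found"; return 0; }
    # Remove exited containers together with their anonymous volumes
    docker ps -aq -f status=exited | xargs -r docker rm -v
    # Remove dangling (untagged) images
    docker images -q -f dangling=true | xargs -r docker rmi
}

# Example: run after each deploy, e.g. from cron:
# docker_cleanup
```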
@zbyte64 +1
Ran into this issue as well on my development environment (haven't hit it yet in production, but we're only running a few containers (~12)). I heavily use docker-compose for setting up a dev stack, so I run a lot of docker-compose up/run. I must say I've been using docker for dev purposes on a day-to-day basis for 3-4 months now, so it's not the same as what @thefallentree described.
I have no choice but to nuke /var/lib/docker, so I will do that and keep an eye on this issue. What worries me the most is the impact this issue might have on a production environment. I was wondering if others could share their FS setups and how well (or badly) they have worked. Right now we're using XFS with LVM, but as I said before, we're only running around ~12 containers with roughly 2 deploys/day.
Here is some information:
# uname -a
Linux aws015 4.1.6-200.fc22.x86_64 #1 SMP Mon Aug 17 19:54:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# docker info
Containers: 3
Images: 619
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: syslog
Kernel Version: 4.1.6-200.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 4
Total Memory: 15.5 GiB
Name: aws015
ID: UHNK:E7RB:24OQ:FQ2G:ZNOW:HSTJ:CDOQ:FZET:EXTT:SJJH:AGHR:5GXB
# docker version
Client:
Version: 1.8.1
API version: 1.20
Go version: go1.4.2
Git commit: d12ea79
Built: Thu Aug 13 02:39:27 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.8.1
API version: 1.20
Go version: go1.4.2
Git commit: d12ea79
Built: Thu Aug 13 02:39:27 UTC 2015
OS/Arch: linux/amd64
# df -i /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora_aws015-root 3276800 3275073 1727 100% /
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/fedora_aws015-root 50G 28G 20G 59% /
# tune2fs -l /dev/mapper/fedora_aws015-root
tune2fs 1.42.12 (29-Aug-2014)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: 13e67bf8-0279-4514-97ea-c788b0fd634f
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 3276800
Block count: 13107200
Reserved block count: 655360
Free blocks: 5641750
Free inodes: 1202
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1020
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Thu Jul 2 15:58:50 2015
Last mount time: Fri Sep 11 15:28:49 2015
Last write time: Fri Sep 11 11:28:45 2015
Mount count: 10
Maximum mount count: -1
Last checked: Thu Jul 2 15:58:50 2015
Check interval: 0 (<none>)
Lifetime writes: 232 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 2623233
Default directory hash: half_md4
Directory Hash Seed: 42714f67-ce1f-4425-9693-6ab291abde2e
Journal backup: inode blocks
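The `df -i` output above can be turned into a simple threshold check, which is handy for catching exhaustion before builds start failing. A sketch (function name and the 90% default are my own illustrative choices):

```shell
# inode_check: warn when inode usage on a filesystem crosses a threshold.
# Sketch based on the `df -i` diagnostics above; 90% is an example default.
inode_check() {
    mount_point="${1:-/}"
    threshold="${2:-90}"
    # IUse% is the 5th column of POSIX-format `df -Pi`; strip the trailing '%'
    usage=$(df -Pi "$mount_point" | awk 'NR==2 {gsub("%",""); print $5}')
    case "$usage" in
        ''|*[!0-9]*) echo "could not read inode usage for $mount_point"; return 0 ;;
    esac
    if [ "$usage" -ge "$threshold" ]; then
        echo "WARNING: $mount_point at ${usage}% inode usage"
        return 1
    fi
    echo "OK: $mount_point at ${usage}% inode usage"
}

# Example, suitable for cron:
# inode_check / 90 || mail -s "inodes low" ops@example.com < /dev/null
```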
btrfs continues to work fine for us, @yoanisgil.
We have the same issue with aufs. rmi doesn't change anything, we have to nuke /var/lib/docker.
ps: Backing Filesystem: extfs
I have the same problem with overlay+ext4.
@elyulka this is a known issue with overlay fs.
Hi, same problem here -.-
Hi guys, if you're in trouble, take a look at this post: http://masato.github.io/2014/11/05/how-to-correctly-remove-volume-directories/
or put this command in cron (it's crude, but it works): docker ps -a | grep Exited | awk '{ print $1 }' | xargs docker rm -v
@cristianocasella removing volumes is not related here; also, docker 1.9 has a docker volume command to manage volumes. The script you're proposing is better written using docker's built-in filter functionality, i.e. docker rm -v $(docker ps -q -f status=exited)
Related to #10613
Also thought I would chime in here. I am running docker 1.9.1 with aufs with docker-compose 1.5.2 on Ubuntu 15.10 64-bit in a VMWare Workstation VM on Windows 8.1.
I am currently prototyping a HDFS HA and HBase HA cluster using docker. Using docker-compose, I launch 13 containers and 1 data volume container.
I have noticed that with a clean VM at the start of the day, things work pretty well. However, by early afternoon, containers won't stop and a lot of times, the HBase master and regionserver processes won't start or run correctly in the containers. I can see that my entrypoint is run and it starts the process for the master or regionserver, but they no longer work properly or produce any output. I also noticed that Ubuntu will complain about a lot of orphan inodes when I try to restart the VM.
The only way to fix this problem is to delete everything in /var/lib/docker and start the docker daemon again.
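The "nuke" workaround people keep describing in this thread can be written down as one destructive routine. A sketch only: it removes ALL containers, images, and volumes; the `service` commands are assumptions (adjust for your init system), and the directory is parameterised purely so the sketch can be exercised safely:

```shell
# nuke_docker: the destructive workaround described throughout this thread.
# Removes ALL containers, images, and volumes on the host.
nuke_docker() {
    docker_dir="${1:-/var/lib/docker}"
    service docker stop 2>/dev/null || true    # stop the daemon first
    rm -rf "$docker_dir"                       # wipe all docker state
    service docker start 2>/dev/null || true   # the daemon recreates the directory
}

# Example (as root):
# nuke_docker
```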
I am also having this problem with the default docker-machine VM on Mac OS X. I have to nuke the whole /var/lib/docker to fix it. One interesting thing was that docker info was showing 500 images, but docker images | wc -l only showed 150.
@andreychernih the difference in image count is likely because docker info also includes the number of intermediate images, i.e. docker images -aq | wc -l should match that number more closely.
@thaJeztah thanks, with intermediate images shown, these numbers are closer indeed.
hi guys, I just hit the same inode exhaustion issue with overlayfs. What is the best practice around this? We do multiple deploys a day and this problem is hitting us hard.
@sandys the overlay section in the documentation has some hints, although in general, it's a result of the way overlay works.
I'm having the same issue. :(
We run into this issue nearly daily now with overlayfs. Because of docker-machine, the only fix is essentially to destroy our machines and start again. Is this the canonical bug for overlayfs inode exhaustion, @thaJeztah? Should I be filing one specifically with docker-machine?
The only other reference I could find to this issue is https://github.com/coreos/bugs/issues/264
@lox the issue tracking it in docker/docker is https://github.com/docker/docker/issues/10613, and you can partly work around it, e.g. https://github.com/docker/docker/issues/10613#issuecomment-122182216 and https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#overlayfs-and-docker-performance. Unfortunately, the excessive inode usage is due to the way overlay works, so making those configuration changes is (for now) the best workaround. Perhaps docker-machine could make those changes when creating a new machine, yes.
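One configuration change along those lines is giving the backing filesystem a larger inode table up front, since ext4's inode count is fixed at format time. A sketch only: the device name is illustrative, and this is destructive to whatever is on that device (ext4's default ratio is typically 16384 bytes per inode; -i 4096 quadruples the inode count):

```shell
# Format a dedicated volume for /var/lib/docker with a denser inode table.
# /dev/xvdf and the 4096 bytes-per-inode ratio are illustrative values.
mkfs.ext4 -i 4096 /dev/xvdf
mkdir -p /var/lib/docker
mount /dev/xvdf /var/lib/docker
# Add a matching /etc/fstab entry to make the mount persistent.
```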
hi @thaJeztah, could you help explain why overlay has this inode limitation? From overlay's point of view, it also creates directories and files (some as hard links, which do not increase inode usage). Other storage drivers also create many directories and files, so I don't quite understand the difference.
We are running into this issue using OverlayFS with an inode limit of 3276800. That's over 3 million inodes. Shouldn't that cover enough ground for a dozen builds/runs per day? How is this still a problem since Dec 19, 2014?!
@saada the issue with overlay is not the same and has a well-known cause. It's not that inodes are not released; it's that you have images lying around and the overlay driver uses a lot of inodes.
Is the implication then @cpuguy83 that it's possible to reclaim those inodes by purging images? I was under the impression that wasn't possible.
@lox again, the issue of overlay and this issue are not the same at all.
I _suspect_ the issue with aufs is images (or containers) that got something like a "device or resource busy" error when being removed, but no longer show up in docker images or docker ps -a because the configurations are gone while the data is still there.
Issues with "device or resource busy" should be pretty much all fixed in 1.11.
I can run one of the various docker cleanup scripts, which helps, but if I really want to clean it up I have to stop docker and rm -rf /var/lib/docker; then I get the inodes back. Specifically, mine is related to running gitlab runners on CoreOS.
Ah yup, you are correct, apologies @cpuguy83; I hadn't realized the initial report was AUFS-based.
Description of problem:
For this scenario, docker build is being used as part of a CI workflow. After several months of builds, even with a daily container cleanup script in place, builds started failing due to being out of space. Checking the disk, there was plenty of disk space available, but no inodes remaining. Checking the filesystem, it turned out the /var/lib/docker folder tree contained several million inodes.
docker rm/rmi commands were used to remove all containers and images from the system; however, this only freed up a few percent of the inodes.
To resolve this, I had to manually run "rm -rf /var/lib/docker" to free the inodes.
docker version:
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8
uname -a:
Linux ip-10-1-0-63 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Environment details (AWS, VirtualBox, physical, etc.): AWS
How reproducible: Run many docker builds with differing source files and watch the inode count keep increasing; it is not reset significantly when all containers and images are removed.
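The reproduction described above can be sketched as a loop. This is my own illustrative script, not the reporter's CI setup: the image names are made up, and the wildcard repository filter on docker images is a convenience of newer clients:

```shell
# repro_inode_leak: build a series of images with differing content,
# remove them, then inspect inode usage. On affected drivers the inode
# count stays elevated even after removal.
repro_inode_leak() {
    command -v docker >/dev/null 2>&1 || { echo "docker not found; skipping"; return 0; }
    for i in $(seq 1 20); do
        dir=$(mktemp -d)
        # A trivially different Dockerfile each iteration forces new layers
        printf 'FROM busybox\nRUN echo build-%s > /marker\n' "$i" > "$dir/Dockerfile"
        docker build -t "inode-leak-test-$i" "$dir" >/dev/null 2>&1 || true
        rm -rf "$dir"
    done
    # Clean up the test images, then check inode usage
    docker images -q 'inode-leak-test-*' | xargs -r docker rmi >/dev/null 2>&1 || true
    df -i /var/lib/docker 2>/dev/null || df -i /
}
```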