moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
https://mobyproject.org/
Apache License 2.0

Docker daemon hangs trying to get container sizes #32846

Closed gswallow closed 1 year ago

gswallow commented 7 years ago

Opening a new case from https://github.com/moby/moby/issues/30003#issuecomment-297149187.

We're trying to run the datadog monitoring agent, and report the disk usage of each of our containers. Whenever I turn this option on, the docker daemon locks up in various ways, depending on which version of docker I have installed.

FWIW the datadog agent is using docker-py v1.10.6, and times out when it runs python code like this:

    from docker import Client

    client = Client(base_url='unix://var/run/docker.sock')
    client.containers(all=True, size=True)
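For reference, docker-py's `containers(all=True, size=True)` becomes a `GET /containers/json` request with `all` and `size` query parameters against the Engine API. The sketch below issues that request using only the Python standard library with a client-side timeout, so a hung size calculation shows up as a timeout error instead of blocking forever. The helper names (`UnixHTTPConnection`, `containers_query`, `list_containers_with_size`) and the 30-second default are assumptions for illustration, not part of docker-py.

```python
import http.client
import json
import socket
from urllib.parse import urlencode


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (hypothetical helper, not docker-py's)."""

    def __init__(self, socket_path, timeout=30.0):
        # "localhost" only fills the Host header; the socket path is what is dialed.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # The timeout applies to each socket operation (connect, each read),
            # not to the request as a whole.
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def containers_query(all_containers=True, size=True):
    # Boolean query parameters encoded as 1/0.
    params = {"all": int(all_containers), "size": int(size)}
    return "/containers/json?" + urlencode(params)


def list_containers_with_size(socket_path="/var/run/docker.sock", timeout=30.0):
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", containers_query())
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"Engine API returned {resp.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()
```

With `size=1`, each returned container object carries `SizeRw` and `SizeRootFs`; when the daemon is stuck computing them, `list_containers_with_size(timeout=30.0)` raises a timeout error rather than hanging the caller.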

We use the devicemapper storage driver in thinpool mode, because it's the only storage driver to date that allows me to limit the size of docker containers. On a PaaS, preventing containers from eating all of the available disk space is kind of important.

I have a few ways to reproduce this problem (or perhaps set of problems) running at the moment. First one:

Linux 4.4.0-67-generic #88-Ubuntu SMP
Description: Ubuntu 16.04.2 LTS
docker-engine 17.04.0~ce-0~ubuntu-xenial

    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.thinpooldev=/dev/mapper/vg0-pool0",
        "dm.use_deferred_deletion=true",
        "dm.use_deferred_removal=true",
        "dm.fs=ext4"
    ]

    time sh -c 'docker ps > /dev/null 2>&1'
    real 0m0.010s

    time sh -c 'docker system df > /dev/null 2>&1'

The second command probably never returns: I've been waiting well over twenty minutes.
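The `time sh -c '...'` checks block for as long as the daemon does. A bounded variant, sketched here with `subprocess.run(..., timeout=...)`, reports the elapsed time or gives up after a deadline. The function name `timed_run` and any particular deadline are arbitrary choices for illustration.

```python
import subprocess
import time


def timed_run(argv, timeout_s):
    """Run argv with output discarded; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    try:
        subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        return None
    return time.monotonic() - start
```

For example, `timed_run(["docker", "system", "df"], timeout_s=60)` returns `None` if the command is still running after a minute.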

docker info and goroutine dump attached as system-1.txt

Second one:

Linux 4.4.0-67-generic #88-Ubuntu SMP
Description: Ubuntu 16.04.2 LTS
docker-engine 1.13.1-0~ubuntu-xenial

    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.thinpooldev=/dev/mapper/vg0-pool0",
        "dm.use_deferred_deletion=true",
        "dm.use_deferred_removal=true",
        "dm.fs=ext4"
    ]

Both of these hang:

    time sh -c 'docker ps > /dev/null 2>&1'
    time sh -c 'docker system df > /dev/null 2>&1'

I had to reboot the system (restarting the docker daemon would probably have worked too) to get a usable capture of the goroutines, because so many goroutines had piled up that sending the docker daemon a USR1 signal produced a 42 MB goroutine file.

docker info and goroutine dump attached as system-2.txt

Third one (what we've settled on for production right now):

Linux 4.4.0-67-generic #88-Ubuntu SMP
Description: Ubuntu 16.04.2 LTS
docker-engine 1.11.2-0~xenial

    "storage-driver": "devicemapper",
    "storage-opts": [
        "dm.thinpooldev=/dev/mapper/vg0-pool0",
        "dm.use_deferred_deletion=true",
        "dm.use_deferred_removal=true",
        "dm.fs=ext4"
    ]

docker ps doesn't hang, and Docker 1.11.2 has no docker system command. I can still reproduce the hang, though, by running the Python code above.

docker info and goroutine dump (obtained through journalctl) attached as system-3.txt

remh commented 7 years ago

I think it's the same issue as: https://github.com/moby/moby/issues/15888

cpuguy83 commented 7 years ago

Thanks for opening.

Yeah, the size of each container has to be calculated. This can take a really long time, and it can be especially bad if you run multiple docker system df commands at the same time. In 17.06 we'll have a global lock around this so that only one docker system df can run at a time. There are also some performance improvements to how this is calculated in 17.05.
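The global lock mentioned above can be pictured as a process-wide guard around the expensive calculation. Below is a minimal Python sketch of one way to build it: a non-blocking acquire, so a second concurrent request fails fast instead of queueing behind the first. It illustrates the idea only and is not moby's Go implementation; `DiskUsageBusy` and `disk_usage` are hypothetical names.

```python
import threading


class DiskUsageBusy(RuntimeError):
    """Raised when another disk-usage calculation is already running."""


_disk_usage_lock = threading.Lock()


def disk_usage(calculate):
    """Run calculate() unless another caller is already running it."""
    # blocking=False: return immediately instead of waiting for the current holder.
    if not _disk_usage_lock.acquire(blocking=False):
        raise DiskUsageBusy("disk usage calculation already in progress")
    try:
        return calculate()
    finally:
        _disk_usage_lock.release()
```

A plain `with _disk_usage_lock:` would instead make later callers wait their turn; rejecting them keeps callers that poll on a schedule, such as a monitoring agent, from stacking up blocked requests.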

gswallow commented 7 years ago

I upgraded to 17.05. After 12 hours of runtime, my system load average is 280.65, 283.50, 280.79, and iowait is at 96%.
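On Linux, figures like these come from `/proc/loadavg` (1-, 5- and 15-minute load averages) and from the aggregate `cpu` line of `/proc/stat`, where iowait is the fifth counter. The sketch below parses both; iowait is expressed as a share of all CPU time between two samples. Function names are hypothetical, and the parsers take the file contents as strings so they can be fed saved samples.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg contents."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def cpu_times(stat_text):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal;
            # guest time is already counted in user/nice, so it is excluded.
            values = [int(v) for v in fields[1:9]]
            return values[4], sum(values)
    raise ValueError("no aggregate cpu line found")


def iowait_percent(stat_before, stat_after):
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    io0, total0 = cpu_times(stat_before)
    io1, total1 = cpu_times(stat_after)
    elapsed = total1 - total0
    return 100.0 * (io1 - io0) / elapsed if elapsed else 0.0
```

Reading `/proc/stat` twice, a second or so apart, and passing both texts to `iowait_percent` gives a figure roughly comparable to the `wa` column in `top`.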

The good news is that docker ps doesn't hang. The bad news is that the datadog container has gone kaput.

17.05-still-broken.txt

thaJeztah commented 1 year ago

devicemapper may also have played a role here.

Let me close this ticket for now, as it looks like it went stale.