Open cornelius-keller opened 9 years ago
Thanks for reporting, @cornelius-keller
What cadvisor version are you running? Can you get host:port/validate for cadvisor? Is this a temporary situation, or does the container fs stay busy until you delete cadvisor?
@rjnagal Cadvisor version is:
[root@583274-app35 ~]# docker images
REPOSITORY                  TAG      IMAGE ID       CREATED        VIRTUAL SIZE
docker.io/google/cadvisor   latest   399ae3c46a0e   47 hours ago   19.89 MB
[root@583274-app35 ~]#
This is a permanent situation. The container fs stays busy until I delete cadvisor.
What do you mean by getting host:port/validate for cadvisor? Cadvisor was still running and responsive on the web ui if that is what you mean. Unfortunately I can't give you any public host port to validate as cadvisor is only exposed via a vpn.
Yeah, I just need the output from the /validate endpoint on the cadvisor UI. You can scrub any data that's private in there. Thanks
Sorry was a long day, did not get that this was an endpoint. I added the output to the gist.
I am facing this same issue. Essentially, running cadvisor with --volume=/:/rootfs:ro causes other containers' devicemapper mounts to be mounted inside the cadvisor container, so they can't be properly destroyed when issuing docker rm on the target container, as they will appear in use.
How can this be solved?
When I run it on Fedora 21, it works fine. But when I run it on Ubuntu 14.04.2 LTS, I get the same error as described above.
Error response from daemon: Cannot destroy container xxx_jenkinsMaster_1230: Driver aufs failed to remove root filesystem 13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d: rename /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d-removing: device or resource busy
The main difference is that Ubuntu uses AUFS, whereas Fedora uses devicemapper. Maybe that's the problem.
@rjnagal I can confirm that this issue happens on Ubuntu trusty x64 with Docker 1.8.1, cadvisor:latest and devicemapper.
'1cb6051b30a1' being the container ID.
# grep -l 1cb6051b30a1 /proc/*/mountinfo
/proc/1963/mountinfo
# ps aux | grep -i 1963
root 1963 1.9 0.8 588740 71688 ? Ssl Aug26 30:08 /usr/bin/cadvisor
root 14767 0.0 0.0 11744 952 pts/0 S+ 00:56 0:00 grep --color=auto -i 1963
Please suggest a workaround for this.
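The /proc/*/mountinfo check above can be scripted so that it also reports the process name. This is only a minimal sketch: the function names and the `proc_root` parameter are made up for illustration, and it assumes a Linux host where /proc is readable (typically as root).

```python
import os


def pids_holding_mount(container_id, proc_root="/proc"):
    """Return sorted PIDs whose mountinfo mentions container_id.

    Equivalent in spirit to: grep -l <id> /proc/*/mountinfo
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        path = os.path.join(proc_root, entry, "mountinfo")
        try:
            with open(path, "r", errors="replace") as fh:
                if any(container_id in line for line in fh):
                    pids.append(int(entry))
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return sorted(pids)


def process_name(pid, proc_root="/proc"):
    """Return the short command name of pid, or "?" if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as fh:
            return fh.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    import sys

    # Usage: python find_mount_holders.py <container-id-prefix>
    for pid in pids_holding_mount(sys.argv[1]):
        print(pid, process_name(pid))
```

If a cadvisor process shows up in the output, that matches what was observed above: the container's mount is still visible in cadvisor's mount namespace.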
same here with CentOS + Docker 1.8.1(devicemapper)
Had to remove --volume=/:/rootfs:ro && --volume=/var/lib/docker:/var/lib/docker:ro
@rjnagal: Excepting disk usage calculation, cAdvisor does not poke at any of these directories right?
Same problem here with Ubuntu 14.04.3.
@difro's solution works, but cadvisor can't provide docker stats anymore.
Any workaround?
The last time I ran into this problem, I dug a little bit into the cAdvisor source code. I'm not 100% sure, because it was a few weeks ago, but this is essentially the gist:
If you use cAdvisor as shown in README.md, you'll mount /var/lib/docker as a volume into the container. This will create dead containers.
The reason cAdvisor wants you to mount /var/lib/docker is, as far as I could see, only to display certain info that is only interesting for admins and should be known beforehand.
We should be able to get all info from a docker inspect rather than parsing the container config file. Seems like mounting /var/lib/docker is causing more trouble than it's worth.
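To illustrate the docker inspect idea from the outside, here is a rough sketch that shells out to the docker CLI and picks a few fields from its JSON output. This is not how cAdvisor is implemented; the helper names are hypothetical, and the fields read (`Id`, `Name`, `State.Status`, `State.Running`) are assumed to be present in the Docker version in use, so missing ones simply come back as None.

```python
import json
import subprocess


def summarize_inspect(raw_json):
    """Pick a few fields out of `docker inspect` output (a JSON array)."""
    summaries = []
    for entry in json.loads(raw_json):
        state = entry.get("State") or {}
        summaries.append({
            "id": entry.get("Id"),
            "name": entry.get("Name"),
            "status": state.get("Status"),    # may be absent on older Docker
            "running": state.get("Running"),
        })
    return summaries


def inspect_container(name):
    """Run `docker inspect <name>` and summarize the result."""
    completed = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return summarize_inspect(completed.stdout)


if __name__ == "__main__":
    import sys

    for summary in inspect_container(sys.argv[1]):
        print(summary)
```

Going through the daemon this way needs access to the Docker socket (the /var/run mount in the commands in this thread), not /var/lib/docker.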
We also encounter the same problem (cadvisor:latest, ubuntu 14.04).
any updates regarding this?
The best we can do for now is to let users optionally disable filesystem usage metrics. We are waiting for some of the new upstream kernel features to simplify disk accounting.
Same situation. My Docker version is 1.9.1 and my cAdvisor version is 0.18.0.
When docker rm fails for a container, the status of that container changes to "dead". Is it possible to unmount that specific mountpoint when the container status changes to "exited" or "dead"?
+1
cAdvisor doesn't mount anything. It runs du periodically to collect filesystem stats. Other than that, it does not touch the container's filesystem at all.
The easy fix for this would be to retry docker deletion or disable filesystem aggregation in cadvisor.
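A retry around docker rm, as suggested above, could look roughly like the sketch below. The function name, the attempt count, and the delay are arbitrary choices for illustration, and matching on the "device or resource busy" text is an assumption based on the error messages quoted in this thread. Note that others here report the busy state persisting until cAdvisor is stopped, in which case retrying will not help.

```python
import subprocess
import time


def remove_with_retry(container, attempts=5, delay=2.0,
                      run=subprocess.run, sleep=time.sleep):
    """Try `docker rm` until it succeeds; return the attempt that worked.

    Only "device or resource busy" failures are retried; any other
    error is raised immediately. `run` and `sleep` are injectable so
    the logic can be exercised without a Docker daemon.
    """
    for attempt in range(1, attempts + 1):
        result = run(["docker", "rm", container],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if "device or resource busy" not in result.stderr:
            raise RuntimeError(result.stderr.strip())
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"{container} still busy after {attempts} attempts")


if __name__ == "__main__":
    import sys

    print("removed on attempt", remove_with_retry(sys.argv[1]))
```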
Running cAdvisor without --volume=/:/rootfs:ro seems to fix it.
As pointed out in https://github.com/google/cadvisor/blob/master/docs/running.md
I haven't fully tested it yet, but it works fine so far.
I had to remove the following volume mounts:
Setup:
Upgraded docker to 1.10.3 and now cAdvisor can only see the docker images, but no containers, if I only use volume mounts:
If I add /:/rootfs:ro, cAdvisor can see the containers, but I get device or resource busy when trying to remove any container.
@xbglowx Are you using the latest cadvisor release?
Using google/cadvisor:v0.22.0
Any ideas or suggestions on how I can dig into the issue?
cc @timstclair
I was able to reproduce this locally with docker v1.9.1 and cAdvisor 0.22.0, but only right after starting cAdvisor and only once (removing a second container works). I could not reproduce with docker v1.11.
Is this consistent with everyone else's experience?
With docker 1.11.1 the issue is gone. With the latest fixes on the docker side, it seems to be working now.
I'm still able to reproduce this with docker 1.11.1 and cAdvisor 0.23.0. Ubuntu 14.04.
@ashkop Can you try running cAdvisor with --disable_metrics="tcp,disk" and see if that resolves the issue? Note that you will not get docker container filesystem metrics by adding this flag.
If I try using --disable_metrics="tcp,disk" I get the following:
sudo docker run -ti -v /var/lib/docker/:/var/lib/docker:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /:/rootfs:ro google/cadvisor --disable_metrics="tcp,disk"
panic: assignment to entry in nil map
goroutine 1 [running]:
panic(0xb0c8c0, 0xc8201c0440)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
main.(*metricSetValue).Set(0x15ac528, 0x7ffe3cea1f59, 0x8, 0x0, 0x0)
/go/src/github.com/google/cadvisor/cadvisor.go:85 +0x1da
flag.(*FlagSet).parseOne(0xc82004e060, 0xc82005e901, 0x0, 0x0)
/usr/local/go/src/flag/flag.go:881 +0xdd9
flag.(*FlagSet).Parse(0xc82004e060, 0xc82000a100, 0x2, 0x2, 0x0, 0x0)
/usr/local/go/src/flag/flag.go:900 +0x6e
flag.Parse()
/usr/local/go/src/flag/flag.go:928 +0x6f
main.main()
/go/src/github.com/google/cadvisor/cadvisor.go:99 +0x68
This is with cAdvisor version 0.23.0 (750f18e). Works fine with 0.22.0.
I still need to see if using --disable_metrics="tcp,disk" fixes the problem.
Yeah, that was fixed in https://github.com/google/cadvisor/pull/1259, but it's not integrated into any release.
@vishh Unfortunately the flag didn't help. As @xbglowx mentioned, this option causes 0.23.0 to crash, so I tried 0.22.0 and canary. Both still prevent me from removing containers. Here's the error message I get:
Error response from daemon: Unable to remove filesystem for 9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473: remove /var/lib/docker/containers/9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473/shm: device or resource busy
Same here on Debian 8, Docker 1.11.1 and latest cAdvisor.
@timstclair Can we make a v0.23.1 release with the fix for the --disable_metrics flag?
I am experiencing the same issue with the following versions
"cAdvisor version: 0.23.0-750f18e" google/cadvisor latest 5cda8139955b 8 days ago 48.92 MB
CentOS Linux release 7.2.1511 (Core) Docker version 1.11.1, build 5604cbe
The workaround was to remove /var/lib/docker from the shared volumes.
@vishh Is this fixed if we just stopped tracking disk metrics for these machines? Are there other dependencies?
@rjnagal Disk metrics should be the only dependency. Disabling that by using --disable_metrics=tcp,disk should fix this issue.
Can we do that by default when we detect devicemapper?
@rjnagal AFAIK, it is not limited to devicemapper alone. AUFS is also affected. If we need a default solution, we will have to disable per-container disk metrics by default.
The issue persists in v0.23.1 on CentOS7, Docker 1.10.1, devicemapper
docker run \
--rm \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
google/cadvisor:v0.23.1 \
-docker_only \
--disable_metrics="tcp,disk"
To add more info - the issue persists on v0.23.1 and v0.23.2 on CentOS7, Docker 1.11.1, devicemapper.
However the issue only occurs when cadvisor is run from docker. Running cadvisor directly on CentOS7 works without issues.
Could you add more details about your repro steps? How many containers are you running, with what options? It would help if we could reproduce from a clean VM centos image.
I tried to reproduce it on a fresh VM, but failed. I'll try to find the difference that is actually causing the issue. Meanwhile I ran lsof inside the cadvisor container on the file that is being blocked. Here's what I got:
1 /usr/bin/cadvisor pipe:[70918923]
1 /usr/bin/cadvisor pipe:[70918924]
1 /usr/bin/cadvisor pipe:[70918925]
1 /usr/bin/cadvisor socket:[70919220]
1 /usr/bin/cadvisor anon_inode:[eventpoll]
1 /usr/bin/cadvisor anon_inode:inotify
1 /usr/bin/cadvisor socket:[70919240]
I also noticed that the issue occurs only if I start cadvisor after my own containers. If cadvisor is the first one started, then I can restart my containers without any issue.
@ashkop That's actually correct. I tried to reproduce the error, but couldn't. If the other containers are started first, only then cadvisor blocks removal.
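Given the start-order observation above, one way to list the containers that might be affected is to compare start times. The sketch below is an assumption-laden illustration: it expects a mapping of container name to the `State.StartedAt` value from docker inspect (UTC timestamps ending in `Z`), and the container name `cadvisor` plus both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_started_at(value):
    """Parse a UTC timestamp such as '2016-05-20T10:11:12.123456789Z'.

    The fractional part may have any number of digits (or be missing),
    so it is truncated or padded to microseconds by hand.
    """
    base, _, frac = value.rstrip("Z").partition(".")
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micros = int((frac + "000000")[:6]) if frac else 0
    return moment.replace(microsecond=micros, tzinfo=timezone.utc)


def containers_at_risk(started_at, cadvisor_name="cadvisor"):
    """Return names of containers that were started before cAdvisor."""
    cadvisor_start = parse_started_at(started_at[cadvisor_name])
    return sorted(
        name for name, stamp in started_at.items()
        if name != cadvisor_name and parse_started_at(stamp) < cadvisor_start
    )
```

According to the reports in this thread, containers in the returned list are the ones whose removal may fail with "device or resource busy" while cAdvisor is running.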
Here's a script to replicate the error on CentOS 7.
You will need a machine with an empty block device (just replace the path to the device in DOCKER_DATA_DISK). The script will set up docker with devicemapper through LVM's thin-pool, run a container, then cadvisor, and then stop & rm the first container.
#!/bin/bash
DOCKER_DATA_DISK=/dev/vdb
set -exo pipefail
setenforce Permissive
yum update -y
yum install -y lvm2
systemctl enable lvm2-lvmetad
systemctl start lvm2-lvmetad
pvcreate $DOCKER_DATA_DISK
vgcreate data $DOCKER_DATA_DISK
lvcreate -l 100%free -T data/docker_thin
curl -sSL https://get.docker.com/ | sh
mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF > /etc/systemd/system/docker.service.d/docker-lvm.conf
[Service]
ExecStart=
ExecStart=/usr/bin/docker daemon -H fd:// \
-s devicemapper \
--storage-opt dm.thinpooldev=/dev/mapper/data-docker_thin
TimeoutStartSec=3000
EOF
systemctl daemon-reload
systemctl enable docker
systemctl start docker
sleep 3
docker run \
--name=test \
-d \
debian:jessie \
/bin/sh -c "while true; do foo; sleep 1; done"
docker run \
-d \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--name=cadvisor \
google/cadvisor:v0.23.1 \
-docker_only \
--disable_metrics="tcp,disk"
docker stop test
docker rm test
The output is:
... some data ...
+ docker stop test
test
+ docker rm test
Error response from daemon: Unable to remove filesystem for 7d7513b0c3310f26e7425728f9c34e219db53a5e4dbb6e0e4259c2e6eb760044: remove /var/lib/docker/containers/7d7513b0c3310f26e7425728f9c34e219db53a5e4dbb6e0e4259c2e6eb760044/shm: device or resource busy
On Ubuntu 14.04, using --disable_metrics="tcp,disk" still does not fix the problem. I've confirmed @ashkop's observation: if cAdvisor is started after another container, then removing said container fails.
To get around this issue I have tried running cadvisor standalone. However, it does not get data on RHEL; cadvisor complains "unable to get fs usage from thin pool for device". It seems it can't get the right information about the storage driver. I am using RHEL 7.1, cadvisor version 0.23.3 (6607e7c), docker 1.9.1.
Has anybody tried something similar?
This issue is hitting us often and affecting production container deployments (Debian 8.5 hosts, Docker 1.11.1).
Can anyone spell out what we lose by omitting the /:/rootfs:ro mount? Is it just disk usage metrics?
AFAIK, it should be just the disk usage metrics
Hi all, I have a problem using cadvisor on CentOS 7. When cadvisor is running, docker fails to remove other containers, saying that the container's filesystem is busy. After cadvisor is stopped, container removal works again.
I demonstrated that in this gist: https://gist.github.com/cornelius-keller/0fd2d23b68ccd88c9328
I also included os version and docker info in the gist.