nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0
2.78k stars 152 forks source link

/var/lib/sysbox is not cleaned up when Docker uses the older overlay storage driver #462

Closed href closed 2 years ago

href commented 2 years ago

I'm working on providing GitLab runners with sysbox-runc. On my runners I noticed that /var/lib/sysbox only ever grows.

From a cursory look at the source code it looks to me like this is only ever cleaned up if the container is stopped gracefully. If the container is killed, as is the case with GitLab runners until https://gitlab.com/gitlab-org/gitlab-runner/-/issues/6359 is implemented, the files remain.

For now I'm working around this issue with the following cron-job, and I do think this should be fixed in the GitLab runners, but maybe there's an approach that doesn't rely on proper container shutdown:


ls -d -1 /var/lib/sysbox/docker/* 2> /dev/null
| grep -wvf <(docker ps -a --no-trunc | tail -n +2 | cut -d ' ' -f 1)
| xargs --no-run-if-empty rm -rf
ctalledo commented 2 years ago

Hi @href,

Thanks for filing the issue.

From a cursory look at the source code it looks to me like this is only ever cleaned up if the container is stopped gracefully.

Not necessarily; Sysbox also monitors the container's rootfs (e.g, /var/lib/docker/overlay2/<uuid>/merged) and if it detects it has been removed, then it will also cleanup the associated state in /var/lib/sysbox.

If the container is killed, as is the case with GitLab runners

How exactly is the container killed in this case? Through a docker command, with a kill -9 signal to the container's init process, or some other way?

Thanks.

href commented 2 years ago

According to GitLab, those are the two steps that are executed:

From the source it seems like that removes the volumes and runs kill -9. Now I just assume that this is the problem. Maybe I should add that I use the runners to build docker in docker.

All I know for sure is that /var/lib/sysbox is not clean, even when docker ps -a is. This happens reliably whenever I do a CI run.

ctalledo commented 2 years ago

I would think "removeContainer" would invoke "docker rm", which in turn would cause Docker to remove the container's rootfs under /var/lib/docker/overlay2/<uuid>/, which in turn would cause Sysbox to remove the associated contents under /var/lib/sysbox. I confirmed this works well manually, not sure why it fails with GitLab.

If possible, please reproduce the problem and attach the sysbox-mgr log (journalctl --no-pager -u sysbox-mgr).

Normally, that log should show the following messages for a container's lifecycle:

Also, please send me the output of sudo tree -L 2 /var/lib/sysbox in your host, so I can see what's being left behind under that directory and correlate that to the sysbox-mgr log.

href commented 2 years ago

I reproduced the problem again (it seems to happen every time, but my workload is not diverse yet, so maybe there are CI runs that would work - I just haven't encountered them):

Journal output:

Feb 02 14:29:00 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:00" level=info msg="registered new container f5b7d04ff717"
Feb 02 14:29:00 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:00" level=info msg="unregistered container f5b7d04ff717"
Feb 02 14:29:01 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:01" level=info msg="registered new container 327e34d1c083"
Feb 02 14:29:01 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:01" level=info msg="unregistered container 327e34d1c083"
Feb 02 14:29:01 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:01" level=info msg="registered new container bb34e6057e6f"
Feb 02 14:29:02 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:02" level=info msg="unregistered container bb34e6057e6f"
Feb 02 14:29:02 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:02" level=info msg="registered new container 47029fa61d29"
Feb 02 14:29:29 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:29" level=info msg="unregistered container 47029fa61d29"
Feb 02 14:29:30 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:30" level=info msg="registered new container b4caa3282fab"
Feb 02 14:29:31 lab-shared-ci-runner1-rma1 sysbox-mgr[1869006]: time="2022-02-02 14:29:31" level=info msg="unregistered container b4caa3282fab"

Tree output:

/var/lib/sysbox
├── containerd
│  ├── 27ebda6d893322a5a169e8a2341d25afdecbb5a95f56b9c44ec2e822a7592527
│  ├── 327e34d1c0830d1ad3067d3e7a638d92696752abcd6e9d9a30500330ab1bf55b
│  ├── 47029fa61d29deccfab1b2d526a6621e0b20ea9379a9426edce127d2bca6c3eb
│  ├── 5ea6d63cde2e71486e41e240a1dec59d84f0210561f6381dfcdf859ee762bb57
│  ├── 622c489e87a0177f41a399ebc1c91c00037f978b5d285d015dccb49451cb4e00
│  ├── a9d9baccdc119498efcd408c11a456912e4758a68446aeb378c58f3e26bfbb79
│  ├── b4caa3282fabb7ad0998a29df970d1ee037bf166580ff5f56a8a3913cfb1a199
│  ├── b82c8e5039454a966f432c3933f105946933d6f23fc208519d374a959c2b15c3
│  ├── bb34e6057e6f06cd2354e6bed7bda1130eb3b47730ce5521493198208d542bf5
│  ├── cb27260d42e9c6b9b32c7f8efc45f0bfd96b4c3a2f164b0268d982b67b67e71b
│  ├── d25990aa417c180bd5c24703ea20b90da2e861df9f7eaeb40d114aba5f3c778e
│  ├── e1640418af6983b2323299578a09f97b3aeb18a445f1d85c6bdfeef971acd42f
│  ├── f1e5b5cdfc3ed8f74ab80e5323a56fa15ffbe63b45e25b94609f235d6a11d4f4
│  └── f5b7d04ff717959652c144b196364c01da09d89d32265d5d0b7a005931eff048
├── docker
│  ├── 327e34d1c0830d1ad3067d3e7a638d92696752abcd6e9d9a30500330ab1bf55b
│  ├── 47029fa61d29deccfab1b2d526a6621e0b20ea9379a9426edce127d2bca6c3eb
│  ├── b4caa3282fabb7ad0998a29df970d1ee037bf166580ff5f56a8a3913cfb1a199
│  ├── bb34e6057e6f06cd2354e6bed7bda1130eb3b47730ce5521493198208d542bf5
│  └── f5b7d04ff717959652c144b196364c01da09d89d32265d5d0b7a005931eff048
├── kubelet
│  ├── 27ebda6d893322a5a169e8a2341d25afdecbb5a95f56b9c44ec2e822a7592527
│  ├── 327e34d1c0830d1ad3067d3e7a638d92696752abcd6e9d9a30500330ab1bf55b
│  ├── 47029fa61d29deccfab1b2d526a6621e0b20ea9379a9426edce127d2bca6c3eb
│  ├── 5ea6d63cde2e71486e41e240a1dec59d84f0210561f6381dfcdf859ee762bb57
│  ├── 622c489e87a0177f41a399ebc1c91c00037f978b5d285d015dccb49451cb4e00
│  ├── a9d9baccdc119498efcd408c11a456912e4758a68446aeb378c58f3e26bfbb79
│  ├── b4caa3282fabb7ad0998a29df970d1ee037bf166580ff5f56a8a3913cfb1a199
│  ├── b82c8e5039454a966f432c3933f105946933d6f23fc208519d374a959c2b15c3
│  ├── bb34e6057e6f06cd2354e6bed7bda1130eb3b47730ce5521493198208d542bf5
│  ├── cb27260d42e9c6b9b32c7f8efc45f0bfd96b4c3a2f164b0268d982b67b67e71b
│  ├── d25990aa417c180bd5c24703ea20b90da2e861df9f7eaeb40d114aba5f3c778e
│  ├── e1640418af6983b2323299578a09f97b3aeb18a445f1d85c6bdfeef971acd42f
│  ├── f1e5b5cdfc3ed8f74ab80e5323a56fa15ffbe63b45e25b94609f235d6a11d4f4
│  └── f5b7d04ff717959652c144b196364c01da09d89d32265d5d0b7a005931eff048
├── rancher-k3s
│  ├── 27ebda6d893322a5a169e8a2341d25afdecbb5a95f56b9c44ec2e822a7592527
│  ├── 327e34d1c0830d1ad3067d3e7a638d92696752abcd6e9d9a30500330ab1bf55b
│  ├── 47029fa61d29deccfab1b2d526a6621e0b20ea9379a9426edce127d2bca6c3eb
│  ├── 5ea6d63cde2e71486e41e240a1dec59d84f0210561f6381dfcdf859ee762bb57
│  ├── 622c489e87a0177f41a399ebc1c91c00037f978b5d285d015dccb49451cb4e00
│  ├── a9d9baccdc119498efcd408c11a456912e4758a68446aeb378c58f3e26bfbb79
│  ├── b4caa3282fabb7ad0998a29df970d1ee037bf166580ff5f56a8a3913cfb1a199
│  ├── b82c8e5039454a966f432c3933f105946933d6f23fc208519d374a959c2b15c3
│  ├── bb34e6057e6f06cd2354e6bed7bda1130eb3b47730ce5521493198208d542bf5
│  ├── cb27260d42e9c6b9b32c7f8efc45f0bfd96b4c3a2f164b0268d982b67b67e71b
│  ├── d25990aa417c180bd5c24703ea20b90da2e861df9f7eaeb40d114aba5f3c778e
│  ├── e1640418af6983b2323299578a09f97b3aeb18a445f1d85c6bdfeef971acd42f
│  ├── f1e5b5cdfc3ed8f74ab80e5323a56fa15ffbe63b45e25b94609f235d6a11d4f4
│  └── f5b7d04ff717959652c144b196364c01da09d89d32265d5d0b7a005931eff048
└── shiftfs

Disk usage:

$ du -h -d 1 /var/lib/sysbox
60K /var/lib/sysbox/rancher-k3s
60K /var/lib/sysbox/kubelet
4.0K    /var/lib/sysbox/shiftfs
60K /var/lib/sysbox/containerd
445M    /var/lib/sysbox/docker
445M    /var/lib/sysbox
ctalledo commented 2 years ago

Hi @href, thanks for posting the info.

From the sysbox-mgr log snip you posted, it's clear Sysbox is not removing the state for the containers (i.e., I don't see logs such as released resources for container <UUID>).

Just to make sure:

1) I assume the rest of the sysbox-mgr log does not have such messages correct? 2) I assume that this was taken when all containers were stopped & removed by GitLab, correct?.

Can you also list the contents of ls -l /var/lib/docker/overlay2 ?

The sysbox-mgr monitors the contents of that directory to determine when a Docker container has been removed; in such an event, the sysbox-mgr removes the associated state under /var/lib/sysbox.

Thanks again.

ctalledo commented 2 years ago

Also, if you would like, feel free to join the Sysbox slack channel, as that is sometimes the best way to interact to debug problems like this one.

href commented 2 years ago

From the sysbox-mgr log snip you posted, it's clear Sysbox is not removing the state for the containers (i.e., I don't see logs such as released resources for container ).

I assume the rest of the sysbox-mgr log does not have such messages correct?

That's correct, I don't see released resources for container. I grepped the log of the last seven days for released and found nothing.

I assume that this was taken when all containers were stopped & removed by GitLab, correct?.

Correct.

Can you also list the contents of ls -l /var/lib/docker/overlay2 ?

root@host:~# ls -l /var/lib/docker/overlay2
total 4
drwx------ 2 root root 4096 Jan 14 13:00 l
root@host:~# find /var/lib/docker/overlay2
/var/lib/docker/overlay2
/var/lib/docker/overlay2/l

Because that l directory looked funny to me I removed it and re-ran the tests, but that didn't change the result.

What is interesting is that when I restart the sysbox service, the sysbox-mgr log thinks the docker images are still running, even though docker ps -a is empty:

... level=warning msg="The following containers are active and will stop operating properly:"
... level=warning msg="container id: 3685f5fadf64"
... level=warning msg="container id: 58169b4e5416"
... level=warning msg="container id: e5f5adb9ec42"
... level=warning msg="container id: f2723af91357"
... level=warning msg="container id: 46af7b38584a"
... level=warning msg="container id: 9e8cac285e96"

And I don't know if this matters, but we run /var/lib/docker on its own LVM partition. Since you mention that you watch for changes in the overlay2 directory. I guess depending on where you attach your watch process it might happen outside the filesystem boundary? Just a wild guess.

Also, if you would like, feel free to join the Sysbox slack channel, as that is sometimes the best way to interact to debug problems like this one.

If you don't mind the issue format suits me well as this is a bit of an on-off kind of project for me so I don't necessarily have any dedicated time I could chat. But I joined and I'll be hanging out if you have any questions - just can't guarantee I'll be responsive.

ctalledo commented 2 years ago

Hi @href,

That's correct, I don't see released resources for container. I grepped the log of the last seven days for released and found nothing.

Got it, that would explain why /var/lib/sysbox is not clean after removing all containers. We now how to figure out why Sysbox is not detecting the Docker container removal.

And I don't know if this matters, but we run /var/lib/docker on its own LVM partition. Since you mention that you watch for changes in the overlay2 directory. I guess depending on where you attach your watch process it might happen outside the filesystem boundary? Just a wild guess.

Sysbox watches for changes in each container's root-filesystem (which for Docker is under directory /var/lib/docker/overlay2/<uuid>/merged). It uses the fsnotify golang lib, which in turn relies on Linux's inotify mechanism.

I re-read the whole inotify description, and don't see any mention of it not working across filesystems, so in theory the fact that you have /var/lib/docker in a dedicated LVM partition should not matter. But I still suspect the problem is related to this. You can help me confirm by temporarily placing the Docker rootfs on the same filesystem where the sysbox-mgr process runs and see if the problem reproduces.

I also see a minor bug in sysbox-mgr: when it adds the inotify check on the container's rootfs, it does not check for errors (see here). Thus, I the kernel fails to add the inotify watch, it will be silently ignored.

href commented 2 years ago

I finally got it. It wasn't a problem with LVM, but with the docker storage driver. I had overlay configured (that was the default with the configuraiton management tool we employ). I switched to overlay2, and the release resources messages finally start popping up, and /var/lib/sysbox is cleaned up properly.

So I guess this issue could be renamed into something like "/var/lib/sysbox is not cleaned up with the overlay storage driver". Because that seems to be the problem. For me, this issue is solved, because there's no reason for me not to use overlay2.

Thank you for your help - glad to have found the root cause!

ctalledo commented 2 years ago

I had overlay configured (that was the default with the configuraiton management tool we employ). I switched to overlay2, and the release resources messages finally start popping up, and /var/lib/sysbox is cleaned up properly.

Ah ... interesting. Sysbox may have a bug then, probably unnoticed since most users use the overlay2 storage driver with Docker.

So I guess this issue could be renamed into something like "/var/lib/sysbox is not cleaned up with the overlay storage driver". Because that seems to be the problem

Makes sense, I'll rename it.

Thank you for your help - glad to have found the root cause!

Thank you @href for reporting it and helping us debug it.

ctalledo commented 2 years ago

Issue was root-caused, closing.