felipecrs closed this issue 3 years ago
Hi @felipecrs, thanks for reporting the issue.
What version of Sysbox are you using?
I ask because this looks similar to issue #254 (search for unlinkat in that issue), which is supposed to be fixed.
> Without the PVC the error does not happen
That's strange; I noticed you are using `storageClassName: rbd` for the PVC; I wonder if that makes a difference. Have you tried with `local` or `hostPath`?
Thanks!
I installed Sysbox today through the daemonset, so it's the newest version:

```
$ sysbox-runc --version
sysbox-runc
        edition:         Community Edition (CE)
        version:         0.4.1
        commit:          d540126188a1e8595c8f769aeb91833002c37b3a
        built at:        Fri Oct 1 19:33:49 UTC 2021
        built by:        Rodny Molina
        oci-specs:       1.0.2-dev
```
I will try to change the PVC and check.
Well, I believe that using a local volume works, since inside the Dockerfile there is a `VOLUME` declaration for the docker root path.
You are using the latest Sysbox, so the root cause must be something different from issue #254 (though it has the same symptom).
> I will try to change the PVC and check.
Yes, please check and confirm if this works or not. If it does work, it gives us a strong clue that the problem is related to the `storageClassName: rbd`.
Oops, the same issue happens with `emptyDir`, so I guess we can remove the PVC from the loop. This is the command:
```shell
$ kubectl run dind --rm -i --image ghcr.io/felipecrs/jenkins-agent-dind:latest --pod-running-timeout=3m --overrides='
{
  "metadata": {
    "annotations": {
      "io.kubernetes.cri-o.userns-mode": "auto:size=65536"
    }
  },
  "spec": {
    "containers": [
      {
        "image": "ghcr.io/felipecrs/jenkins-agent-dind:latest",
        "name": "dind",
        "imagePullPolicy": "Always",
        "tty": true,
        "volumeMounts": [
          {
            "mountPath": "/home/jenkins/agent",
            "name": "workspace-volume",
            "readOnly": false
          }
        ],
        "command": ["/entrypoint.sh", "bash"]
      }
    ],
    "runtimeClassName": "sysbox-runc",
    "volumes": [
      {
        "name": "workspace-volume",
        "emptyDir": {}
      }
    ]
  }
}
'
```
I will rephrase the Issue.
@ctalledo I updated the issue description; it now also has more logs, such as the docker version and info from within the sysbox pod. Can you please take a look again?
Thanks Felipe, will try to repro and debug a bit later today.
On my image, `/home/jenkins/agent/docker` is the docker root dir rather than `/var/lib/docker`. Wondering if it has something to do with the issue.
But when testing with `docker:dind` I mounted the volume under `/var/lib/docker`, of course.
> On my image, `/home/jenkins/agent/docker` is the docker root dir rather than `/var/lib/docker`. Wondering if it has something to do with the issue.
No, that should not matter. I was thinking it's probably related to the type of storage of the PVC, though the fact that it fails with something as simple as emptyDir does not gel with that theory.
@ctalledo I tested further here. It turns out that it actually matters. Using `/var/lib/docker` with my image works:
Notice that I delete `/etc/docker/daemon.json` so that the data root dir is restored to the default before starting the daemon.
```shell
kubectl run dind --rm -i --image ghcr.io/felipecrs/jenkins-agent-dind:latest --pod-running-timeout=3m --overrides='
{
  "metadata": {
    "annotations": {
      "io.kubernetes.cri-o.userns-mode": "auto:size=65536"
    }
  },
  "spec": {
    "containers": [
      {
        "image": "ghcr.io/felipecrs/jenkins-agent-dind:latest",
        "name": "dind",
        "imagePullPolicy": "Always",
        "tty": true,
        "volumeMounts": [
          {
            "mountPath": "/var/lib/docker",
            "name": "workspace-volume",
            "readOnly": false
          }
        ],
        "command": ["bash", "-xec", "sudo rm -f /etc/docker/daemon.json; exec /entrypoint.sh bash -xec \"df /var/lib/docker; docker version; docker info; docker pull gradle\""]
      }
    ],
    "runtimeClassName": "sysbox-runc",
    "volumes": [
      {
        "name": "workspace-volume",
        "emptyDir": {}
      }
    ]
  }
}
'
```
If you don't see a command prompt, try pressing enter.
INFO[2021-10-06T17:12:48.299069538Z] Loading containers: done.
WARN[2021-10-06T17:12:48.358322334Z] Not using native diff for overlay2, this may cause degraded performance for building images: running in a user namespace storage-driver=overlay2
INFO[2021-10-06T17:12:48.358539871Z] Docker daemon commit=79ea9d3 graphdriver(s)=overlay2 version=20.10.9
INFO[2021-10-06T17:12:48.358668340Z] Daemon has completed initialization
INFO[2021-10-06T17:12:48.534876604Z] API listen on /var/run/docker.sock
Client: Docker Engine - Community
Version: 20.10.9
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc
Built: Mon Oct 4 16:08:29 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.9
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3
Built: Mon Oct 4 16:06:37 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.11
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
[services.d] done.
+ df /var/lib/docker
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 130550852 14376736 109499480 12% /var/lib/docker
+ docker version
Client: Docker Engine - Community
Version: 20.10.9
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc
Built: Mon Oct 4 16:08:29 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.9
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3
Built: Mon Oct 4 16:06:37 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.11
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
+ docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.6.3)
compose: Docker Compose (Docker Inc., v2.0.1)
scan: Docker Scan (Docker Inc., v0.8.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.9
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: false
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc version: v1.0.2-0-g52b36a2
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 5.4.0-70-generic
Operating System: Ubuntu 20.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.34GiB
Name: dind
ID: RQFG:7SGQ:QEKF:HFJA:LIU7:RT63:TZSW:GUBS:BAZS:UG6V:KAUD:BQ6W
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
+ docker pull gradle
Using default tag: latest
latest: Pulling from gradle
f3ef4ff62e0d: Pull complete
706b9b9c1c44: Pull complete
0fffb0c672b9: Pull complete
5a54c3905797: Pull complete
830009aaff35: Pull complete
a28d173c1d5d: Pull complete
Digest: sha256:d7a3b2f32d4a78efe3f9c275d97b3c8f728f34a145511c9815b29645cf0eb854
Status: Downloaded newer image for gradle:latest
gradle:latest
[cmd] setpriv exited 0
INFO[2021-10-06T17:13:08.127827498Z] Processing signal 'terminated'
INFO[2021-10-06T17:13:08.130454646Z] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2021-10-06T17:13:08.130854411Z] Daemon shutdown complete
INFO[2021-10-06T17:13:08.130896632Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2021-10-06T17:13:08.130903359Z] stopping healthcheck following graceful shutdown module=libcontainerd
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
WARN[2021-10-06T17:13:09.131438672Z] grpc: addrConn.createTransport failed to connect to {unix:///var/run/docker/containerd/containerd.sock <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial unix:///var/run/docker/containerd/containerd.sock: timeout". Reconnecting... module=grpc
s6-svscanctl: fatal: unable to control /var/run/s6/services: supervisor not listening
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.
Session ended, resume using 'kubectl attach dind -c dind -i -t' command when the pod is running
pod "dind" deleted
Hm... when I mount `/home/jenkins/agent`:

```
$ df /home/jenkins/agent/docker
Filesystem                                                   1K-blocks     Used Available Use% Mounted on
/var/lib/sysbox/shiftfs/b3a26b5d-2d6c-40e7-a356-5e29903e7125 130550852 14374468 109501748  12% /home/jenkins/agent/docker
```

But when I mount `/var/lib/docker`:

```
$ df /var/lib/docker
Filesystem                        1K-blocks     Used Available Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 130550852 14376736 109499480  12% /var/lib/docker
```
> It turns out that it actually matters. Using `/var/lib/docker` with my image works:
I see ... I understand the problem now: you are mounting an emptyDir (i.e., a host dir managed by K8s) into the container's `/home/jenkins/agent/docker`. Since the container's mountpoint is `/home/jenkins/agent/docker`, Sysbox does not know that it's a mount over the inner Docker's root directory (Sysbox assumes the inner Docker's data root is at `/var/lib/docker`).
As a result, Sysbox treats the mount normally and mounts shiftfs on the emptyDir (so that it shows up with proper permissions inside the container). The problem is that now the Docker data root inside the container sits on top of shiftfs, and this does not work: the inner Docker will try to mount overlayfs on top of shiftfs, which is not supported.
You've hit a limitation of Sysbox, documented here: the inner Docker data root must be at `/var/lib/docker`. In this case, Sysbox knows that any host volumes mounted on that directory must not have shiftfs on them, and takes the required precautions.
Sorry about this, glad we root caused it.
This limitation should go away soon (months), as we transition Sysbox to leverage the new Linux ID-Mapped mount feature which voids the need for shiftfs.
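If you want to verify this layering from inside a container, a quick sketch along these lines can help (`check_backing_fs` is a hypothetical helper, not part of Sysbox; the path to inspect depends on your setup):

```shell
# Print the filesystem type backing a given path. If it reports "shiftfs",
# the inner Docker's overlayfs storage driver cannot be stacked on top of it.
check_backing_fs() {
  df --output=fstype "$1" | tail -n 1
}

# Inside the container you would check the inner Docker's data root, e.g.
#   check_backing_fs /var/lib/docker
# Here we just inspect / as a stand-in.
check_backing_fs /
```

This matches the `df` outputs shown earlier in the thread, where the workspace mount reported a `/var/lib/sysbox/shiftfs/...` source.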
That's interesting. Thank you very much for the explanation.
Currently, I use a PVC to give my builds a given amount of disk space through a network storage provider (because my nodes are not capable of providing that much storage by themselves).
For that, I use the Jenkins Kubernetes Plugin to automatically create a new PVC for each build and assign it to the build's pod. But it's limited to the Jenkins agent workspace (`/home/jenkins/agent`), and that's the reason why I map the Docker root dir to a folder within the workspace, `/home/jenkins/agent/docker`.
I understand that, as you said, it's probably going to be fixed in the future. But for now, is there any workaround that you can think of?
Perhaps a symlink from `/home/jenkins/agent/docker` to `/var/lib/docker`, or a bind mount that I can set up in my container entrypoint. I will try, but if you have any ideas, feel free to suggest.
> But for now, is there any workaround that you can think of?
> Perhaps a symlink from `/home/jenkins/agent/docker` to `/var/lib/docker`, or a bind mount that I can set during my container entrypoint. I will try, but if you have any idea feel free to suggest.
I see ... that's a tough one.
A symlink from `/home/jenkins/agent/docker` -> `/var/lib/docker` would only partially help, because it would mean that the inner Docker will store its inner images in the container's `/var/lib/docker`, which is backed by a host dir managed by Sysbox (not the PVC). Since you indicated the host does not have much storage, that may not fix the problem per se.
Let me think if there is another solution.
Well, I actually meant a symlink from `/var/lib/docker` to `/home/jenkins/agent/docker`, or a similar bind mount.
Something I thought of, though it would require changing Sysbox's internals, would be a new annotation like:

```json
"metadata": {
  "annotations": {
    "sysbox.docker-root-dir": "/home/jenkins/agent/docker"
  }
}
```

But bear in mind that I don't know much about it, so I don't even know whether `sysbox-runc` is able to read an annotation.
And of course, only if the use-case is worth it.
Well, my attempts with symlinks and bind mounts didn't work. Which, I guess, is expected.
> Well, I actually meant a symlink from `/var/lib/docker` to `/home/jenkins/agent/docker`, or a similar bind mount.
I don't think that will work, because that would mean the inner Docker would store its data in `/home/jenkins/agent/docker`, which has the shiftfs mount on it.
If you do the symlink the other way around (from `/home/jenkins/agent/docker` -> `/var/lib/docker`), then the inner Docker would work fine, because `/var/lib/docker` is a container mount backed by a host dir managed by Sysbox without shiftfs. This assumes, however, that your host has a decent amount of storage.
Thinking more about this, I am not sure the idea of having the inner Docker store its images on a mount (`/home/jenkins/agent/docker`) that is backed by a network-based PVC is a good idea. I say so because Docker uses that storage heavily, so if it's network-backed it may not perform well. I've not tried it, so I may be wrong.
> Something that I thought, but it would require to change sysbox's internals, would be a new annotation like:
> `"metadata": { "annotations": { "sysbox.docker-root-dir": "/home/jenkins/agent/docker" } }`
I was thinking about something similar (but using env variables passed to the container rather than annotations), but in general we want to avoid this approach. Rather, we want Sysbox to be smart enough to set up the container environment the proper way. I would prefer to implement the ID-mapped-mount approach over this one, though it would require the host to have a Linux kernel >= 5.12.
@felipecrs, question: in the `ghcr.io/felipecrs/jenkins-agent-dind:latest` image, how did you configure the inner Docker's data-root? Did you do so by pre-configuring the `/etc/docker/daemon.json` file inside that image?
> Thinking more about this, I am not sure the idea of having the inner Docker store its images on a mount (`/home/jenkins/agent/docker`) that is backed by a network-based PVC is a good idea. I say so because Docker uses that storage heavily, so if it's network-backed it may not perform well. I've not tried it so I may be wrong.
I understand your thinking 100%. But it turns out that it's a very fast network-based storage pool; it's even faster than the SSDs attached to the nodes themselves. And I have been running the builds with such an arrangement for more than a year already, although using the normal privileged flag (which is the reason I want to migrate to Sysbox).
My nodes, as I said, suffer from low disk resources; all have an attached 128GB SSD. I have builds which require 48Gi of free space just for themselves, and it's common to run several in parallel on the same node. Not to mention that the kubelet on the node keeps downloading images on and on, so to keep a decent amount of disk free I would have to customize the K8s garbage collector on each node to make sure dangling images get deleted in time to allocate a pod requiring 48Gi, for example. Sadly, if I let K8s handle it on its own and, instead of using the network-based PVC, set a `requests` of `ephemeral-storage: 48Gi`, the pod simply does not get scheduled to any node, as the normal GC threshold would never care about keeping 48Gi free.
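(In pod-override terms like the ones used earlier in this thread, the kind of request being described would look roughly like this; a sketch, not my actual config:)

```json
"resources": {
  "requests": {
    "ephemeral-storage": "48Gi"
  }
}
```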
> I was thinking about something similar (but using env variables passed to the container rather than annotations), but in general we want to avoid this approach. Rather we want Sysbox to be smart enough to setup the container environment the proper way. I would prefer to implement the ID-mapped mount approach to this one, though it would require the host have a Linux kernel >= 5.12.
I totally understand. Better to do the definitive fix than spending time on the workaround.
But unfortunately, that won't work for me... my nodes all run Ubuntu 18.04, with Linux 5.4. Kernel 5.12 hasn't made its way even to 20.04, so I doubt it will reach 18.04 anytime soon. The cluster I use is rented, and I have no option to ask for Ubuntu 20.04 either (and as I said, Linux 5.12 is not available for it yet anyway, HWE included).
That said, the definitive fix would probably not be effective for anyone else using Ubuntu, as of now. I don't know about Debian, though.
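For reference, a quick sketch of how one might check whether a node's kernel meets that 5.12 requirement (`kernel_at_least` is a hypothetical helper, not part of Sysbox):

```shell
# Hypothetical helper: check whether a kernel release string meets the
# 5.12 minimum needed for ID-mapped mounts.
kernel_at_least() {
  have="${1%%-*}"   # strip the "-70-generic" suffix from e.g. "5.4.0-70-generic"
  want="$2"
  # have >= want iff want sorts first (or equal) under version ordering
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

if kernel_at_least "$(uname -r)" 5.12; then
  echo "ID-mapped mounts should be available"
else
  echo "kernel too old for ID-mapped mounts"
fi
```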
> @felipecrs, question: in the `ghcr.io/felipecrs/jenkins-agent-dind:latest` image, how did you configure the inner Docker's data-root? Did you do so by pre-configuring the `/etc/docker/daemon.json` file inside that image?
Yes, see:
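(For context, a data-root override in `/etc/docker/daemon.json` typically looks like the fragment below; this is a sketch, not necessarily the exact file from the image:)

```json
{
  "data-root": "/home/jenkins/agent/docker"
}
```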
Hi @felipecrs, thanks for the info.
We've found a simpler solution: when the container starts, Sysbox will look at the container's `/etc/docker/daemon.json` file to understand where the inner Docker's data root resides. It will then use this info to set up the mounts for the inner Docker correctly.
@rodnymolina will be working on this soon.
That's really cool, a nice approach, and it would work for me. I would just suggest documenting the fact that `dockerd --data-root /other/dir` (as a CLI option rather than the file) won't be supported.
> That's really cool, nice approach and it would work for me. I would just suggest to document the fact that `dockerd --data-root /other/dir` (as cli option rather than the file) won't be supported.
Yep, makes sense ... will do.
@felipecrs, the fix for this one just went in here. Please build Sysbox from sources when you have a chance and let us know how it goes.
Sure, I will do. I plan to:
Build from source, create my own Sysbox binary container, and adjust the daemonset to use my custom image to install on my workers. Does that sound like a good plan?
I just realized something: I think it would be a good idea to adjust the CI pipelines to publish a tag like `:master` of the Sysbox binary images, to make it easier for everyone to test new (unreleased) features. I can contribute a PR if you like this approach.
> Build from source, create my own sysbox binary container and adjust the daemonset to use my custom image to install in my workers. Does it sounds like a good plan?
Let me give you a docker-image that you can point our manifests to.
Hey @rodnymolina, I don't mind building it myself, really. I just don't want that you "waste" your time only because of me... You are already doing much!
Thanks Felipe, but it's not that simple in this case, as you don't have access to the code (k8s-installer) that generates these daemonset docker images. I originally thought that you already had your setup available and that you would simply replace the old Sysbox binaries with the new ones. But if that's not the case, then you need a new sysbox-k8s-deploy daemonset ...
Please point our manifests to this image: `ghcr.io/nestybox/sysbox-deploy-k8s:issue_406`.
Let me know if you have any questions.
> I just realized something: I think it would be a good idea to adjust the CI pipelines to publish a tag like `:master` of the Sysbox binary images so that it would make it easier for everyone to test new features (unreleased). I can contribute a PR if you like this approach.
Let us think about this for a bit. Will get back to you. Thanks!
Well, it didn't work at first, then I found that only the focal variant is updated in the image:
I went ahead and replaced the bionic binary with the focal one (and created a custom image for deploying in the daemonset).
But, after installing, everything seems to be working!
@felipecrs, glad to hear that it works!
And yes, I should have told you that I had only built the ubuntu-focal image, as that's what I found first in the description of this issue -- I hadn't read the more recent comment where you explicitly mentioned ubuntu-bionic. Sorry for that :-(
Don't worry! :D
The Ubuntu 20.04 one is my dind image, btw.
Yes, I see that now :-)
Will go ahead and close this issue. Re-open it if you see anything else related to this fix.
Again, thanks a lot, I couldn't be more grateful. Now I ran into https://github.com/nestybox/sysbox/issues/410.
Some fun pictures:
Before the fix:
After the fix:
I'm glad I wrote this pipeline for testing the capabilities of my Jenkins agents. It lets me find the issues I would hit before I actually change the environment.
Truly a "graphical" description! Let's hope we can make it all green in the new issue :-)
In my Jenkins setup, I use the Kubernetes Plugin to allow it to spawn a new Pod for each build. Now, I'm integrating it with Sysbox, but I found this issue.
Some notes:
- `docker:dind` or the `registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker` image
- Without the `emptyDir` volume the error does not happen
- `ubuntu` under `sysbox`; with `privileged` it works fine
- `/home/jenkins/agent/docker` is the docker root dir rather than `/var/lib/docker`
I know there must be something wrong with my image, and I will probably refactor it to use systemd as the Sysbox sample image does, but I'd like to report it here in case someone else faces the same issue.
To reproduce: