Open lbernail opened 3 years ago
@AkihiroSuda I guess this is the fuse-overlay? Seems it doesn't invalidate the fuse cache or writeback is not implemented correctly.
edit: or is it the opaque handling? I initially looked that files where not there after clone but the issue seems to be that layer removes files and adds some new ones but in the next layer both appear.
@giuseppe Is this something known/tracked for fuse-overlayfs?
It could be fuse-overlayfs.
What version of fuse-overlayfs is it?
What is the underlying file system? Is it running on top of an another overlay mount?
What version of fuse-overlayfs is it?
docker run -it --rm --entrypoint apk moby/buildkit:v0.9.0-rootless info -v | grep fuse-overlayfs
fuse-overlayfs-1.6-r0
I am not sure I've followed the correct steps, but I've tried a simple docker build --no-cache .
as rootless on Ubuntu 20.04.2 with kernel 5.4.0-73-generic with the Dockerfile above, and this is what I get:
...
Step 7/8 : RUN git checkout $TAG && ls -l vendor/k8s.io/utils/io/ && find | wc -l
---> Running in d62a84ba8eb9
Note: switching to '93915e330be4291f209b13dc6df063729bd4d9c4'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 93915e33 Merge pull request #135 from DataDog/JulienBalestra/readiness
total 4
-rw-r--r-- 1 root root 1748 Oct 5 08:52 consistentread.go
5573
Removing intermediate container d62a84ba8eb9
---> c1831ee2992d
Step 8/8 : RUN echo "Checking content" && ls -l vendor/k8s.io/utils/io/ && find | wc -l
---> Running in 713d365767c0
Checking content
total 12
-rw-r--r-- 1 root root 134 Oct 5 08:52 README.md
-rw-r--r-- 1 root root 1748 Oct 5 08:52 consistentread.go
-rw-r--r-- 1 root root 2982 Oct 5 08:52 read.go
5605
Removing intermediate container 713d365767c0
---> 175956894c15
Successfully built 175956894c15
fuse-overlays is not even installed on the system.
@giuseppe we run the rootless buildkitd in a kubernetes pod with no specific mounts so the underlying fs for builds is the root fs of the container (overlay) We will try using an ephemeral volume to see if using ext4 as the underlying fs solves the issue
@AkihiroSuda moving to the native snapshotter solved all our problems with invalid layers but we took a huge hit in terms of performance (io throughput is much higher). I assume that's expected and that it is the only impact we should see from using this snapshotter?
We checked on our side and we are not using fuse-overlay (we don't load the fuse module and we don't run the k8s-hostdev-plugin plugin: https://github.com/moby/buildkit/pull/1384) So it seems the problem is really overlay in rootless mode (which confirms what you saw @giuseppe ) We will try with a more recent kernel next week (so far we have tested on 5.8 and we will test on 5.11) and we will keep you updated
I'm not 100% sure but I think I may have found the (or a) problem.
I simulated what happens directly on a linux host:
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..
At this point, lower
has:
ls -l lower/vendor/k8s.io/utils/io
total 8
-rw-rw-r-- 1 ddeng ddeng 2982 Oct 10 15:08 read.go
-rw-rw-r-- 1 ddeng ddeng 134 Oct 10 15:08 README.md
And here is what we get in merged
:
ls -l merged/vendor/k8s.io/utils/io
total 4
-rw-rw-r-- 1 ddeng ddeng 1748 Oct 10 15:09 consistentread.go
Which is the expected content.
What about upper
?
ls -l upper/vendor/k8s.io/utils/io
total 4
-rw-rw-r-- 1 ddeng ddeng 1748 Oct 10 15:09 consistentread.go
No whiteouts, let's look at the directory:
sudo getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
# file: upper/vendor/k8s.io/utils/io
trusted.overlay.opaque="y"
=> Directory is marked as opaque and the lower directory is ignored
Let's do the same in a user namespace, starting from scratch:
unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..
At this point everything is the same, but
getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
returns nothing in the userns. The directory has the right xattr in the host user namespace (trusted.overlay.opaque="y"
), but we don't have the info in the userns so we can't add it when we snapshot.
So, I think the problem is with the extended attribute: given it's part of the Trusted
namespace we can't see it in a user namespace.
I suspect this only impacts opaque
directories and we can likely create a much simpler reproducer.
There is a solution in 5.11: https://github.com/torvalds/linux/commit/2d2f2d7322ff43e0fe92bf8cccdc0b09449bf2e1
And it looks promising:
unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -o userxattr -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..
Same as before, with the additional mount option from 5.11: -o userxattr
And as a result:
getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
# file: upper/vendor/k8s.io/utils/io
user.overlay.opaque="y"
Now I wonder if this will work with the overlay snapshotter because the new mount option is explicit and only work on 5.11+ kernels. We'll update our buildkit platform to 5.11 and will let you know what we find.
I'm not sure what happens between 2 RUN commands in buildkit. Are we archiving the diff into a tarball before processing to next one? (this seems to rely on github.com/docker/docker/pkg/archive )
Slightly related: https://github.com/moby/buildkit/pull/2181 computes diff from the overlayfs upper dir (which requires trusted.overlay.opaque
or user.overlay.opaque
). Before this PR, diff relied on walking differ (which is slow but should always be right I think). Once this PR goes into a release it may break some setups (kernel < 5.11). It does not seem to be our problem here because 0.9.0 still uses the walking differ (hence my first question on archive)
Are we archiving the diff into a tarball before processing to next one?
No. Just the previous upper becomes an additional lower.
@AkihiroSuda If the issue is that overlay in rootless mode doesn't work correctly for <5.11 kernels (at least for ubuntu) we should try to detect it and run fuse for these as well. cc @ktock for the overlay differ part
No. Just the previous upper becomes an additional lower.
This means we do not go through the Differ or Archiver ? Looks like I have another code path to explore. So we are unmounting and remounting using upper as an additional lower directly? (I wonder because I feel this should work, unless we do a copy, which may lead to loss of the xattr). I can easily check this on a 5.8 host
Differ only runs at the end of the build if an image result was needed.
OK I confirm this is the problem on 5.8:
unshare -m -p -f -U -r bash
mkdir lower merged upper work
git clone https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner.git lower
mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged
cd merged/ ; git checkout 93915e330be4291f209b13dc6df063729bd4d9c4 ; cd ..
At this point, merged has the right content and xattr does not show in userns:
ls merged/vendor/k8s.io/utils/io
consistentread.go
getfattr -R -d -m "" upper/vendor/k8s.io/utils/io
=> nothing
If we proceed with a new mount including upper as an additional layer:
umount merged
mkdir upper2
mount -t overlay overlay -olowerdir=upper:lower,upperdir=upper2,workdir=work merged
the content becomes wrong, showing the xattr is ignored for the second mount:
ls merged/vendor/k8s.io/utils/io
consistentread.go read.go README.md
On 5.11, this works completely fine (as long as -o userxattr
is added to mount options)
I confirm that the problem disappears with Ubuntu 20.04.3 and kernel 5.11 , which is great news 🎉 I agree with @tonistiigi I think we should not use overlayfs in kernel <5.11 even on Ubuntu: it will mostly work but will have weird edge cases as the one we encountered
I just noticed we have a lot of
[22995.791577] overlayfs: lowerdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
in our kernel logs. I assume we don't unmount overlayfs between the different mount operations used for RUN statements? It's likely just a warning and this does not impact our builds
Hello,
We are starting to run buildkit in rootless mode and in some cases, we seem to be having issues with the snapshotter. For instance, with this Dockerfile
We get this with buildkit in rootless mode:
Of course we'd expect content of the layer at step
#8
to be the same as the content at the end of step#7
We tested building in non-rootless mode and the problem disappeared. After looking into past issues, we found https://github.com/moby/buildkit/issues/1792 which felt a bit similar, so we tried in rootless mode with
--oci-worker-snapshotter=native
and the problem also disappeared.So the problem only happens in rootless mode with overlayfs snapshotter. Here is the configuration where we see this behavior:
--oci-worker-no-process-sandbox
Is there a way to diagnose what is failing when we use the overlayfs snapshotter? I could not find the impact of using the native snapshotter instead of the overlayfs one. I assume slower build?
Of course, we are more than happy to perform additional tests to gather more data