moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

COPY --link: fuse-overlayfs snapshotter uses more disk space than overlayfs #3718

Status: Open. amurzeau opened this issue 1 year ago.

amurzeau commented 1 year ago

Hi,

With COPY --link, the fuse-overlayfs snapshotter uses more disk space than the overlayfs snapshotter.

With fuse-overlayfs, the cache contains the following for the COPY --link:

With overlayfs, the cache contains:

When doing a COPY --link of a large folder, fuse-overlayfs more than doubles the cache disk usage for that COPY --link operation. Is there a way to avoid that?

Dockerfile:

# syntax=docker/dockerfile:1.4
FROM debian:stretch as install
# file is 1GB
COPY file /opt/file

FROM debian:stretch

COPY --link --from=install /opt /opt/

RUN touch additional_file

buildctl du --verbose output with the overlayfs snapshotter:

ID:             yh17vg8h6dffqahkzhpz2h49q
Parents:        fgdje1svtfnaey03bya7bwkwp
Created at:     2023-03-14 22:38:36.000084143 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    [install 2/2] COPY file /opt/file
Usage count:    1
Last used:      2023-03-14 22:39:13.846597779 +0000 UTC
Type:           regular

ID:             qbbon55g1a825cs7iif6vf1ka
Created at:     2023-03-14 22:38:29.183537132 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    local source for context
Usage count:    1
Last used:      2023-03-14 22:39:13.839566314 +0000 UTC
Type:           source.local

ID:             ih9vsij2zyqzg2mzgf1rwfb73
Created at:     2023-03-14 22:39:03.665783021 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    copy /opt /opt/
Usage count:    1
Last used:      2023-03-14 22:39:13.84237765 +0000 UTC
Type:           regular

ID:             fgdje1svtfnaey03bya7bwkwp
Created at:     2023-03-14 22:38:29.180439711 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           161.39MB
Description:    pulled from docker.io/library/debian:stretch@sha256:c5c5200ff1e9c73ffbf188b4a67eb1c91531b644856b4aefe86a58d2f0cb05be
Usage count:    1
Last used:      2023-03-14 22:39:13.833477814 +0000 UTC
Type:           regular

ID:             selly1bltsmy6pl9mwixb4phy
Created at:     2023-03-14 22:38:27.619310615 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           30.58MB
Description:    pulled from docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc
Usage count:    1
Last used:      2023-03-14 22:39:13.836613575 +0000 UTC
Type:           frontend

ID:             ye74olozrwr51kxyjzoigob0t
Created at:     2023-03-14 22:38:26.418070361 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           8.19kB
Description:    local source for dockerfile
Usage count:    1
Last used:      2023-03-14 22:39:13.84425123 +0000 UTC
Type:           source.local

ID:             y9mrzeuch0wi5mjf9bfg2k2i7
Parents:        o1whl4i5foqkne0lygomvms5k
Created at:     2023-03-14 22:39:05.668910308 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           8.19kB
Description:    mount / from exec /bin/sh -c touch additional_file
Usage count:    1
Last used:      2023-03-14 22:39:13.835582319 +0000 UTC
Type:           regular

ID:             o1whl4i5foqkne0lygomvms5k
Parents:        fgdje1svtfnaey03bya7bwkwp;ih9vsij2zyqzg2mzgf1rwfb73
Created at:     2023-03-14 22:39:05.310626729 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           8.19kB
Description:    [stage-1 2/3] LINK COPY --link --from=install /opt /opt/
Usage count:    1
Last used:      2023-03-14 22:39:13.837651372 +0000 UTC
Type:           regular

ID:             egoaowv188jltybkrv3zpkyr9
Created at:     2023-03-14 22:38:26.412776114 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           4.10kB
Description:    local source for context
Usage count:    1
Last used:      2023-03-14 22:39:13.841438724 +0000 UTC
Type:           source.local

Reclaimable:    3.34GB
Total:          3.34GB

buildctl du --verbose output with the fuse-overlayfs snapshotter:

ID:             uyykdcm5fmonzo7i72950u7cd
Parents:        cbo9dzvvhztjwseexhskbgc1q;ums60775hoyuoo9enie58evln
Created at:     2023-03-14 22:40:37.675896467 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           1.16GB
Description:    [stage-1 2/3] LINK COPY --link --from=install /opt /opt/
Usage count:    1
Last used:      2023-03-14 22:40:39.214469007 +0000 UTC
Type:           regular

ID:             ums60775hoyuoo9enie58evln
Created at:     2023-03-14 22:40:37.560702711 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    copy /opt /opt/
Usage count:    1
Last used:      2023-03-14 22:40:39.221471281 +0000 UTC
Type:           regular

ID:             24i04sk49oyir6bggqdf0ud3t
Parents:        cbo9dzvvhztjwseexhskbgc1q
Created at:     2023-03-14 22:40:31.791307595 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    [install 2/2] COPY file /opt/file
Usage count:    1
Last used:      2023-03-14 22:40:39.227557681 +0000 UTC
Type:           regular

ID:             rih828yfy98opgah8l70wcaah
Created at:     2023-03-14 22:40:26.541231127 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           1.05GB
Description:    local source for context
Usage count:    1
Last used:      2023-03-14 22:40:39.220455825 +0000 UTC
Type:           source.local

ID:             cbo9dzvvhztjwseexhskbgc1q
Created at:     2023-03-14 22:40:26.537269372 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           161.39MB
Description:    pulled from docker.io/library/debian:stretch@sha256:c5c5200ff1e9c73ffbf188b4a67eb1c91531b644856b4aefe86a58d2f0cb05be
Usage count:    1
Last used:      2023-03-14 22:40:39.218532163 +0000 UTC
Type:           regular

ID:             cf9iygae7qg8af4a0dr0ag39a
Created at:     2023-03-14 22:40:24.937214129 +0000 UTC
Mutable:        false
Reclaimable:    true
Shared:         false
Size:           30.58MB
Description:    pulled from docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc
Usage count:    1
Last used:      2023-03-14 22:40:39.217524437 +0000 UTC
Type:           frontend

ID:             kdbhtu82ktu6v7l3a50e23idb
Parents:        uyykdcm5fmonzo7i72950u7cd
Created at:     2023-03-14 22:40:39.085046868 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           8.19kB
Description:    mount / from exec /bin/sh -c touch additional_file
Usage count:    1
Last used:      2023-03-14 22:40:39.21649723 +0000 UTC
Type:           regular

ID:             058553i7t4nveu9hmj2tdilrx
Created at:     2023-03-14 22:40:23.668668854 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           8.19kB
Description:    local source for dockerfile
Usage count:    1
Last used:      2023-03-14 22:40:39.225520338 +0000 UTC
Type:           source.local

ID:             yf96azgt95vuzzlmlu51tdn22
Created at:     2023-03-14 22:40:23.665576433 +0000 UTC
Mutable:        true
Reclaimable:    true
Shared:         false
Size:           4.10kB
Description:    local source for context
Usage count:    1
Last used:      2023-03-14 22:40:39.223501885 +0000 UTC
Type:           source.local

Reclaimable:    4.50GB
Total:          4.50GB

du inside the buildkit snapshot folders:

$ find  ~/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/ -type f -links +1 -ls
  4788800      4 -rwxr-xr-x   2 doc      doc          2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153/fs/bin/gunzip
  4788800      4 -rwxr-xr-x   2 doc      doc          2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153/fs/bin/uncompress
  4789387   1980 -rwxr-xr-x   2 doc      doc       2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153/fs/usr/bin/perl5.24.1
  4789387   1980 -rwxr-xr-x   2 doc      doc       2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153/fs/usr/bin/perl
  4779560      4 -rwxr-xr-x   2 doc      doc          2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/150/fs/bin/gunzip
  4779560      4 -rwxr-xr-x   2 doc      doc          2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/150/fs/bin/uncompress
  4780184   1976 -rwxr-xr-x   2 doc      doc       2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/150/fs/usr/bin/perl5.24.1
  4780184   1976 -rwxr-xr-x   2 doc      doc       2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/150/fs/usr/bin/perl

$ du -sh ~/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/*
12K     /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/145
16K     /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/146
20M     /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/147
1001M   /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/149
111M    /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/150
1001M   /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/151
1001M   /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/152
1.1G    /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153
20K     /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/156

$ find  ~/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/ -type f -links +1 -ls
  5019305 1024012 -rw-r--r--   2 doc      doc      1048576000 Mar 13 22:21 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/42/fs/opt/file
  4800269       4 -rwxr-xr-x   2 doc      doc            2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/40/fs/bin/gunzip
  4800269       4 -rwxr-xr-x   2 doc      doc            2301 Apr 10  2022 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/40/fs/bin/uncompress
  4801486    1980 -rwxr-xr-x   2 doc      doc         2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/40/fs/usr/bin/perl5.24.1
  4801486    1980 -rwxr-xr-x   2 doc      doc         2021960 Jun 20  2020 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/40/fs/usr/bin/perl
  5019305 1024012 -rw-r--r--   2 doc      doc      1048576000 Mar 13 22:21 /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/43/fs/opt/file

$ du -sh ~/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/*
12K     /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/35
16K     /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/36
20M     /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/37
1001M   /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/39
111M    /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/40
du: cannot read directory '/home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/41/work/work': Permission denied
1001M   /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/41
1001M   /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/42
16K     /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/43
du: cannot read directory '/home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/45/work/work': Permission denied
20K     /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/45

Versions:

I'm interested in fuse-overlayfs primarily so I can use it on RHEL, which doesn't allow overlayfs in rootless mode.

tonistiigi commented 1 year ago

cc @sipsma

I think this is expected. When --link makes a merge base, it creates hardlinks between files that are the same in the source and merged layers. If you use a snapshotter that can't create hardlinks, it has to do a full copy instead.

amurzeau commented 1 year ago

In the fuse-overlayfs case, the 1.16GB layer contains these files:

$ ls /home/doc/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/153/fs
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

In the overlayfs case, the equivalent layer contains only the opt/file hardlink:

$ ls /home/doc/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/43/fs/
opt

So is it expected that, in the fuse-overlayfs case, the merged layer contains all the files from the previous layers?

tonistiigi commented 1 year ago

There is a special optimization for certain snapshotters in https://github.com/moby/buildkit/blob/master/snapshot/merge.go#L19 that can do hardlinks. For fuse-overlayfs, I'm not immediately sure whether there is a reason the hardlink optimization is impossible, or whether it simply isn't enabled (it might also need more fuse-specific work).

sipsma commented 1 year ago

In theory it may be relatively straightforward to support fuse-overlayfs too; the hardlinking optimizations rely on the ability to get the underlying lowerdirs of the mount so they can be used to make hardlinks from the underlying filesystem, e.g. here: https://github.com/moby/buildkit/blob/2816a8328fb13b81bc26b6fd70c67be1ca86e5c6/snapshot/diffapply_unix.go#L171-L178

Provided the fuse-overlayfs snapshotter makes it easy to obtain those too (i.e. it doesn't just give you a bind mount sourced from a fuse mount, or something similarly indirect), I think adding hardlink support for it would be fairly easy, if anyone wants to contribute it.
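As an illustration of what "getting the underlying lowerdirs of the mount" could look like, here is a minimal Go sketch that parses overlay-style mount options. It rests on assumptions: `Mount` is a local stand-in with the same Type/Source/Options fields as containerd's mount.Mount; the fuse-overlayfs snapshotter is assumed to report a mount type ending in "fuse-overlayfs" (for example "fuse3.fuse-overlayfs") with the same lowerdir=/upperdir= option syntax as kernel overlayfs; and `overlayDirs` is a hypothetical helper, not the code in diffapply_unix.go. Bind mounts are rejected, which corresponds to the indirect case described above.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a local stand-in with the same fields as containerd's mount.Mount.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// isOverlayLike reports whether the mount's options can be read as overlay options.
// ASSUMPTION: fuse-overlayfs mounts use a type ending in "fuse-overlayfs".
func isOverlayLike(m Mount) bool {
	return m.Type == "overlay" || strings.HasSuffix(m.Type, "fuse-overlayfs")
}

// overlayDirs is a hypothetical helper returning the upperdir (empty for a
// read-only mount) and the lowerdirs, topmost lower layer first, as listed in
// the lowerdir= option. It does not handle escaped ':' characters in paths.
func overlayDirs(m Mount) (upper string, lowers []string, ok bool) {
	if !isOverlayLike(m) {
		return "", nil, false // e.g. a bind mount: no direct access to the layers
	}
	for _, opt := range m.Options {
		switch {
		case strings.HasPrefix(opt, "upperdir="):
			upper = strings.TrimPrefix(opt, "upperdir=")
		case strings.HasPrefix(opt, "lowerdir="):
			lowers = strings.Split(strings.TrimPrefix(opt, "lowerdir="), ":")
		}
	}
	return upper, lowers, len(lowers) > 0
}

func main() {
	// Illustrative paths only.
	m := Mount{
		Type:   "fuse3.fuse-overlayfs",
		Source: "overlay",
		Options: []string{
			"upperdir=/snapshots/3/fs",
			"workdir=/snapshots/3/work",
			"lowerdir=/snapshots/2/fs:/snapshots/1/fs",
		},
	}
	upper, lowers, ok := overlayDirs(m)
	fmt.Println(ok, upper, lowers) // prints: true /snapshots/3/fs [/snapshots/2/fs /snapshots/1/fs]
}
```

With the lowerdirs in hand, files could be hardlinked directly between directories on the underlying filesystem instead of through the FUSE mount.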

Also, if this support were added, it would probably make it trivial to support fuse-overlayfs in the overlay-optimized differ too, which may bring other significant performance benefits.

amurzeau commented 1 year ago

I've tried a quick and dirty change:

And successfully run a build that used less disk space (the same disk usage as with overlayfs)!

I haven't checked everything yet; it seems too simple to be true! There are probably edge cases to check.

amurzeau commented 1 year ago

I get a "toomanyrequests" error from the registry when trying to run the integration tests:

httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/amd64/busybox/manifests/sha256:0d5a701f0ca53f38723108687add000e1922f812d4187dea7feaee85d2f5a6c5: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

Is there a way to cache downloaded images or to pass login credentials? I've tried putting my docker login credentials in ~/.docker/config.json inside the dev-env container, but the tests don't seem to read that file; the error still occurs.

I'm running them in debug mode from VS Code, in a devcontainer based on the dev-env target of ./Dockerfile.