sylabs / singularity

SingularityCE is the Community Edition of Singularity, an open source container platform designed to be simple, fast, and secure.
https://sylabs.io/docs/

`overlay seal` must handle fuse-overlayfs whiteouts sensibly #3176

Closed dtrudg closed 3 months ago

dtrudg commented 3 months ago

Version of Singularity

main

Describe the bug

When a writable overlay is used in OCI mode, the base container image and the overlay image are mounted using FUSE programs. This necessitates using fuse-overlayfs to perform the overlay where --writable is specified, instead of a kernel overlay mount.

Execution with --writable and fuse-overlayfs results in the user home directory appearing in the overlay upper/. The user home directory (at least on Fedora 40 with fuse-overlayfs v1.13-dev) ends up containing opaque-directory whiteout files, which appear in the AUFS format rather than the native overlay format:

$ singularity pull --oci docker://ubuntu
$ singularity overlay create ubuntu_latest.oci.sif
$ singularity run --oci --writable ubuntu_latest.oci.sif touch /foo

This results in an overlay that has the following content:

├── upper
│   ├── foo
│   └── home
│       └── dtrudg-sylabs
│           ├── .wh..opq
│           └── .wh..wh..opq
...

Note here that .wh..wh..opq is the correct form of an AUFS opaque marker. It's not clear why .wh..opq is appearing here at all (?!) Further, it is actually a 0:0 character device, which is the native overlay way of whiting out a file called .wh..opq.

crwx------. 1 root root 0, 0 Aug  1 16:43 .wh..opq
-rwx------. 1 root root    0 Aug  1 16:43 .wh..wh..opq
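
To make the distinction between the two entries concrete, here is a minimal sketch of how each could be classified from its name plus lstat data. The function name `classify` and the returned labels are made up for illustration and are not part of Singularity or fuse-overlayfs. It assumes the conventions described above: a 0:0 character device is a native overlay whiteout, `.wh..wh..opq` is the AUFS opaque marker, and any other `.wh.`-prefixed name is an AUFS whiteout.

```python
import os
import stat

AUFS_PREFIX = ".wh."
AUFS_OPAQUE = ".wh..wh..opq"


def classify(name: str, mode: int, rdev: int) -> str:
    """Classify one entry of an overlay upper dir (hypothetical helper).

    name is the file name; mode and rdev are st_mode and st_rdev
    as returned by os.lstat().
    """
    is_native_whiteout = (
        stat.S_ISCHR(mode) and os.major(rdev) == 0 and os.minor(rdev) == 0
    )
    if is_native_whiteout:
        # A native whiteout whose own name uses the AUFS prefix is the odd
        # case seen in this issue (.wh..opq as a 0:0 char device).
        if name.startswith(AUFS_PREFIX):
            return "native whiteout of an AUFS-style name"
        return "native whiteout"
    if name == AUFS_OPAQUE:
        return "aufs opaque marker"
    if name.startswith(AUFS_PREFIX):
        return "aufs whiteout"
    return "regular entry"
```

With the listing above, the first entry would fall into the "native whiteout of an AUFS-style name" case and the second into "aufs opaque marker".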

For ongoing use of the overlay with Singularity, there doesn't seem to be any issue, although we should verify whether deletion / opaque directories work properly across both --writable runs with fuse-overlayfs and runs without --writable that use kernel overlay.

When an overlay is sealed with singularity overlay seal, we convert the ext3 image to a squashfs. The squashfs definitely shouldn't include any AUFS format whiteouts. If we later run singularity push --layer-format=tar, then layer format conversion will attempt to convert native overlay format whiteouts into AUFS format whiteouts (which are appropriate for tar layers).

The resulting layer tars, which have been through whiteout conversion, cause an error when importing into docker / podman etc. as there is a duplicate file in the tar.
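
One plausible way the duplicate arises can be sketched as follows. This is an illustration under an assumption, not the actual Singularity conversion code: it assumes the conversion renames a native whiteout for NAME to an empty file called `.wh.` + NAME and leaves other entries untouched. The helper names `to_tar_names` and `duplicates` are hypothetical.

```python
from collections import Counter

AUFS_PREFIX = ".wh."


def to_tar_names(entries):
    """Map (name, is_native_whiteout) pairs to names in an AUFS-style tar layer.

    Assumption: a native whiteout (0:0 char device) for NAME becomes an
    empty file called ".wh." + NAME; every other entry keeps its name.
    """
    return [AUFS_PREFIX + name if is_whiteout else name
            for name, is_whiteout in entries]


def duplicates(names):
    """Return the names that occur more than once, sorted."""
    return sorted(n for n, count in Counter(names).items() if count > 1)


# The two entries found in upper/home/<user> in this issue.
upper_home = [
    (".wh..opq", True),       # 0:0 character device
    (".wh..wh..opq", False),  # regular empty file
]
tar_names = to_tar_names(upper_home)
print(duplicates(tar_names))  # ['.wh..wh..opq']
```

Under that assumption, the char device `.wh..opq` is converted to `.wh..wh..opq`, which collides with the regular file of the same name already present in the directory.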

We need to fully understand what exactly fuse-overlayfs is doing here... and why. We need to handle any whiteout files in singularity overlay seal so that they don't cause issues later on with --layer-format=tar conversion.
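
As a starting point for that handling, a pre-seal check could walk the extracted upper directory and report anything that looks like a whiteout. The sketch below is only a hypothetical diagnostic, not a proposed implementation for `overlay seal`: the function name `find_whiteout_candidates` and its labels are invented, and it assumes the process can lstat every entry under the given directory.

```python
import os
import stat


def find_whiteout_candidates(upper: str):
    """Return sorted (relative path, kind) pairs for whiteout-like entries.

    kind is "native-whiteout" for a 0:0 character device and "aufs-name"
    for any other entry whose name starts with ".wh.".
    """
    found = []
    for root, dirs, files in os.walk(upper):
        for name in dirs + files:
            path = os.path.join(root, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            rel = os.path.relpath(path, upper)
            if stat.S_ISCHR(st.st_mode) and st.st_rdev == 0:
                found.append((rel, "native-whiteout"))
            elif name.startswith(".wh."):
                found.append((rel, "aufs-name"))
    return sorted(found)
```

A directory that reports both a native whiteout named `.wh..opq` and an AUFS-named `.wh..wh..opq` would match the pattern described in this issue, and could be flagged before the squashfs is built.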

I think the key to this is understanding exactly how the 0:0 char device .wh..opq file ends up in the ext3 overlay image.

Perhaps the issue is related to how we are constructing the container final dir... as we already have a tmpfs overlay construct in play here.

dtrudg commented 3 months ago

Given some reports about fuse-overlayfs not working where the mountpoint is also upperdir, I checked whether the issues stem from our usage, where the mountpoint is also a lowerdir. To test this, I mounted the overlay onto <bundledir>/final rather than <bundledir>/rootfs.

This is not the issue... after running with --writable, the overlay still contains the same things:

dtrudg-sylabs@mini:~/Git_Sylabs/singularity/extfs/upper/home/dtrudg-sylabs
04:56 pm $ ls -lah
ls: .wh..wh..opq: Permission denied
ls: .wh..opq: Permission denied
total 2.0K
drwxr-xr-t. 2 bin  bin  1.0K Aug  5 16:55 ./
drwxr-xr-x. 3 root root 1.0K Aug  5 16:55 ../
crwx------. 1 root root 0, 0 Aug  5 16:55 .wh..opq
-rwx------. 1 root root    0 Aug  5 16:55 .wh..wh..opq

dtrudg commented 3 months ago

There is a reference to .wh..opq files in containerd code... https://github.com/containerd/continuity/blob/50fa7de4fc5d1529fed1c4d6e3efad231bf5a232/fs/diff.go#L154

There is a reference to .wh..opq and fuse-overlayfs in craft code... https://github.com/canonical/craft-parts/blob/184d5142b97bc73b0a3980dd3219dffd11b62c18/craft_parts/executor/part_handler.py#L767