Introduction

moss-container is a tool for running one or more processes in a container. Historically it set up the container by mounting file systems, bind-mounting directories, and creating device nodes with mknod. All of these operations require elevated capabilities such as CAP_SYS_ADMIN (in practice, root privileges).

Ideally, packages are built with regular user privileges. Enter rootless containers. Like Podman, we want to use Linux namespaces to mount file systems and directories in a confined namespace. This isolates the build environment, so the host system is not cluttered or damaged by accident, and it gives us a reproducible environment.

The problem

The work-in-progress https://github.com/serpent-os/moss-container/tree/userns branch almost does the above, except that the internal rootfs is not owned by the internal root but by the internal regular user (for various reasons not worth expanding on here). This is undesirable because we don't want to give users (or build scripts) the illusion that they can edit the entire rootfs at their leisure.

Setting correct internal permissions for directories coming from the host system isn't an easy task. We can't just chown -R internalUID:internalGID, because the new UID and GID would be written to the host files as well! The ownership change must be visible only from inside the container.

idmapped mounts aren't viable because they require being root on the host, which defeats the very purpose of rootless containers.
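To make the propagation problem concrete, here is a minimal sketch of how an ID seen inside a user namespace translates to an ID on the host. It parses text in the format of /proc/<pid>/uid_map (three columns: inside start, outside start, count). The helper names and the example mapping are hypothetical and are not taken from moss-container.

```python
from typing import NamedTuple, Optional


class IdMapEntry(NamedTuple):
    inside_start: int   # first ID as seen inside the namespace
    outside_start: int  # first ID as seen on the host
    count: int          # number of consecutive IDs in the range


def parse_id_map(text: str) -> list[IdMapEntry]:
    """Parse text in the format of /proc/<pid>/uid_map or gid_map."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(IdMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_id(entries: list[IdMapEntry], inside_id: int) -> Optional[int]:
    """Return the host ID backing inside_id, or None if it is unmapped."""
    for entry in entries:
        offset = inside_id - entry.inside_start
        if 0 <= offset < entry.count:
            return entry.outside_start + offset
    return None


# Hypothetical mapping: internal IDs 0..999 are backed by host IDs
# 100000..100999, and internal ID 1000 is the invoking host user (1000).
example_map = parse_id_map("0 100000 1000\n1000 1000 1\n")
host_uid_of_internal_root = to_host_id(example_map, 0)
```

With such a mapping, a chown to internal UID 0 stores host UID 100000 on the underlying files, which is why a plain recursive chown of a host directory changes what the host sees too.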

The solution

Make a copy of the rootfs somewhere and chown it. As simple as that.

Well, that's the gist of it, although both recursively copying and recursively chowning are slow operations.

To mitigate that, we can explore (and pick one of) the following possible optimizations:

- Mount the rootfs as an OverlayFS mount point using the metacopy=on option. With it, changing only a file's metadata (which is exactly what chown does) copies up just the metadata, not the file's contents.
- Mount the rootfs as an OverlayFS mount point, read-only. I don't know whether a read-only rootfs is viable (it depends on the initialization we have to perform), but it would surely be the fastest solution.

See the OverlayFS documentation for details.

Possible implementation

Give moss-container a --workdir flag. This directory will be the parent of the OverlayFS hierarchy and will contain:

$WORKDIR/upper
$WORKDIR/work
$WORKDIR/merged

This --workdir flag will be passed by boulder, resulting in a moss-container call like: moss-container -d /my/rootfs/path --workdir $XDG_CACHE_HOME/boulder/packagename --bind-rw /some/extra/dir. The build process will happen in $WORKDIR/merged.
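As a rough illustration of the layout above, the following sketch derives the three directories from a work directory and assembles the corresponding OverlayFS option string. The function names are hypothetical, and the sketch assumes none of the paths contain commas or colons, which OverlayFS would need escaped.

```python
from pathlib import PurePosixPath


def overlay_layout(workdir: str, rootfs: str):
    """Return the OverlayFS directories under workdir and the option string."""
    base = PurePosixPath(workdir)
    dirs = {name: base / name for name in ("upper", "work", "merged")}
    options = ",".join([
        f"lowerdir={rootfs}",          # the pristine rootfs, never written to
        f"upperdir={dirs['upper']}",   # receives every change made by the build
        f"workdir={dirs['work']}",     # scratch space required by OverlayFS
    ])
    return dirs, options


def mount_command(workdir: str, rootfs: str) -> list[str]:
    """Build (but do not run) a mount(8) invocation for the merged view."""
    dirs, options = overlay_layout(workdir, rootfs)
    return ["mount", "-t", "overlay", "overlay", "-o", options, str(dirs["merged"])]
```

The command is only constructed here, not executed, because actually mounting requires the appropriate privileges or a user and mount namespace.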

Unfortunately, mounting a rootless OverlayFS requires the userxattr option, which is incompatible with metacopy=on. Without metacopy=on, chowning is very slow because every chowned file is copied up in full.
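The constraint can be captured in a small guard. This is only a sketch with a hypothetical helper name: it builds a rootless option string that always carries userxattr and refuses metacopy=on, mirroring the incompatibility described above.

```python
def rootless_overlay_options(lowerdir, upperdir, workdir, extra=()):
    """Build an OverlayFS option string for a rootless mount.

    userxattr is always included; metacopy=on is rejected because the two
    options cannot be combined.
    """
    extra = list(extra)
    if "metacopy=on" in extra:
        raise ValueError("metacopy=on cannot be combined with userxattr")
    parts = [
        f"lowerdir={lowerdir}",
        f"upperdir={upperdir}",
        f"workdir={workdir}",
        "userxattr",
        *extra,
    ]
    return ",".join(parts)
```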

As for how Podman works, I created a Podman container based on the Serpent OS rootfs with this command: podman create --userns=keep-id --rootfs ~/Personal/SerpentOS/SOSROOT/ /bin/sleep infinity. If I then enter that container, the rootfs IS NOT owned by root, but by my user. This demonstrates that the rootfs ownership isn't set by Podman itself, but by the way the OCI image was constructed. It also explains why Podman only chowns new files in the container (e.g. /etc/hosts), but not the whole rootfs.

The only two viable solutions I can think of are:

- Mount the overlaid rootfs read-only. This way, the owner is still incorrectly set, but nobody can mangle the rootfs since the mount point is read-only. It's an unorthodox solution, but it would (likely) work.
- Have a rootfs directory somewhere that is completely managed by moss-container. This way, we chown it only once and reuse it every time we need to fire up moss-container. That's the way Podman works, after all.
  - CAVEAT: if we want to run multiple simultaneous moss-container instances, we need multiple rootfs directories.
  - CAVEAT: the rootfs will be owned by a UID/GID different from $USER's. The user will need sudo to delete the rootfs if needed, or we may add a moss-container delete command. Again, this is the way Podman works.
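The "chown once and reuse" idea could look roughly like the sketch below. It is an assumption-laden illustration, not moss-container code: the function names are hypothetical, the ownership of the rootfs top directory is used as the "already prepared" marker, and the code is assumed to run where the target UID/GID can actually be assigned (for example inside the user namespace).

```python
import os


def needs_chown(rootfs: str, uid: int, gid: int) -> bool:
    """True if the top directory of rootfs is not yet owned by uid:gid."""
    st = os.lstat(rootfs)
    return (st.st_uid, st.st_gid) != (uid, gid)


def chown_tree(rootfs: str, uid: int, gid: int) -> None:
    """Recursively chown rootfs without following symlinks.

    The top directory is chowned last, so its ownership only flips once
    every entry below it has been processed.
    """
    for dirpath, dirnames, filenames in os.walk(rootfs):
        for name in dirnames + filenames:
            os.lchown(os.path.join(dirpath, name), uid, gid)
    os.lchown(rootfs, uid, gid)


def prepare_rootfs(rootfs: str, uid: int, gid: int) -> bool:
    """Chown rootfs if needed; return True if the slow path was taken."""
    if not needs_chown(rootfs, uid, gid):
        return False
    chown_tree(rootfs, uid, gid)
    return True
```

Subsequent invocations then pay only for a single lstat call instead of a full recursive chown.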

Third proposal: just use Podman?

I think what I'm about to describe is a viable solution to all of our issues. You may not like it entirely, so feedback is welcome:

- moss-container shouldn't be a thing of its own. If it exists only for the sake of boulder, then its code should live inside boulder's repository. This helps contributors understand that the two solve the same problem and should be modified in tandem.
  1. If we want moss-container to be just a library, no more work is required (obviously, its CLI flags will be migrated to boulder's).
  2. If we still want moss-container to be an executable, it's better to install it in /usr/libexec. This way we hint to users that it's not intended to be run standalone.
- Have a boulder container subcommand:
  - create: Creates a rootfs directory inside XDG_CACHE_HOME, or any other custom directory, owned by the proper UID/GID (see the comment above).
  - remove: Users won't be able to delete the rootfs, since it's owned by a UID/GID different from theirs. They'll have to use sudo or the remove convenience subcommand.
  - update: Updates the rootfs according to the latest changes in the official stable repository.
- When building, a unique directory is created using OverlayFS. The lowerdir will be the rootfs; since the lowerdir is read-only, simultaneous builds can run without conflicting with each other. The unique directory may be named after the package we're building and its version, for example.
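One way to get a unique per-build directory is to let directory creation itself act as the lock. The sketch below is hypothetical (the function name, the name-version naming scheme, and the error handling are assumptions): it claims a directory named after the package and its version and fails if another build already owns it.

```python
from pathlib import Path


def claim_build_dir(cache_root: str, name: str, version: str) -> Path:
    """Create a fresh per-build OverlayFS directory tree, or fail if taken."""
    build_dir = Path(cache_root) / f"{name}-{version}"
    build_dir.parent.mkdir(parents=True, exist_ok=True)
    try:
        # mkdir fails if the directory exists, so two simultaneous builds
        # of the same package and version cannot share an upperdir.
        build_dir.mkdir()
    except FileExistsError:
        raise RuntimeError(
            f"a build of {name} {version} already owns {build_dir}"
        ) from None
    for sub in ("upper", "work", "merged"):
        (build_dir / sub).mkdir()
    return build_dir
```

Note that a directory left over from an earlier, finished build would also trigger the error, so a real implementation would need a cleanup step or a more specific name.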

serpent-os / moss-container

Set proper permissions to the internal rootfs when rootless #8
