serpent-os / moss-container

Set proper permissions to the internal rootfs when rootless #8

Open livingsilver94 opened 1 year ago

livingsilver94 commented 1 year ago

Introduction

moss-container is a tool for running processes inside a container. Historically, it created the container by mounting file systems, bind-mounting directories, and creating device nodes with mknod. All of these operations required CAP_SYS_ADMIN (root privileges).

Ideally, packages are built with regular user privileges. Enter rootless containers. Like Podman, we want to leverage Linux namespaces to mount file systems and directories inside a confined namespace, so that the build environment is isolated, cannot accidentally clutter or damage the host system, and is reproducible.

The problem

The work-in-progress https://github.com/serpent-os/moss-container/tree/userns branch almost does the above, except that the internal rootfs is owned not by the internal root but by the internal regular user (for various reasons not worth expanding on). This is undesirable because we don't want to give users (or build scripts) the illusion that they can edit the entire rootfs at their leisure.

Setting correct internal permissions for directories coming from the host system isn't an easy task. We can't just chown -R internalUID:internalGID, because the new UID and GID would be written to the host files as well! The ownership change must be visible only from inside the container.
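To illustrate why a plain chown leaks to the host, here is a minimal sketch of how a user namespace translates an internal ID to a host ID. The mapping entries follow the three-column layout of /proc/<pid>/uid_map; the concrete numbers (a subordinate range starting at 100000, a regular user with UID 1000) are assumptions for illustration, not the mapping the userns branch actually uses.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first ID inside the namespace, first ID on the host, length of the range).
IdMap = List[Tuple[int, int, int]]


def to_host_id(container_id: int, id_map: IdMap) -> Optional[int]:
    """Translate an ID seen inside the namespace to the ID stored on the host."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # the ID is not mapped in this namespace


# Hypothetical mapping: internal IDs 0..999 come from a subordinate range,
# and the internal regular user 1000 is the host user 1000.
EXAMPLE_MAP: IdMap = [(0, 100000, 1000), (1000, 1000, 1)]

# A chown to internal root (UID 0) would store this UID in the host files.
host_owner_after_chown = to_host_id(0, EXAMPLE_MAP)
```

Under this assumed mapping, chowning the rootfs to the internal root would leave the host copy owned by UID 100000, which the host user can no longer modify or delete without extra privileges.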

Idmapped mounts aren't viable because they require being root on the host, defeating the very purpose of rootless containers.

The solution

Make a copy of the rootfs somewhere and chown it. As simple as that.

Well, that's the gist of it, although both recursively copying and recursively chowning are slow operations.

To mitigate that, we can explore (and pick one of) the following possible optimizations:

OverlayFS documentation.

Possible implementation

Make moss-container have a --workdir flag. This will be the OverlayFS hierarchy parent. Here we will have:

This --workdir flag will be passed by boulder, resulting in a moss-container call like: moss-container -d /my/rootfs/path --workdir $XDG_CACHE_HOME/boulder/packagename --bind-rw /some/extra/dir. The build process will happen in $WORKDIR/merged.
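As a rough sketch of what such a call could translate to, the snippet below derives the OverlayFS mount options from the rootfs path and the --workdir value. Only the merged directory is named in this proposal; the upper and work directory names are assumptions, as is the helper itself. The userxattr option is included because, as noted in the follow-up comment, rootless OverlayFS mounts need it.

```python
from pathlib import PurePosixPath


def overlay_mount_plan(rootfs: str, workdir: str) -> dict:
    """Build the option string and mount target for an OverlayFS mount."""
    base = PurePosixPath(workdir)
    lower = PurePosixPath(rootfs)   # read-only rootfs passed with -d
    upper = base / "upper"          # assumed name: receives all modifications
    work = base / "work"            # assumed name: OverlayFS scratch directory
    merged = base / "merged"        # where the build process will run
    options = f"lowerdir={lower},upperdir={upper},workdir={work},userxattr"
    return {"options": options, "target": str(merged)}


# Hypothetical paths standing in for the rootfs and the boulder cache directory.
plan = overlay_mount_plan("/my/rootfs/path", "/cache/boulder/packagename")
```

The resulting values correspond to a mount of the form mount -t overlay overlay -o <options> <target>, performed inside the user namespace.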

livingsilver94 commented 1 year ago

Unfortunately, mounting a rootless OverlayFS requires the userxattr option, which is incompatible with metacopy=on. Without metacopy, chowning is very slow, because each chown copies the whole file up to the upper layer.

As for how Podman works, I created a Podman container based on the Serpent OS rootfs with this command: podman create --userns=keep-id --rootfs ~/Personal/SerpentOS/SOSROOT/ /bin/sleep infinity. If I then enter that container, the rootfs IS NOT owned by root, but by my user. This demonstrates that rootfs ownership isn't set by Podman itself, but by the way the OCI image was constructed. It also explains why Podman only chowns new files in the container (e.g. /etc/hosts), but not the whole rootfs.

The only two viable solutions I can think of are:

Third proposal: just use Podman?

livingsilver94 commented 1 year ago

I think what I'm going to describe is a viable solution to all of our issues. You may not like it entirely, so feedback is welcome:

  1. moss-container shouldn't be a thing of its own. If it exists only for the sake of boulder, then its code should live inside boulder's repository. This helps contributors understand that the two solve the same problem and should be modified in tandem.
    1. If we want moss-container to be just a library, no more work is required (obviously, its CLI flags will be migrated to boulder's).
    2. If we still want moss-container to be an executable, it's better to install it in /usr/libexec. This way we hint to users that it's not intended to be run standalone.
  2. Have a boulder container subcommand:
    • create: Creates a rootfs directory inside XDG_CACHE_HOME, or any other custom directory, owned by the proper UID/GID (see comment above).
    • remove: Users won't be able to delete the rootfs since it's owned by a UID/GID different from theirs. They'll have to use sudo, or the remove convenience subcommand.
    • update: Updates the rootfs according to the latest changes in the official stable repository.
  3. When building, a unique directory is created leveraging OverlayFS. The lowerdir will be the rootfs, so that simultaneous builds can run without conflicting with each other, since the lowerdir is read-only. The unique directory may be named after the package we're building and its version, for example.
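
The per-build directory naming from point 3 could be sketched as follows. The helper name, the "package-version" format, and the example paths are all hypothetical; the only idea taken from the proposal is that every build gets its own directory while sharing one read-only lowerdir.

```python
from pathlib import PurePosixPath


def build_overlay_dir(cache_root: str, package: str, version: str) -> PurePosixPath:
    """Return a per-build directory named after the package and its version."""
    name = f"{package}-{version}"
    # Reject names that would escape cache_root or create nested directories.
    if "/" in name or name.startswith("."):
        raise ValueError(f"unsafe build directory name: {name!r}")
    return PurePosixPath(cache_root) / name


# Two simultaneous builds: same read-only lowerdir, distinct overlay directories.
shared_lowerdir = "/cache/boulder/rootfs"  # hypothetical rootfs location
build_a = build_overlay_dir("/cache/boulder/builds", "zlib", "1.3")
build_b = build_overlay_dir("/cache/boulder/builds", "nano", "7.2")
```

Because each build writes only to the upper layer under its own directory, the two builds never touch the same writable path.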