jdholtz / pacman-venv

Create an isolated virtual environment for Pacman and supported AUR helpers
MIT License

Add the Ability to Use Without Root #9

Open digitalsignalperson opened 1 year ago

digitalsignalperson commented 1 year ago

Description

running pacman-venv venv asks for sudo password for "Installing base environment"

can it be rootless?

Same for installing packages in the venv, it asks for sudo password.

What alternatives have you considered?

No response

jdholtz commented 1 year ago

Not that I know of as you are using pacman to install the packages and pacman requires root.

digitalsignalperson commented 1 year ago

hmm, I don't know much about namespaces, but that seems like an option

something from this thread https://www.reddit.com/r/archlinux/comments/xbtcl8/make_pacman_work_without_root/

Use arch-chroot with -N. Note that the permissions will only look correct inside the chroot. This is because you really do need root permissions in order to create files as the root user. In general, this means you have to use pacstrap with -N as well.

from the manpage:

-N Run in unshare mode. This will use unshare(1) to create a new mount and user namespace, allowing regular users to create new system installations.

jdholtz commented 1 year ago

That looks interesting. I’ll definitely check that out.

Is there a reason rootless is desired? Pacman has a hard-coded root user check, so there isn’t any way around that by using the pacman command.

It would still be possible to use any other command inside the virtual environment that allows you to install packages as non-root. You might just have to use the --root flag (or the equivalent of that software). Support for this can also be added through a shim.

digitalsignalperson commented 1 year ago

You may have a scenario where you are using a computer that you don't have sudo/root access to, but you want to install some pacman packages. I think you can also argue there are some security benefits for things to be rootless in general.

With the venv concept rootless makes sense since we are not modifying the system in any way. Just like pip imagine if we had to use sudo every time for a non-system pip package install.

See also distrobox, junest, and conty all highlighting being able to be used without root as a key feature.

jdholtz commented 1 year ago

Those are good points. I’ll see how I can make this possible.

digitalsignalperson commented 1 year ago

I was just looking at how junest implements bwrap for this. One idea could be to use bwrap (or another namespace solution) in a similar way for the venv, but have the host filesystem mounted within the new root, and have the various PATH variables pointing to them as needed.

digitalsignalperson commented 1 year ago

i think unshare --fork --pid pacman -r "/path/to/root" -Sy packages is also similar to arch-chroot with -N

from install-arch-without-pacstrap.md

spfanning commented 1 year ago

The checkupdates script from pacman-contrib uses fakeroot to use pacman without root.

jdholtz commented 1 year ago

Thanks for looking into this! I’ll hopefully have time to implement this soon. This will be a very nice addition to the project.

jdholtz commented 1 year ago

I've been messing around with this for a little bit. So far it seems like unshare will work if you use the --map-root-user flag: unshare --map-root-user --fork --pid -- pacman -Sy --cachedir pacman-venv/var/cache/pacman/pkg --root pacman-venv/ packages.

I also tried using fakeroot, but it kept throwing this error:

could not change the root directory (Operation not permitted)
error: command failed to execute correctly

I used the following command: fakeroot -- pacman -Sy --cachedir pacman-venv/var/cache/pacman/pkg --root pacman-venv/ packages

I'll have to look more into why this error is only occurring with fakeroot.

A disadvantage I noticed while trying to use pacman without root access is the need to use another cache directory. Unfortunately, /var/cache/pacman/pkg (I also tried with a symlink) is only accessible to root, so a new cache directory will need to be used for every virtual environment. The disadvantage to this is that a lot more disk space will be used. However, it could help to isolate the virtual environments more (although pip uses one global cache when running in virtualenv).
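
The working unshare invocation, with its per-venv cache directory, could be wrapped in a small helper. This is only a minimal sketch and not pacman-venv's actual code: the function name rootless_pacman_cmd and the dry-run style (printing the command instead of running it) are assumptions for illustration. The flags are the ones from the command above.

```shell
#!/bin/bash
# Hypothetical helper (name is an assumption): print, but do not run,
# a rootless pacman command line for a virtual environment directory.
rootless_pacman_cmd() {
    local venv_dir="$1"
    shift
    local cmd=(
        unshare --map-root-user --fork --pid --
        pacman -Sy
        --cachedir "${venv_dir}/var/cache/pacman/pkg"
        --root "${venv_dir}"
        "$@"
    )
    # %q quotes each word so the output can be pasted back into a shell
    local out
    printf -v out '%q ' "${cmd[@]}"
    printf '%s\n' "${out% }"
}
```

Calling rootless_pacman_cmd pacman-venv vim prints the full command so it can be inspected before running it.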

digitalsignalperson commented 1 year ago

Interesting, that makes sense that /var/cache/pacman would be read-only, since an unprivileged user could just stick a malicious package into the cache used for system packages.

Maybe there's a way to use the system cache if what the venv needs is in there, otherwise write to a venv or user-level package cache. Possibly symlinks? or overlayfs? Or maybe some shell script to manually copy/link the specific files, like a pre-pacman hook.

But I agree it might just be KISS and some level of isolation to just not care about sharing the cache.

Unrelated but this reminds me of how I already waste a lot of space forgetting to use paccache -r (maybe I should auto run that with a pacman hook or something)

digitalsignalperson commented 1 year ago

also this reminds me of virtualenv --system-site-packages venv

jdholtz commented 1 year ago

Maybe there's a way to use the system cache if what the venv needs is in there, otherwise write to a venv or user-level package cache

I did try with symlinks, but pacman appears to always need write access to the cache, even when I installed the exact same packages that I had in my cache already. It threw a warning that said something along the lines of could not find or access cache directory. Using /tmp.

also this reminds me of virtualenv --system-site-packages venv

That’d be an interesting feature to add. Currently though, it doesn’t appear to be possible since pacman doesn’t support searching multiple DBpaths at once (AFAIK). Also, pip has special flags to work better with including system site packages in virtual environments. Unfortunately, there is not the same level of support for pacman.

digitalsignalperson commented 1 year ago

ah, as far as read-only cache sharing, arch wiki pacman tips & tricks has some options

2.3 Network shared pacman cache 2.3.1 Read-only cache 2.3.2 Overlay mount of read-only cache

digitalsignalperson commented 1 year ago

this is interesting https://wiki.archlinux.org/title/Bubblewrap/Examples#Filesystem_isolation

To further hide the contents of the file system (such as those in /var, /usr/bin and /usr/lib) and to sandbox even the installation of software, pacman can be made to install Arch packages into isolated filesystem trees.

jdholtz commented 1 year ago

I was looking into fakeroot and fakechroot to build packages in isolation. That Wiki section will help a lot. Thanks!

And sorry I haven’t had much time to work on this lately but hopefully I will be able to soon.

digitalsignalperson commented 1 year ago

all good! I've just been poking around at containers/bubblewrap stuff and thought I'd share!

jdholtz commented 1 year ago

I've started work on this in the no-root branch. Feel free to try it out if you want (git pull && git checkout no-root && make). It uses unshare to run pacman and overrides yay's flags to replace sudo with unshare as well.

I still want to attempt to use the global pacman cache within the virtual environment, which is currently not possible without using root privileges (the cache directory is not writable without root). I have successfully gotten it to work by mounting using sudo, but I will work on it some more to see if I can take advantage of unshare namespaces to achieve this.

digitalsignalperson commented 1 year ago

Cool!

Would it do the job to specify multiple cache dirs? It will be able to read from both, but it will only download into the one that is writable. Assuming the venv is the writable one.

man pacman

--cachedir

Specify an alternative package cache location (the default is /var/cache/pacman/pkg). Multiple cache directories can be specified, and they are tried in the order they are passed to pacman.

man pacman.conf

CacheDir = /path/to/cache/dir

Overrides the default location of the package cache directory. The default is /var/cache/pacman/pkg/. Multiple cache directories can be specified, and they are tried in the order they are listed in the config file. If a file is not found in any cache directory, it will be downloaded to the first cache directory with write access. NOTE: this is an absolute path, the root path is not automatically prepended.
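
The "first cache directory with write access" rule quoted above can be illustrated with a few lines of shell. This is a sketch of the documented behaviour only, not pacman's implementation, and the function name first_writable_cachedir is made up for the example.

```shell
#!/bin/bash
# Hypothetical illustration of the documented rule: among the cache
# directories given in order, a new download goes to the first one
# that exists and is writable by the current user.
first_writable_cachedir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1  # no writable cache directory found
}
```

Under that rule, passing the host cache and a venv cache as a regular user should select the venv cache whenever the host cache is not writable, which is why listing both directories could work.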

digitalsignalperson commented 1 year ago

ya adding this 2nd cachedir, works for relying on host cache, but downloading new things into venv cache

diff --git a/lib/pacman-venv/pacman-venv b/lib/pacman-venv/pacman-venv
index 7848e23..da0704a 100755
--- a/lib/pacman-venv/pacman-venv
+++ b/lib/pacman-venv/pacman-venv
@@ -75,7 +75,7 @@ install_packages() {
     local pacman_opts=()
     [[ "${INTERACTIVE}" != 1 ]] && pacman_opts+=("--noconfirm")

-    unshare --map-root-user -- pacman --root "${root_dir}" --cachedir "${root_dir}/var/cache/pacman/pkg" "${pacman_opts[@]}" -Sy "${packages[@]}"
+    unshare --map-root-user -- pacman --root "${root_dir}" --cachedir "${root_dir}/var/cache/pacman/pkg" --cachedir "/var/cache/pacman/pkg" "${pacman_opts[@]}" -Sy "${packages[@]}"
 }

 add_activation_scripts() {
diff --git a/lib/pacman-venv/shims/pacman b/lib/pacman-venv/shims/pacman
index 65db2ba..0044e53 100644
--- a/lib/pacman-venv/shims/pacman
+++ b/lib/pacman-venv/shims/pacman
@@ -5,6 +5,7 @@ _pacman() {
     # to the virtual environment instead of the root directory
     PATH="${PATH#*:}" unshare --map-root-user -- pacman \
         --cachedir "${_PACMAN_VENV}/var/cache/pacman/pkg" \
+        --cachedir "/var/cache/pacman/pkg" \
         --hookdir "${_PACMAN_VENV}/etc/pacman.d/hooks" \
         --root "${_PACMAN_VENV}" \
         "$@"
diff --git a/lib/pacman-venv/shims/yay b/lib/pacman-venv/shims/yay
index a067b66..d9d5e89 100644
--- a/lib/pacman-venv/shims/yay
+++ b/lib/pacman-venv/shims/yay
@@ -17,6 +17,7 @@ _yay() {
         --sudo /usr/bin/unshare \
         --sudoflags "--map-root-user" \
         --cachedir "${_PACMAN_VENV}/var/cache/pacman/pkg" \
+        --cachedir "/var/cache/pacman/pkg" \
         --hookdir "${_PACMAN_VENV}/etc/pacman.d/hooks" \
         --root "${_PACMAN_VENV}" \
         "${args[@]}"

Trying installing some random things, a few that didn't work:

yay -S magicavoxel
magicavoxel
cp: cannot stat '/usr/share/magicavoxel/config/dict.txt': No such file or directory

pacman -S blobwars
blobwars
Could not chdir to '/usr/share/games/blobwars/': No such file or directory

The 1st one may have something to do with it using wine. The 2nd was a completely random one I pulled off the wiki games list.

A more complicated failure trying yay -S quake2

finally after install trying to run it:

/path/to/the/venv/usr/bin/quake2: line 3: cd: /opt/quake2: No such file or directory
/path/to/the/venv/usr/bin/quake2: line 4: ./sdlquake2: No such file or directory

Are these types of errors an unavoidable limitation because of hard-coded/absolute paths?

jdholtz commented 1 year ago

ya adding this 2nd cachedir, works for relying on host cache, but downloading new things into venv cache

That's great. That's a very convenient way to solve this issue.

Are these types of errors an unavoidable limitation because of hard-coded/absolute paths?

This is the main issue with pacman-venv and why the concept of it may not be achievable. First, not all PKGBUILDs respect the --root flag. A solution to this could be to run pacman/yay in a fakechroot environment, which would solve the hard-coded paths in PKGBUILDs. Unfortunately, this means specifying the host cache would not work, as that wouldn't be visible to pacman.

However, this would only solve installation of packages. As you pointed out in quake2, it relies on using absolute paths. The solution to this could be to run it in a chroot environment as well, but this is tedious for users to do as they would need to remember which commands to run it for. Another solution could be to run every command that is in the virtual environment in a chroot environment, but that would make every package installed in the venv useless outside of that filesystem. Additionally, it wouldn't be really any different than existing solutions utilizing containers.

Due to these reasons, I'm not sure how practical this idea of a Pacman Virtual Environment is in the first place. It does work theoretically (and practically in some cases), but it overall seems like there are issues that simply don't seem solvable.
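
One way to spot launchers affected by hard-coded paths (like the quake2 script's cd /opt/quake2) is to scan them for absolute paths that exist inside the venv but not on the host. A rough sketch, assuming a hypothetical helper name venv_only_paths and a deliberately naive regex for what counts as a path:

```shell
#!/bin/bash
# Hypothetical check (name and heuristic are assumptions): list absolute
# paths mentioned in a script that exist under the venv root but not on
# the host, i.e. paths likely to break when run outside a chroot.
venv_only_paths() {
    local script="$1" venv_root="$2" path
    # Rough heuristic: a "/" that starts a line or follows whitespace or "="
    grep -oE '(^|[[:space:]=])/[A-Za-z0-9._/-]+' "$script" |
        sed -E 's/^[^/]*//' |
        sort -u |
        while IFS= read -r path; do
            if [ ! -e "$path" ] && [ -e "${venv_root}${path}" ]; then
                printf '%s\n' "$path"
            fi
        done
}
```

This only detects the problem. It does not fix it, and it will miss paths that are built at runtime.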

digitalsignalperson commented 1 year ago

Hmm, interesting to think about possible solutions. Some brainstorming notes:

digitalsignalperson commented 1 year ago

and in the overlay cases, the venv can still have its own pacman database as specified by the args to pacman etc in the venv activation script.

I had some other weird ideas for "overlaying" pacman databases:

jdholtz commented 1 year ago

Those are great suggestions. I definitely think the OverlayFS system at / (or specific directories) could work well and solve most problems the current venv has.

I’m not too familiar with how pacman’s databases work, but I definitely agree that there would probably need to be some isolation between the host and venv’s databases.

I’ll start looking into solutions to this. Thanks for all the help with this!

jdholtz commented 10 months ago

Hey @digitalsignalperson, I've been working on this for a few hours. However, I have run into an issue that I don't see a good way around and I wanted to get your opinion on it.

My main thoughts were to have an OverlayFS that would merge the virtual environment directory and the root directory. Everything would be written to the venv directory to keep the isolation. Then, the user would chroot into the merged directory and basically be in an isolated environment that has both packages from the system as well as the venv. However, this has the downside that any changes made inside the chroot would solely get written to the venv--meaning any project files or config files updated would not be reflected on the system's FS, just inside the venv.

Due to this, I have a couple possible solutions, but please let me know if you have any better ones.

  1. Chroot into the venv directory before installing a package (ensures isolation while also making sure packages that don't support installing in a separate root directory are installed correctly).
  2. Run all commands in the venv under a chroot. This would have the advantage that programs like quake installed in the venv would work (you showed the error above). The disadvantage, however, would be that any programs installed in the venv couldn't read config files outside of the venv (such as vim not reading .vimrc in the user's home directory, but instead from venv/home/user/.vimrc).

I have not explored the original feature request of using pacman-venv without root using the above steps, but making sure everything is installed correctly inside the venv is a higher priority, as the project is not usable without that.

digitalsignalperson commented 10 months ago

Hey, happy to share my ramblings on this.

Thinking about it, isn't an issue with overlayfs that the lower dir is not read-only? It's undefined behavior when the lower dir changes, I believe. Unless you take a filesystem snapshot and clone it or something. Perhaps some of the other merge-style filesystems can handle the lower dir changing. But it still seems like you'd end up with weird conflicts, like if you install a package in venv, but it upgrades a dependency that's already on the host. Or the other way around, you do a system update on the host, but the venv has some things behind.

Maybe there's some kind of solution around every time you activate the venv, you install the venv packages into a tmpfs as your chroot (should be very fast). The packages are cached (outside the tmpfs), and if need be, the latest versions are downloaded based on the sync state of the host. It could be a rule for the venv to not use pacman with u. You exit any venvs when you want to system upgrade the host, then go back and re-activate the venv after.

Regarding the overlayfs issue, maybe it's possible to mix two read-write filesystems with mergerfs? https://github.com/trapexit/mergerfs

With the overlay/merge approach, I assumed the lower directory includes the original home directory (if desired). But it's cool to be able to isolate that as well. Actually preferred sometimes. If specific dotfiles are desired from the original home folder, those could always be specified in the venv configuration. You could then bind mount specific files/folders read-only or read-write as needed. This is easy with the bwrap --bind args.
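
As a sketch of how such a venv configuration could be turned into bwrap arguments: the "ro:PATH" / "rw:PATH" entry format and the function name bwrap_bind_args are invented for this example. Only the --ro-bind and --bind options themselves come from bwrap.

```shell
#!/bin/bash
# Hypothetical helper: turn "ro:PATH" / "rw:PATH" entries into bwrap bind
# arguments (--ro-bind SRC DEST or --bind SRC DEST), one word per line.
bwrap_bind_args() {
    local spec mode path
    for spec in "$@"; do
        mode="${spec%%:*}"   # text before the first colon
        path="${spec#*:}"    # text after the first colon
        case "$mode" in
            ro) printf '%s\n' --ro-bind "$path" "$path" ;;
            rw) printf '%s\n' --bind "$path" "$path" ;;
            *)  echo "unknown mode in '${spec}'" >&2; return 1 ;;
        esac
    done
}
```

The output could be read into an array, for example with mapfile -t binds < <(bwrap_bind_args "ro:$HOME/.vimrc" "rw:$HOME/projects"), and then passed as "${binds[@]}" to a bwrap command.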

I also noticed bubblewrap merged a feature to bind overlay filesystems with the bwrap command. I feel like this makes for the simplest rootless chroot system to use, but I'm still not sure if overlay idea works anymore based on the read-write issue of the lower filesystem.

If merging with the underlying host system ends up with too many issues to be viable, I guess it's not the worst thing to just have a completely independent arch install to chroot into. The performance cost in this venv use-case I feel like is very low. Maybe it increases the cost of your installed venv by max 1-2GB(?) by reinstalling some packages already on the host? You wouldn't need overlayfs if you bind mount the host pacman cache in the venv in an alternate cache directory and specify the host and venv pacman caches in the venv pacman config such that any new packages download into the venv cache. With all this I wonder if there's any benefit to creating the venv rootfs in tmpfs every activation (like pacstrap it and then installing all the specified packages). Or maybe I just think it's cool cause I'm playing with ramroot stuff lately.

jdholtz commented 10 months ago

Thank you very much for the extensive feedback on this.

I agree that an overlayfs will most likely not work due to the issues I have run into and what you have mentioned as well.

I will now try to see if I can get pacman/yay to install in the venv (chrooted) every time. I will worry about the cache at a later point (that can be a further enhancement, as right now both rootless and building/running in chroot are being worked on).

You could then bind mount specific files/folders read-only or read-write as needed.

I really like this idea. I haven't messed around with bwrap yet, but I will do that soon.

The performance cost in this venv use-case I feel like is very low.

I agree. If we can eventually get the venv concept to work, then we can start to focus our attention on optimizing it for cases like this.

digitalsignalperson commented 9 months ago

although I say those caveats about overlayfs, in practice I've been doing some messing around with this and nothing has exploded

mount -t overlay fakeoverlay -o "lowerdir=/,upperdir=/mnt/root_upper,workdir=/tmp/root_work" /tmp/root
mount --bind /var/cache/pacman/pkg /tmp/root/var/cache/pacman/pkg

https://docs.kernel.org/filesystems/overlayfs.html#changes-to-underlying-filesystems

Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock.

jdholtz commented 9 months ago

Interesting. I will try a different way that I mentioned above (chroot), which may be simpler. Then I can see if I can get the overlayfs working by using the commands you provided and see which one is better.

jdholtz commented 9 months ago

I've been messing around with this again for a little while. I tried the overlayfs commands you sent. The issue is that, for mounting the overlayfs, you need root permissions which kinda takes away the point of this issue being opened in the first place. Additionally, to feel like you have one integrated environment, you need to chroot, which uses root as well (fakechroot kind of works, but it is not maintained anymore and I'm hesitant to use something that isn't maintained).

I also tried to get pacman to install (or yay to build AUR packages) in the virtual environment using chroot, but that again wouldn't work without root permissions (chroot is needed which needs permissions). The other issue with this approach (not using overlayfs) is I am not sure how to execute commands in the virtual environment inside a chroot environment (again, need root) and commands on the host machine without chroot.

I'm running out of ideas to get this project to work well...

digitalsignalperson commented 8 months ago

I've done a bit more messing around with bubblewrap that I can share some stuff on.

Here's some notes on having the ability to use sudo within a sandbox: https://github.com/containers/bubblewrap/issues/468#issuecomment-1875834957

But I'm not sure if that is useful in any way. What I did have varying success with are three methods:

  1. symlinking all files from the host into the new root in the sandbox to allow for an overlay-like mutability
  2. using an overlay filesystem (without root) inside the sandbox (doesn't work)
  3. hard linking all files from the host into the sandbox, but changing folder permissions to allow new files to be written freely

TBD if any of this is practical, but it's now a hobby I guess lol.

0. Baseline

Here's a baseline bwrap for steam that for me worked with all the games I tested without any issues. (Note I have both an amd and nvidia gpu). Starting from the host system having everything installed except steam.

#!/bin/bash

packages="steam lib32-nvidia-utils lib32-vulkan-radeon lib32-sdl"
if ! pacman -Qq $packages; then
    sudo pacman -S $packages --noconfirm --needed
    systemctl --user restart pipewire
fi

xhost +si:localuser:$USER

bwrap \
    --symlink /usr/bin /bin \
    --symlink /usr/bin /sbin \
    --symlink /usr/lib /lib \
    --symlink /usr/lib64 /lib64 \
    --ro-bind /usr /usr \
    --ro-bind /etc /etc \
    --ro-bind /opt/cuda /opt/cuda \
    --tmpfs /tmp \
    --proc /proc \
    --dev-bind /dev /dev \
    --bind /sys /sys \
    --dir "$XDG_RUNTIME_DIR" \
    --ro-bind /tmp/.X11-unix /tmp/.X11-unix \
    --ro-bind "$XDG_RUNTIME_DIR/pipewire-0" "$XDG_RUNTIME_DIR/pipewire-0" \
    --ro-bind "$XDG_RUNTIME_DIR/pulse" "$XDG_RUNTIME_DIR/pulse" \
    --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
    --unshare-all \
    --share-net \
    --die-with-parent \
    --new-session \
    --bind $HOME/Machines/steam $HOME \
    --chdir $HOME \
    -- bash -c 'dbus-run-session -- steam'

I'm not sharing the host dbus (big way to escape), and have a ~/Machines/steam I'm binding as the home directory in the container.

1. symlinking all files from the host

Initially bind the host system to /rootfs/ inside the container. Use LD_LIBRARY_PATH so that this works for our initial setup. And the initial /{bin,sbin,lib,lib64} symlinks will point inside /rootfs/usr/.... Include binding the pacman cache and database.

Use cp -Lrs to copy symlinks of every file in /rootfs/{usr,etc,var/cache/pacman} to /{usr,etc,var/cache/pacman}. Make a real copy of the host /rootfs/var/lib/pacman database into /var/lib/pacman that the sandbox is free to start from and overwrite.

Reset all the symlinks from /rootfs/usr/... back to /usr/...

Then install packages with fakeroot fakechroot pacman and launch steam

#!/bin/bash

xhost +si:localuser:$USER

bwrap \
    --symlink /rootfs/usr/bin /bin \
    --symlink /rootfs/usr/bin /sbin \
    --symlink /rootfs/usr/lib /lib \
    --symlink /rootfs/usr/lib64 /lib64 \
    --dir /rootfs \
    --ro-bind /usr /rootfs/usr \
    --ro-bind /etc /rootfs/etc \
    --ro-bind /opt/cuda /opt/cuda \
    --ro-bind /var/lib/pacman /rootfs/var/lib/pacman \
    --ro-bind /var/cache/pacman /rootfs/var/cache/pacman \
    --tmpfs /tmp \
    --proc /proc \
    --dev-bind /dev /dev \
    --bind /sys /sys \
    --dir "$XDG_RUNTIME_DIR" \
    --ro-bind /tmp/.X11-unix /tmp/.X11-unix \
    --ro-bind "$XDG_RUNTIME_DIR/pipewire-0" "$XDG_RUNTIME_DIR/pipewire-0" \
    --ro-bind "$XDG_RUNTIME_DIR/pulse" "$XDG_RUNTIME_DIR/pulse" \
    --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
    --unshare-all \
    --share-net \
    --die-with-parent \
    --new-session \
    --bind $HOME/Machines/steam $HOME \
    --setenv PATH "$PATH:/rootfs/usr/bin" \
    --setenv LD_LIBRARY_PATH /lib:/rootfs/usr/lib \
    --chdir / \
    --ro-bind-data 3 "/setup.sh" \
    -- /rootfs/usr/bin/bash ./setup.sh \
    3<<EOF
#!/bin/bash

## Create symlinks for mutable filesystems
/rootfs/usr/bin/cp -Lrs /rootfs/usr /
/rootfs/usr/bin/cp -Lrs /rootfs/etc /
mkdir -p /var/cache
/rootfs/usr/bin/cp -Lrs /rootfs/var/cache/pacman /var/cache/

## Copy the pacman database so we can write to it
mkdir -p /var/lib
/rootfs/usr/bin/cp -r /rootfs/var/lib/pacman /var/lib/

## Switch the symlinks back to normal
ln -sfn /usr/bin /bin
ln -sfn /usr/bin /sbin
ln -sfn /usr/lib /lib
ln -sfn /usr/lib64 /lib64

fakechroot fakeroot pacman -S steam lib32-nvidia-utils lib32-vulkan-radeon lib32-sdl --noconfirm --needed
dbus-run-session -- steam
EOF

Some games seemed to work. But I think it's the proton ones that fail. Those games themselves use a modified bwrap called pressure-vessel, and you'll see error messages like

pressure-vessel-wrap[1319]: W: "rootfs/usr/lib64/ld-linux-x86-64.so.2" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/lib64/ld-linux-x86-64.so.2" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/lib32/ld-linux.so.2" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/lib32/ld-linux.so.2" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "/rootfs/usr/lib32/gconv" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "/rootfs/usr/lib/gconv" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/share/libdrm" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/share/libdrm" is unlikely to appear in "/run/host"
pressure-vessel-wrap[1319]: W: "rootfs/usr/share/drirc.d" is unlikely to appear in "/run/host"
bwrap: execvp /usr/lib/pressure-vessel/from-host/bin/pressure-vessel-adverb: No such file or directory

It's a pitfall (and an important security feature) of bwrap that if you bind a location into a sandbox and it contains symlinks pointing to something not bound into the sandbox, then those targets won't exist in the sandbox. I think that's what is happening here: our /rootfs, which contains most files, is not bound into this nested game bwrap command.
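
A quick way to check for this kind of problem is to list the symlinks in a directory whose targets resolve outside the paths that will be bound. The following is a sketch under assumptions: the function name symlinks_outside is made up, it relies on GNU readlink -m, and it does not handle file names containing newlines.

```shell
#!/bin/bash
# Hypothetical audit: print symlinks under DIR whose resolved targets fall
# outside every listed prefix (the paths that would be bound into the sandbox).
# GNU `readlink -m` resolves a path even when its target is missing.
symlinks_outside() {
    local dir="$1" link target prefix inside
    shift
    find "$dir" -type l | while IFS= read -r link; do
        target="$(readlink -m -- "$link")"
        inside=0
        for prefix in "$@"; do
            prefix="$(readlink -m -- "$prefix")"
            case "$target" in
                "$prefix"|"$prefix"/*) inside=1; break ;;
            esac
        done
        [ "$inside" -eq 1 ] || printf '%s\n' "$link"
    done
}
```

Any link it prints would be dangling inside a sandbox that binds only the listed prefixes.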

I was looking through the strace output for this to see if I could figure out any solution.

I think what could work is to figure out which paths the pressure-vessel bwrap is binding, and to move our /rootfs target inside that. Maybe it's as simple as something like putting /rootfs into /usr/rootfs.

This method is nice because it requires no root. We can create a container based off the host system state, and inside the container you can install anything you want. You can't overwrite existing files (we don't have permissions to the target of those symlinks), unless you specifically make copies of them (like done with the pacman database). And making copies to overwrite does not require root!

I'm running this on a ramroot system and the setup of creating symlinks takes only a few seconds (e.g. 2 seconds to make the symlinks for /usr). I think when I tried it on an SSD-backed root it was like 10 seconds to make the links for /usr.
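
The core of the symlink method can be tried on a small scale outside of bwrap. This sketch assumes GNU cp and uses plain cp -rs. The script above uses cp -Lrs, which additionally follows symlinks found in the source tree. The function name make_symlink_farm is made up for the example.

```shell
#!/bin/bash
# Sketch of the symlink-farm idea with GNU cp: directories become real
# (writable) directories, files become symlinks back to the source tree.
make_symlink_farm() {
    local src dst
    src="$(readlink -f -- "$1")"  # cp -s needs absolute source paths here
    dst="$2"
    mkdir -p "$dst"
    cp -rs "$src" "$dst"/
}
```

After make_symlink_farm /usr /some/sandbox, new files can be created under /some/sandbox/usr without touching the host's /usr, while existing files stay read-only links to the originals.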

2. using an overlay filesystem (without root) inside the sandbox

This is based off the recipe in this comment: https://github.com/containers/bubblewrap/issues/412#issuecomment-812237642

#!/bin/bash

xhost +si:localuser:$USER

cat > /tmp/bwrap_overlayfs_wrapper << 'EOF'
#!/bin/bash
set -euo pipefail

for p in etc usr var opt; do
    ls /etc/resolv.conf  # Something to test access as the changes are being made
    echo "overlaying ${p}"
    mkdir -p /tmp/${p}_work
    mkdir -p /tmp/${p}_upper
    mount -t overlay -o lowerdir=/${p},upperdir=/tmp/${p}_upper,workdir=/tmp/${p}_work none /${p}
done

cleanup() {
    for p in usr etc var opt; do
        umount /${p}
    done
}
trap cleanup EXIT

# Drop capabilities that should have been given to the wrapper then execute the original program
(cd "$(pwd)" && capsh --drop=CAP_SYS_ADMIN --drop=CAP_SETPCAP --drop=CAP_DAC_OVERRIDE --caps="" --shell=/usr/bin/env -- -- "$@")
EOF
chmod +x /tmp/bwrap_overlayfs_wrapper

bwrap \
    --symlink /usr/bin /bin \
    --symlink /usr/bin /sbin \
    --symlink /usr/lib /lib \
    --symlink /usr/lib64 /lib64 \
    --ro-bind /usr /usr \
    --ro-bind /etc /etc \
    --ro-bind /var/lib/pacman /var/lib/pacman \
    --ro-bind /var/cache/pacman /var/cache/pacman \
    --ro-bind /opt/cuda /opt/cuda \
    --tmpfs /tmp \
    --proc /proc \
    --dev-bind /dev /dev \
    --bind /sys /sys \
    --dir "$XDG_RUNTIME_DIR" \
    --ro-bind /tmp/.X11-unix /tmp/.X11-unix \
    --ro-bind "$XDG_RUNTIME_DIR/pipewire-0" "$XDG_RUNTIME_DIR/pipewire-0" \
    --ro-bind "$XDG_RUNTIME_DIR/pulse" "$XDG_RUNTIME_DIR/pulse" \
    --unshare-all \
    --share-net \
    --die-with-parent \
    --new-session \
    --bind $HOME/Machines/steam $HOME \
    --chdir / \
    --ro-bind /tmp/bwrap_overlayfs_wrapper /tmp/bwrap_overlayfs_wrapper \
    --ro-bind-data 3 "/setup.sh" \
    --cap-add CAP_SETPCAP --cap-add CAP_DAC_OVERRIDE --cap-add CAP_SYS_ADMIN \
    -- /tmp/bwrap_overlayfs_wrapper \
    /usr/bin/bash ./setup.sh \
3<<EOF
#!/bin/bash
# fakechroot fakeroot pacman -S steam lib32-nvidia-utils lib32-vulkan-radeon lib32-sdl --noconfirm --needed
# dbus-run-session -- steam
bash
EOF

In the bwrap_overlayfs_wrapper script, it prints the following:

/etc/resolv.conf
overlaying etc
/etc/resolv.conf
overlaying usr
/tmp/bwrap_overlayfs_wrapper: line 5: /usr/bin/ls: No such file or directory

I find that once /usr is swapped out, we lose access to it. Strange though we can still access /etc after swapping that out.

But this method seems useless because you don't have permission to write to the overlay, even if the upperdir and mountpoint are owned by you.

I wonder if some of the user mapping tricks mentioned here would solve anything.

3. hard linking all files from the host into the sandbox, but changing folder permissions to allow new files to be written freely

The setup is to run this as root

uid=1000
dst=/mnt/hroot
rm -r $dst
mkdir $dst
for src in usr etc; do
    echo "$src"
    ## 1. Hard link everything
    time cp -arl --parents /$src $dst
    ## 2. The directories aren't hard links, so we can change the perms on them
    ##    For anything that isn't protected, give ownership to $uid
    ##    For simplicity, only give xx5 (e.g. 755) files to $uid
    find "$dst/$src" -type d | xargs stat -c '%n %a' | grep -v '5$' | awk -v dst="$dst" -v src="$src" '{ sub("^" dst "/" src, ""); print $1}' | rsync -rI --chown=$uid:$uid --exclude-from=- --include='*/' --exclude='*' "$dst/$src/" "$dst/$src/"
done

## Make a real copy of the pacman database for the sandbox to use and overwrite
cp -r --parents /var/lib/pacman $dst
chown -R $uid:$uid $dst/var

## Make other random files writable as needed
cp /etc/ld.so.cache $dst/etc/ld.so.cache
chown $uid:$uid $dst/etc/ld.so.cache

It's similar to the symlinking way, but with hardlinks. Only root can make the hardlinks unfortunately. Since the folders created by cp -arl are not links (folders can't be hardlinks) we are free to change their ownership.

Almost all the folders in /usr and /etc have 0755 permissions. The find/grep/awk/rsync command changes the ownership of all 0755-permissioned folders to $uid. It works by finding the folders where the last permission digit is not 5 and excluding them from the chown operation. And the chown is being done with rsync and some include/exclude magic. That rsync command is something to be sure of to make sure you aren't chowning secret files that the user shouldn't access (e.g. /etc/sudoers.d, /etc/credstore, ...).
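
Before running the chown, it may help to review which directories the pipeline will leave alone. This is a minimal sketch of that audit, assuming GNU find (for -printf) and a made-up function name. It mirrors the "last digit is not 5" test but is not part of the setup script above, and it does not handle paths containing tabs or newlines.

```shell
#!/bin/bash
# Hypothetical audit step: list directories under ROOT whose octal mode does
# not end in 5, i.e. the ones that would be excluded from the chown.
dirs_excluded_from_chown() {
    local root="$1"
    # %m is the octal mode (e.g. 755), %p is the path
    find "$root" -type d -printf '%m\t%p\n' |
        awk -F'\t' '$1 !~ /5$/ { print $2 }' |
        sort
}
```

Running dirs_excluded_from_chown /mnt/hroot/etc should show entries such as restricted credential directories if they exist. Anything sensitive that is missing from the list deserves a closer look.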

After this, the result is something very similar to the symlink method, but any sandbox created by steam will work because we have hard links instead of symlinks. The bwrap command for the user is:

bwrap \
    --symlink /usr/bin /bin \
    --symlink /usr/bin /sbin \
    --symlink /usr/lib /lib \
    --symlink /usr/lib64 /lib64 \
    --bind /mnt/hroot/usr /usr \
    --bind /mnt/hroot/etc /etc \
    --bind /mnt/hroot/opt /opt \
    --bind /mnt/hroot/var/lib/pacman /var/lib/pacman \
    --bind /var/cache/pacman /var/cache/pacman \
    --tmpfs /tmp \
    --proc /proc \
    --dev-bind /dev /dev \
    --bind /sys /sys \
    --dir "$XDG_RUNTIME_DIR" \
    --ro-bind /tmp/.X11-unix /tmp/.X11-unix \
    --ro-bind "$XDG_RUNTIME_DIR/pipewire-0" "$XDG_RUNTIME_DIR/pipewire-0" \
    --ro-bind "$XDG_RUNTIME_DIR/pulse" "$XDG_RUNTIME_DIR/pulse" \
    --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
    --unshare-all \
    --share-net \
    --die-with-parent \
    --new-session \
    --bind $HOME/Machines/steam $HOME \
    --chdir $HOME \
    -- bash -c "fakechroot fakeroot pacman -S steam lib32-nvidia-utils lib32-vulkan-radeon lib32-sdl --noconfirm --needed; dbus-run-session -- steam"

Thoughts

Too bad there's no way to easily create an overlay over files owned by root and to have the merged filesystem be writable by the user. The user instead has to either copy all the files it has access to, or use symlinks.

A custom fuse filesystem for sure could do it though. That could be interesting to try with python-fuse bindings. It might be something the existing fuse filesystems can do already, but I haven't checked.

The hardlink way feels a bit scary by doing those chown operations on the folders, and yeah against the point of avoiding using root. Even though it's set up properly, any change has a risk of making privileged files available to the sandbox. But at least those chowns are only on the folders. Even if we chowned all the folders, it's likely all files in a 700 folder are also 700, so you would get to see file names but not read the contents of the secrets.

Does /var/cache/pacman actually need to be owned by root? Every pacman -S verifies the gpg signatures, so I wonder why not make the cache user-writable. If someone wanted to compromise a package, it should be stopped by the signature check I think. If this can be owned by the user, then it's easier to share for sandboxes to write into. Alternatively, just do all the caching of packages on the host with sudo pacman -S --downloadonly, and read-only bind the cache into the container. Then you never actually execute any code from the package on the host (e.g. when a PKGBUILD contains scriptlet or post-install hooks) and it stays isolated in your sandbox. Another alternative could be to create a uid specifically to own the pacman cache. Then allow that to be used with bwrap, and it isn't as privileged as full on root.

digitalsignalperson commented 8 months ago

I can confirm that for 1. symlinking all files from the host, replacing /rootfs with /usr/rootfs does the trick for this case with steam games.