podman: issues running applications in containers from non-root accounts

flyn-org commented 3 years ago

Maintainer: @dangowrt Environment: x86_64 master

Description: The podman utility is meant to allow non-root users to run applications in containers. This bug is meant to track things preventing this from working on OpenWrt. (Note there is also ongoing work on uxc.)

1. The default OpenWrt kernel does not contain support for EXT4 security labels (EXT4_FS_SECURITY), and this seems to cause podman to fail under some circumstances. I reported this upstream, and upstream merged a fix. See issue #9687 and pull request #851. For reference, here is the corresponding error:
```
Error: error creating container storage: error creating an ID-mapped copy of layer "[hash]": exit status 1: error during chown: storage-chown-by-maps: lgetxattr bin: operation not supported
```
2. /etc/containers/* is not readable by anyone but root. Fedora lets non-root users read files in this directory, so we could probably do the same on OpenWrt.
3. Non-root users cannot write to /sys/fs/cgroup/*. I am not sure how to handle this safely, and I have not yet figured out how other distributions do it.
4. Running podman run ... wants to mount /proc and so on in the container. This fails when run as non-root with:
```
mounting "/proc" to rootfs at "/proc" caused: operation not permitted
```
This might be related to user namespaces, but I am not yet sure. I have installed shadow-newuidmap and shadow-newgidmap, and I think I have set /etc/subuid and /etc/subgid properly.
5. /var/tmp does not permit non-root users to write, and it does not bear the sticky bit. Strange.
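Two of the items above (the subordinate ID files and the /var/tmp permissions) can be checked mechanically. The following is only a minimal sketch, assuming the usual "name:start:count" format for /etc/subuid and /etc/subgid; the function names `has_subid_range` and `is_sticky_world_writable` are hypothetical helpers, not part of podman or OpenWrt.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# has_subid_range FILE USER
# Succeeds if FILE (assumed format "name:start:count") has an entry for USER.
has_subid_range() {
    file=$1
    user=$2
    [ -r "$file" ] || return 1
    grep -q "^${user}:[0-9][0-9]*:[0-9][0-9]*\$" "$file"
}

# is_sticky_world_writable DIR
# Succeeds if DIR is writable by "other" and carries the sticky bit,
# i.e. the mode string printed by ls ends in "wt" (as in drwxrwxrwt).
is_sticky_world_writable() {
    [ -d "$1" ] || return 1
    mode=$(ls -ld "$1" | cut -c1-10)
    case $mode in
        d???????wt) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_subid_range /etc/subuid "$(id -un)"` and `is_sticky_world_writable /var/tmp` could be run as the non-root user before trying podman.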

For reference, I found the following articles helpful for understanding how to build namespaces on Linux:

https://medium.com/@teddyking/linux-namespaces-850489d3ccf
https://medium.com/@teddyking/namespaces-in-go-basics-e3f0fc1ff69a
https://medium.com/@teddyking/namespaces-in-go-user-a54ef9476f2a
https://medium.com/@teddyking/namespaces-in-go-reexec-3d1295b91af8
https://medium.com/@teddyking/namespaces-in-go-mount-e4c04fe9fb29
https://medium.com/@teddyking/namespaces-in-go-network-fdcf63e76100
https://medium.com/@teddyking/namespaces-in-go-uts-d47aebcdf00e

dangowrt commented 3 years ago

I'm working on updating to podman-3.0.1 here: https://github.com/openwrt/packages/pull/15264

Regarding the issues you discovered when using rootless containers:

1. Should be solved by updating to podman-3.0.1.
2. I will address permissions for /etc/containers/ in the package Makefile.
3. I guess this would need some mediator process running as root doing that on behalf of the user (within some constraints, of course).
4. Regarding mounting /proc: I struggled for weeks until I figured that one out, and it works now when using uxc. Quite a few things need to fall into place for that to work. See also man 7 user_namespaces (Effect of capabilities within a user namespace).
5. Weird, but I also don't fully understand why it would be required. Usually I use a private tmpfs for containers (mounting an additional tmpfs works as non-root).

flyn-org commented 3 years ago

@dangowrt, @oskarirauta: Quick update, referencing the item numbers in the initial comment above:

1. Appears fixed by the 3.1.1 podman package.
2. Please see https://github.com/openwrt/packages/pull/15673 and https://github.com/openwrt/packages/pull/15889.
3. I am still trying to figure this out.
4. @dangowrt, would you be willing to elaborate how to use uxc, as alluded to in your comment above?
5. Solved by running, e.g., TMPDIR=/tmp podman build ..., but requires a fix for podman: https://github.com/containers/podman/issues/10698.

njhsi commented 3 years ago

Thanks for collecting all this information about rootless podman on OpenWrt!

Is there any progress or a status update, especially on item 4, the /proc issue?

flyn-org commented 3 years ago

Update:

1. Remains fixed.
2. Now fixed: I confirmed merging #15673 and #15889 fixed this.
3. No change.
4. Presently working with upstream; see https://github.com/containers/podman/issues/10713. Help welcome!
5. Fixed upstream (https://github.com/containers/podman/issues/10698). Requires OpenWrt package update; see https://github.com/openwrt/packages/pull/16547.

flyn-org commented 3 years ago

Update:

1. Remains fixed.
2. Remains fixed.
3. No change.
4. Presently working with upstream; see https://github.com/containers/podman/issues/10713. Help welcome!
5. Fixed by https://github.com/openwrt/packages/pull/16547.

bmansvk commented 2 years ago

I am facing this issue with OpenWrt 21.02.1 and podman 3.4.1 from the distribution. Running rootless containers fails with a permission-denied error when mounting /proc.

Is there anything new on this? The upstream podman issue seems to have been closed without any resolution: https://github.com/containers/podman/issues/10713

flyn-org commented 2 years ago

Nothing new. The current theory is that a difference between the OpenWrt kernel configuration and the configurations found in conventional distributions is causing rootless podman to fail on OpenWrt. This is plausible, but I have so far found little documentation on which kernel facilities rootless podman relies on.
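One way to test the kernel-configuration theory would be to compare a handful of symbols between the OpenWrt kernel config and the config of a distribution where rootless podman works. The following is a sketch under assumptions: both configs are available as plain-text files, the helper names `kconfig_value` and `kconfig_diff` are hypothetical, and the symbols to compare are guesses rather than a documented list of podman requirements.

```shell
#!/bin/sh
# Sketch only: helper names and the choice of symbols are assumptions.

# kconfig_value FILE SYMBOL
# Prints the value of CONFIG_SYMBOL in FILE, or "n" if it is unset or absent.
kconfig_value() {
    val=$(sed -n "s/^CONFIG_$2=\(.*\)\$/\1/p" "$1" | head -n 1)
    echo "${val:-n}"
}

# kconfig_diff OURS THEIRS SYMBOL...
# Prints one line per SYMBOL whose value differs between the two files.
kconfig_diff() {
    ours=$1
    theirs=$2
    shift 2
    for sym in "$@"; do
        a=$(kconfig_value "$ours" "$sym")
        b=$(kconfig_value "$theirs" "$sym")
        [ "$a" = "$b" ] || echo "$sym: $a vs $b"
    done
}
```

A call might look like `kconfig_diff openwrt.config fedora.config USER_NS PID_NS EXT4_FS_SECURITY`. On a running system the config can be extracted with `zcat /proc/config.gz` only if the kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case on OpenWrt.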

dangowrt commented 2 years ago

I'm also seeing this problem when using procd-ujail and uxc as the OCI runtime: mounting /proc after CLONE_NEWUSER returns -EPERM even though all documented requirements are met. I also guess it's the build-time kernel configuration that may be causing this, but I haven't yet figured out what it may be :cry:

flyn-org commented 2 years ago

@dangowrt perhaps this has something to do with Linux (access-control) capabilities? Just brainstorming here. I think I am repeating some of your earlier comments.

I have been reading https://github.com/util-linux/util-linux/blob/master/sys-utils/unshare.c, trying to make sense of how non-root can mount /proc in any case.

Is it possible the problem has to do with OpenWrt's support for capabilities? I know that OpenWrt compiles the kernel's ext4 module without support for security extended attributes, but I thought OpenWrt's kernel supported capabilities themselves by default. (Can you ever disable capability support in the kernel?) I thought that unshare might exec a (missing) system utility to set capabilities, but it only seems to exec its final argument.

The unshare man page contains "ensure that capabilities granted in the user namespace are preserved in the child process."

The user_namespaces man page has a section titled "Capabilities," which seems relevant. (You already noted this.)

Other than that, I wonder about musl. Is there something wrong there? The unshare utility uses capset and capget, and the man pages seem to indicate other interfaces are more portable. I don't quite see anything wrong yet.

Fedora:

```
$ unshare -UrmunipCf bash
$ mkdir /tmp/mount
$ mount -t proc none /tmp/mount
$ cat 169666/status | grep Cap    # bash
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
```

OpenWrt:

```
$ unshare -UrmunipCf sh
$ mkdir /tmp/mount
$ mount -t proc none /tmp/mount
mount: permission denied (are you root?)
$ cat /proc/7697/status | grep Cap    # sh
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
```
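The capability masks in the two dumps are identical. To confirm that a given capability is actually set in such a mask, the hex value can be tested bit by bit. This is a minimal sketch with a hypothetical helper `has_cap`; it assumes the shell supports 64-bit arithmetic, which may not hold for every BusyBox ash build. CAP_SYS_ADMIN is capability number 21 in linux/capability.h.

```shell
#!/bin/sh
# Sketch only: has_cap is a made-up helper name.

# has_cap MASK BIT
# MASK is the hex string from /proc/PID/status (without a 0x prefix);
# BIT is the capability number. Succeeds if that bit is set in MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}
```

With the masks shown above, `has_cap 000001ffffffffff 21` succeeds on both systems, which fits the observation that the effective capabilities inside the user namespace do not explain the different mount results.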

dangowrt commented 2 years ago

I used strace to see the syscalls actually thrown at the kernel, and they do make sense, but the kernel replies with -EPERM. So I don't think we need to look into musl being the cause.

flyn-org commented 2 years ago

It seems we need to trace into the kernel. I was looking at kernel source last night, but I don't see anything yet. Also, strace would not reveal faulty capability inheritance, right? It is as if some policy within the kernel is out-of-whack.

dangowrt commented 2 years ago

Capabilities are fine, just like you have documented above already. I have also already verified that namespace ownership is not the problem, as that was the other requirement mentioned in the documentation (I've ported ns_show to OpenWrt to be able to do so)

oskarirauta commented 2 years ago

Most likely it is an issue with namespacing. On CentOS, even with a much older podman, PID namespaces were separated: if I had a container running nginx, I could not see the nginx process unless I was in the container's namespace. Now, even with the latest podman package and a renewed container configuration explicitly set to use a private PID namespace, I can still see the nginx (or whatever) PID from the OpenWrt shell with ps. If I remember correctly, it doesn't matter whether you use the full ps or the BusyBox one; the result is the same.

Although the issue described here does not have a lot to do with PID namespacing, this suggests that there is a glitch with namespaces.

One could try cgroup v2, because most other systems use that. I tried it once, but there was a problem of some kind, so I reverted to v1 and learned to live with what I have.

Also, musl should not be the problem, although it is described as a patched problem in podman's issue tracker: Alpine Linux has these issues sorted out and uses musl.

This is probably not related to the issue, but regarding ownership: OpenWrt is mostly designed for routers, and even though it is a Linux system, a clean install has quite poor support for users other than root. On a "basic router" space is limited, so unnecessary stuff is left out, such as user management, which of course is available as a package. I wonder if there's some crippling there, but most likely not. I'm 99% sure about that.
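Whether two processes share a PID namespace can be checked without relying on ps output by comparing their /proc/PID/ns/pid links. As far as pid_namespaces(7) describes it, processes in a child PID namespace stay visible from the parent namespace, so seeing a container process in the host's ps does not by itself show that the namespace is shared. A minimal sketch, with hypothetical helper names `ns_id` and `same_ns`:

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# ns_id PID TYPE
# Prints the namespace identifier, e.g. "pid:[4026531836]".
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns TYPE PID_A PID_B
# Succeeds if both processes are in the same namespace of the given TYPE.
# Returns 2 if either link cannot be read (e.g. missing permission).
same_ns() {
    a=$(ns_id "$2" "$1") || return 2
    b=$(ns_id "$3" "$1") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_ns pid 1 <nginx-pid>` run as root on the host would succeed only if the container process shares the host's PID namespace. This compares namespace identity only; namespace ownership, which ns_show inspects, is not covered here.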

dangowrt commented 2 years ago

PID namespace ownership was also my guess. You may try the ns_show tool, which can help to inspect namespace ownership. I previously copied it from a man page and packaged it for OpenWrt for exactly this bug hunt: openwrt-ns_show.tar.gz

Regarding the lack of user management: apart from the lack of common Linux userspace tools, multiuser support is well integrated into OpenWrt, to the level of having ACLs for ubus and rpcd based on user/group. Also, many services now run as their own user instead of running as root. What we do lack, however, is any facility to handle /etc/subuid and /etc/subgid, and I'm also not sure whether anything more needs to be done about those files (in musl or busybox, for example). Having a talk with the folks at Alpine Linux may also help to figure out whether it's an issue with the OpenWrt-specific kernel configuration or whether the root of the problem is somewhere in the musl/busybox userland (or even the busybox configuration).

oskarirauta commented 2 years ago

There are also larger namespacing issues with containers on OpenWrt: private networking (rootful) is broken as well. Connections from a sub-namespace to the parent can't be blocked without firewall rules.

dangowrt commented 2 years ago

Network namespaces in general work fine and are isolated properly; I'm using them every day with procd-ujail and uxc. To allow traffic from a sub-namespace to the parent, a veth interface needs to be created, otherwise no route exists. Probably the problem is specific to CNI...

alexojegu commented 10 months ago

I don't know if it can help or if it has no relation: https://forum.openwrt.org/t/lxc-unprivileged-container-no-uid-mapping-for-container-root/37778/3

But the container can't be launched; you may get an error like this:
lxc_mount_auto_mounts:810 - Operation not permitted - Failed to mount "proc"
So remounting /proc and /sys with relatime is needed:
/usr/bin/mount -t sys sys -o remount,rw,nosuid,nodev,noexec,relatime /sys
/usr/bin/mount -t proc proc -o remount,rw,nosuid,nodev,noexec,relatime /proc
Then you can launch the container.
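The quoted workaround implies that the atime option on the host's /proc and /sys mounts matters. Before remounting anything, the current options can be inspected. The sketch below parses a file in /proc/mounts format; the helper names `mount_opts` and `atime_mode` are hypothetical, and the idea that a mismatch in atime flags causes the -EPERM is an assumption drawn from the forum post, not something confirmed in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# mount_opts MOUNTPOINT [MOUNTS_FILE]
# Prints the option string of the last mount on MOUNTPOINT
# (MOUNTS_FILE defaults to /proc/mounts).
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# atime_mode OPTS
# Prints "noatime", "relatime", or "default" for a comma-separated option string.
atime_mode() {
    case ",$1," in
        *,noatime,*) echo noatime ;;
        *,relatime,*) echo relatime ;;
        *) echo default ;;
    esac
}
```

Running `atime_mode "$(mount_opts /proc)"` on both OpenWrt and a distribution where rootless containers work would show whether the two systems differ in this respect.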


podman: issues running applications in containers from non-root accounts #15096