Open flyn-org opened 3 years ago
I'm working on updating to podman-3.0.1 here: https://github.com/openwrt/packages/pull/15264
Regarding the issues you discovered when using rootless containers:
/etc/containers/ in the package Makefile
/proc: I've struggled for weeks until I figured that one and it works now when using uxc. Quite a few things need to fall in place for that to work. See also man 7 user_namespaces (Effect of capabilities within a user namespace).
@dangowrt, @oskarirauta: Quick update, referencing the item numbers in the initial comment above:
uxc, as alluded to in your comment above?
TMPDIR=/tmp podman build ..., but requires a fix for podman: https://github.com/containers/podman/issues/10698.
Thanks for collecting all this information about rootless podman on OpenWrt!
Is there any progress or a status update, especially on item 4), the /proc issue?
I am facing this issue with OpenWRT 21.02.1 and podman 3.4.1 from the distribution. Running rootless containers fails with /proc mount permission denied.
Is there anything new on this? The upstream podman issue seems to have been closed without any resolution: https://github.com/containers/podman/issues/10713
Nothing new. The current theory is that a difference between the OpenWrt kernel configuration and the configurations found in conventional distributions is causing rootless podman to fail on OpenWrt. This is plausible, but I have so far found little documentation on which kernel facilities rootless podman relies on.
I'm also seeing this problem when using procd-ujail and uxc as the OCI run-time: mounting /proc after CLONE_NEWUSER returns -EPERM even though all documented requirements are met. I also guess it's the build-time kernel configuration that may be causing this, and I haven't yet figured out what it may be :cry:
@dangowrt perhaps this has something to do with Linux (access-control) capabilities? Just brainstorming here. I think I am repeating some of your earlier comments.
I have been reading https://github.com/util-linux/util-linux/blob/master/sys-utils/unshare.c, trying to make sense of how non-root can mount /proc in any case.
Is it possible the problem has to do with OpenWrt's support for capabilities? I know that OpenWrt compiles the kernel's ext4 module without support for security extended attributes, but I thought OpenWrt's kernel supported capabilities themselves by default. (Can you ever disable capability support in the kernel?) I thought that unshare might exec a (missing) system utility to set capabilities, but it only seems to exec its final argument.
The unshare man page contains "ensure that capabilities granted in the user namespace are preserved in the child process."
The user_namespaces man page has a section titled "Capabilities," which seems relevant. (You already noted this.)
Other than that, I wonder about musl. Is there something wrong there? The unshare utility uses capset and capget, and the man pages seem to indicate other interfaces are more portable. I don't quite see anything wrong yet.
Fedora:
$ unshare -UrmunipCf bash
$ mkdir /tmp/mount
$ mount -t proc none /tmp/mount
$ cat 169666/status | grep Cap # bash
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
OpenWrt:
$ unshare -UrmunipCf sh
$ mkdir /tmp/mount
$ mount -t proc none /tmp/mount
mount: permission denied (are you root?)
$ cat /proc/7697/status | grep Cap # sh
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
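Both dumps show identical masks. As a quick sanity check, here is a minimal sketch that tests whether a given capability bit is set in such a mask. The mask 000001ffffffffff has bits 0 through 40 set, which includes CAP_SYS_ADMIN (capability number 21 in linux/capability.h). The helper names has_cap and cap_eff are made up for this illustration.

```shell
#!/bin/sh
# CAP_SYS_ADMIN is capability number 21 in <linux/capability.h>.
CAP_SYS_ADMIN=21

# has_cap MASK BIT: succeed if bit BIT is set in the hexadecimal MASK,
# where MASK is a value as printed on a Cap* line of /proc/PID/status.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# cap_eff [PID]: print the CapEff mask of process PID (default: this shell).
cap_eff() {
    awk '$1 == "CapEff:" { print $2 }' "/proc/${1:-$$}/status"
}
```

Inside the unshared shell, something like `has_cap "$(cap_eff)" "$CAP_SYS_ADMIN" && echo "CAP_SYS_ADMIN is effective"` would confirm what the dumps already suggest: the capability set itself is not what differs between the two systems.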
I used strace to see the syscalls actually thrown at the kernel, and they do make sense, but the kernel replies -EPERM. So I don't think we need to look into musl being the cause.
It seems we need to trace into the kernel. I was looking at kernel source last night, but I don't see anything yet. Also, strace would not reveal faulty capability inheritance, right? It is as if some policy within the kernel is out-of-whack.
Capabilities are fine, just like you have documented above already. I have also already verified that namespace ownership is not the problem, as that was the other requirement mentioned in the documentation (I've ported ns_show to OpenWrt to be able to do so).
Most likely it is an issue with namespacing. On CentOS, even with a much older podman, PID namespaces were separated, meaning that if I had a container running nginx, I was not able to see the nginx process unless I was in the container's namespace. Now, even with the latest podman package and a renewed container configuration explicitly set to use a private PID namespace, I can still see the nginx (or whatever) PID from the OpenWrt shell with ps. If I remember correctly, it doesn't even matter whether you use the full ps or the one from BusyBox; the result is the same.
Although the issue described here does not have a lot to do with PID namespacing, this suggests that there is a glitch with namespaces.
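A more direct check than ps is to compare the namespace links under /proc/PID/ns: two processes share a namespace exactly when the links resolve to the same identifier. Note that a parent PID namespace normally sees the processes of its child namespaces, so a container process showing up in the host's ps output does not by itself prove that isolation is broken. The sketch below assumes nothing beyond readlink; the names ns_id, same_ns and the PROC_ROOT override are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Root of the proc filesystem; overridable only to make the helpers testable.
PROC_ROOT=${PROC_ROOT:-/proc}

# ns_id PID TYPE: print the namespace identifier, e.g. "pid:[4026531836]".
ns_id() {
    readlink "$PROC_ROOT/$1/ns/$2"
}

# same_ns PID1 PID2 TYPE: succeed if both processes are in the same namespace
# of the given TYPE (pid, user, mnt, net, ...). Returns 2 if a link cannot be
# read, for example because of missing permissions.
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_ns $$ <container-pid> pid` run as root on the host should fail if the container really has a private PID namespace. Reading another user's namespace links usually requires root.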
One could try cgroup v2, because most other systems use that. I tried it once, but there was a problem of some kind, so I reverted to v1 and learned to live with what I have.
Also, musl should not be the problem, although it's described as a patched problem in podman's issue tracker: Alpine Linux has these issues sorted out and uses musl.
This is probably not related to the issue, but regarding ownership: OpenWrt is mostly designed for routers, and even though it's a Linux system, a clean install has quite poor support for users other than root. On a "basic router" there is limited space available, so unnecessary things are left out, such as user management (which is of course available as a package). I wonder if there's some crippling there, but most likely not. I'm 99% sure about that.
PID namespace ownership was also my guess. You may try the ns_show tool, which can help to inspect namespace ownership. I previously copied it from a manpage and packaged it for OpenWrt for exactly this same bug hunt: openwrt-ns_show.tar.gz
Regarding the lack of user management: apart from the lack of common Linux userspace tools, multiuser support is deeply integrated into OpenWrt, to the level of having ACLs for ubus and rpcd based on user/group. Also, many services now run as their own user instead of running as root.
What we do lack, however, is any facility to handle /etc/subuid and /etc/subgid, and I'm also not sure whether anything more needs to be done about those files (in musl or busybox, for example).
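Since nothing on OpenWrt manages these files, it may be worth verifying them by hand. Each line of /etc/subuid and /etc/subgid has the form name:start:count. A minimal sketch of such a check follows; has_subid_range is a hypothetical helper name, not part of any package.

```shell
#!/bin/sh
# has_subid_range USER FILE: succeed if FILE (in /etc/subuid or /etc/subgid
# format, one "name:start:count" entry per line) grants USER a non-empty range.
has_subid_range() {
    awk -F: -v user="$1" '
        $1 == user && $3 + 0 > 0 { found = 1 }
        END { exit !found }
    ' "$2"
}
```

A call such as `has_subid_range "$(id -un)" /etc/subuid || echo "no subordinate UID range"` would flag a missing or empty entry before newuidmap gets involved.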
Having a talk with the folks at Alpine Linux may also help to figure out whether it's an issue with the OpenWrt-specific kernel configuration or whether the root of the problem is somewhere in the musl/busybox userland (or even the busybox configuration).
There are also larger namespacing issues with OpenWrt and containers: private networking (rootful) is broken as well. Connections from a sub-namespace to the master can't be stopped without firewall rules.
Network namespaces in general work fine and are isolated properly; I'm using that every day with procd-ujail and uxc. To allow traffic from a sub-namespace to the parent, a veth interface needs to be created, otherwise no route exists. Probably the problem is specific to cni ...
I don't know if it can help or if it has no relation: https://forum.openwrt.org/t/lxc-unprivileged-container-no-uid-mapping-for-container-root/37778/3
But the container can't be launched, and you may get an error like this:
lxc_mount_auto_mounts:810 - Operation not permitted - Failed to mount "proc"
So /proc and /sys need to be remounted with relatime. Run:
/usr/bin/mount -t sysfs sys -o remount,rw,nosuid,nodev,noexec,relatime /sys
/usr/bin/mount -t proc proc -o remount,rw,nosuid,nodev,noexec,relatime /proc
Then you can launch the container.
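This workaround suggests that the options on the host's existing /proc and /sys mounts matter for whether an unprivileged mount of proc is permitted. That reading is an assumption drawn from the forum post above, not a confirmed diagnosis for podman. To see what the host currently uses, a small sketch like the following can help; mount_opts and has_opt are hypothetical helper names.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [MOUNTS_FILE]: print the option string of the last
# entry for MOUNTPOINT in a file in /proc/mounts format.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# has_opt OPTS NAME: succeed if the comma-separated list OPTS contains NAME.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `has_opt "$(mount_opts /proc)" relatime || echo "/proc is not mounted with relatime"` shows whether the remount above would change anything on a given system.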
Maintainer: @dangowrt
Environment: x86_64 master
Description: The podman utility is meant to allow non-root users to run applications in containers. This bug is meant to track things preventing this from working on OpenWrt. (Note there is also ongoing work on uxc.)
1) The default OpenWrt kernel does not contain support for EXT4 security labels (EXT4_FS_SECURITY), and this seems to cause podman to fail under some circumstances. I reported this upstream, and upstream merged a fix. See issue #9687 and pull request #851. For reference, here is the corresponding error:
2) /etc/containers/* is not readable by anyone but root. Fedora lets non-root users read files in this directory, so we could probably do the same on OpenWrt.
3) Non-root users cannot write to /sys/fs/cgroup/*. I am not sure how to safely handle this, and I have not yet figured out how other distributions do it. I am still trying to find information about how this is done.
4) Running podman run ... wants to mount /proc and so on in the container. This fails when run as non-root with:
This might be related to user namespaces, but I am not yet sure of this. I have installed shadow-newuidmap and shadow-newgidmap, and I think I have set up /etc/subuid and /etc/subgid properly.
5) /var/tmp does not permit non-root users to write, and it does not bear the sticky bit. Strange.
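To check the /var/tmp observation on a given system, a directory can be tested for the conventional mode 1777 (world-writable with the sticky bit). This is only a sketch; is_shared_tmp is a made-up helper name.

```shell
#!/bin/sh
# is_shared_tmp DIR: succeed if DIR is a directory (symlinks are followed)
# with mode 1777, i.e. world-writable with the sticky bit set.
is_shared_tmp() {
    [ -d "$1" ] && [ -k "$1" ] || return 1
    # With -L, ls reports the target of a symlink; characters 2-10 of the
    # long listing are the permission bits, "rwxrwxrwt" for mode 1777.
    [ "$(ls -ldL -- "$1" | cut -c2-10)" = "rwxrwxrwt" ]
}
```

If `is_shared_tmp /var/tmp` fails while `is_shared_tmp /tmp` succeeds, pointing podman at /tmp with TMPDIR=/tmp, as mentioned elsewhere in this thread, is one way around it.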
For reference, I found the following articles helpful for understanding how to build namespaces on Linux:
https://medium.com/@teddyking/linux-namespaces-850489d3ccf
https://medium.com/@teddyking/namespaces-in-go-basics-e3f0fc1ff69a
https://medium.com/@teddyking/namespaces-in-go-user-a54ef9476f2a
https://medium.com/@teddyking/namespaces-in-go-reexec-3d1295b91af8
https://medium.com/@teddyking/namespaces-in-go-mount-e4c04fe9fb29
https://medium.com/@teddyking/namespaces-in-go-network-fdcf63e76100
https://medium.com/@teddyking/namespaces-in-go-uts-d47aebcdf00e