Open cyphar opened 8 years ago
This is a currently known restriction in the kernel: you can't mount sysfs without CAP_SYS_ADMIN. Removing the sysfs mount should allow you to start the container.
I think the patch note is here:
Also discussed a bit in https://github.com/docker/docker/issues/21800
@dqminh But we're using user namespaces, so we have CAP_SYS_ADMIN in the namespace. If you add the network namespace to the config, it works perfectly fine. I think it's a more nuanced problem (possibly how we're messing around with mount options in rootfs_linux).
But we're using user namespaces, so we have CAP_SYS_ADMIN in the namespace
That's not quite true, I think. You only have CAP_SYS_ADMIN over a net namespace created by the user, not when you join the net namespace of the host.
Ah, you meant the user namespace that "owns" the net namespace. Okay, if that's the requirement for mounting all of /sys (which seems odd), we'll have to not mount sysfs. We should probably add this to the validator, so people don't run into this by accident.
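The kernel restriction can be reproduced without runc. A minimal sketch using util-linux unshare (assuming unprivileged user namespaces are enabled and /mnt exists):

```shell
# With only a new user + mount namespace, the net namespace is still the
# host's, so mounting sysfs fails with EPERM even though we hold
# CAP_SYS_ADMIN inside the user namespace:
unshare --user --map-root-user --mount \
    sh -c 'mount -t sysfs sysfs /mnt' || echo "sysfs mount denied"

# A fresh net namespace owned by the same user namespace lifts the
# restriction:
unshare --user --map-root-user --mount --net \
    sh -c 'mount -t sysfs sysfs /mnt && echo mounted'
```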
I've removed sysfs from my config and that appears to work now. Unfortunately, it looks like I still don't have network access for some reason ...
/cc @davidlt
Unfortunately, it looks like I still don't have network access for some reason ...
Hmm, it should work (at least when I tested this a few weeks ago :p). What did you use to test network access? ping or anything that uses CAP_NET_* will not work, though.
I was just using netcat. I've had enough bad experiences with capabilities to know better than to trust ping in containers. ;)
Seems to work, at least yum makecache worked, but I am facing issues trying to install anything useful in the container, e.g.:
Running transaction
Installing : fipscheck-lib-1.4.1-5.el7.x86_64 1/3
Error unpacking rpm package fipscheck-lib-1.4.1-5.el7.x86_64
error: unpacking of archive failed on file /usr/lib64/libfipscheck.so.1;5728b733: cpio: symlink
Installing : fipscheck-1.4.1-5.el7.x86_64 2/3
Error unpacking rpm package fipscheck-1.4.1-5.el7.x86_64
error: fipscheck-lib-1.4.1-5.el7.x86_64: install failed
error: unpacking of archive failed on file /usr/bin/fipscheck;5728b733: cpio: open
error: fipscheck-1.4.1-5.el7.x86_64: install failed
groupadd: cannot open /etc/gshadow
Installing : openssh-6.6.1p1-25.el7_2.x86_64 3/3
Error unpacking rpm package openssh-6.6.1p1-25.el7_2.x86_64
error: unpacking of archive failed on file /usr/bin/ssh-keygen;5728b733: cpio: open
I guess I have to build an image with e.g. Docker and include the wanted packages.
Here is better proof that it works. Is there a way to map /etc/resolv.conf from the host to the container?
[davidlt@pccms205 test2]$ cat /etc/redhat-release
Fedora release 24 (Twenty Four)
[davidlt@pccms205 test2]$ runc --root $PWD start test_cont
sh-4.2# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
sh-4.2# dig google.com
; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55412
;; flags: qr rd ra; QUERY: 1, ANSWER: 15, AUTHORITY: 4, ADDITIONAL: 5
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 195 IN A 195.112.88.178
google.com. 195 IN A 195.112.88.177
google.com. 195 IN A 195.112.88.185
google.com. 195 IN A 195.112.88.179
google.com. 195 IN A 195.112.88.184
google.com. 195 IN A 195.112.88.180
google.com. 195 IN A 195.112.88.187
google.com. 195 IN A 195.112.88.181
google.com. 195 IN A 195.112.88.188
google.com. 195 IN A 195.112.88.189
google.com. 195 IN A 195.112.88.183
google.com. 195 IN A 195.112.88.175
google.com. 195 IN A 195.112.88.176
google.com. 195 IN A 195.112.88.182
google.com. 195 IN A 195.112.88.186
;; AUTHORITY SECTION:
google.com. 59409 IN NS ns2.google.com.
google.com. 59409 IN NS ns4.google.com.
google.com. 59409 IN NS ns1.google.com.
google.com. 59409 IN NS ns3.google.com.
;; ADDITIONAL SECTION:
ns1.google.com. 37866 IN A 216.239.32.10
ns2.google.com. 72394 IN A 216.239.34.10
ns3.google.com. 35936 IN A 216.239.36.10
ns4.google.com. 56592 IN A 216.239.38.10
;; Query time: 1 msec
;; SERVER: 137.138.17.5#53(137.138.17.5)
;; WHEN: Tue May 03 15:27:22 UTC 2016
;; MSG SIZE rcvd: 415
You can try bind-mounting the file. You'd have to create the file in the rootfs of your container (manually), then add a bind option for it in config.json. You could also use pre-start hooks if you really wanted to just copy the file (but that would make it go out of sync).
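For illustration, such a bind mount entry in the mounts array of config.json might look like this (a sketch only; the ro option and the host-side source path are assumptions about your setup):

```json
{
    "destination": "/etc/resolv.conf",
    "type": "bind",
    "source": "/etc/resolv.conf",
    "options": [
        "rbind",
        "ro"
    ]
}
```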
The difficulty with unprivileged net namespaces is connecting them to the outside world:
$ unshare -nUfr sh
sh-4.3# ip route
sh-4.3# ip addr
1: lo:
To set up that connection, you need someone with privileged access in the runtime namespace [1] to set up a bridge and throw one half of a veth connection over the wall (e.g. [2]), or set up iptables rules, etc., to connect the runtime net namespace with the container net namespace.
In the absence of such a cooperative privileged user, you can still use unprivileged net namespaces for isolated network tests (and you can probably set up subcontainers and have the unprivileged user set up bridging between those subcontainers).
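A rough sketch of what such a privileged helper might do. The bridge name br0, the interface names, the addresses, and the netns name ns1 are all assumptions, and the commands must run as root:

```shell
# Wire a container net namespace to an existing host bridge (br0).
# Assumes the netns was created beforehand, e.g. `ip netns add ns1`.
ip link add veth-host type veth peer name veth-cont  # create the veth pair
ip link set veth-cont netns ns1                      # throw one end over the wall
ip link set veth-host master br0                     # attach host end to the bridge
ip link set veth-host up
ip netns exec ns1 ip link set veth-cont up
ip netns exec ns1 ip addr add 10.0.0.2/24 dev veth-cont
ip netns exec ns1 ip route add default via 10.0.0.1  # bridge address, assumed
```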
Yeah, you need a privileged helper for setting up the veth pair to the host bridge. lxc also uses a privileged helper, called lxc-user-nic, to set up networking for unprivileged containers.
Ran into a similar issue when runc is given a network namespace file.
However, it runs fine if either the namespace file path or the user namespace is removed from config.json.
Is there a workaround for using a network namespace created in the host namespace?
$ sudo runc run hello
container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "rootfs_linux.go:58: mounting "sysfs" to rootfs "/tmp/hello-world/rootfs" at "/sys" caused "operation not permitted"""
$ jq '.' config.json
{
"ociVersion": "1.0.1-dev",
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"/hello"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "runc",
"mounts": [
{
"destination": "/proc",
"type": "proc",
"source": "proc"
},
{
"destination": "/dev",
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
{
"destination": "/dev/pts",
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
{
"destination": "/dev/shm",
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
{
"destination": "/dev/mqueue",
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
{
"destination": "/sys",
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev",
"ro"
]
},
{
"destination": "/sys/fs/cgroup",
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
],
"linux": {
"uidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 32000
}
],
"gidMappings": [
{
"containerID": 0,
"hostID": 1000,
"size": 32000
}
],
"resources": {
"devices": [
{
"allow": false,
"access": "rwm"
}
]
},
"namespaces": [
{
"type": "pid"
},
{
"type": "network",
"path": "/var/run/netns/ns1"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "user"
}
],
"maskedPaths": [
"/proc/kcore",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/sys/firmware",
"/proc/scsi"
],
"readonlyPaths": [
"/proc/asound",
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
}
}
I discovered this while working on rootless containers. It looks like there are some issues using a non-network-namespaced setup. This is also blocking rootless containers from having networking (since we need to just use host networking).
Here's the config, but the important thing to note is that I've added some dummy user namespace setup and removed the network section from namespaces. Blocking #774.