Closed crosbymichael closed 8 years ago
cc @tonistiigi
Just reading through the kernel, there are only two places where this could come from:
SYSCALL_DEFINE2(capset, cap_user_header_t, header, const cap_user_data_t, data)
{
    /* ... */
    if (get_user(pid, &header->pid))
        return -EFAULT;

    /* may only affect current now */
    if (pid != 0 && pid != task_pid_vnr(current))
        return -EPERM;
    /* ... */
}
Which is always applicable, and this:
int cap_capset(struct cred *new,
               const struct cred *old,
               const kernel_cap_t *effective,
               const kernel_cap_t *inheritable,
               const kernel_cap_t *permitted)
{
    if (cap_inh_is_capped() &&
        !cap_issubset(*inheritable,
                      cap_combine(old->cap_inheritable,
                                  old->cap_permitted)))
        /* incapable of using this inheritable set */
        return -EPERM;

    if (!cap_issubset(*inheritable,
                      cap_combine(old->cap_inheritable,
                                  old->cap_bset)))
        /* no new pI capabilities outside bounding set */
        return -EPERM;

    /* verify restrictions on target's new Permitted set */
    if (!cap_issubset(*permitted, old->cap_permitted))
        return -EPERM;

    /* verify the _new_Effective_ is a subset of the _new_Permitted_ */
    if (!cap_issubset(*effective, *permitted))
        return -EPERM;
    /* ... */
}
Which is only applicable if you have the security module enabled.
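To make the four checks in cap_capset easier to reason about, here is a minimal userspace sketch of them in Go. It is only a model, not kernel code: the creds struct, checkCapset, and the inhCapped flag (standing in for cap_inh_is_capped(), which is true when the caller lacks CAP_SETPCAP) are hypothetical names, and capability sets are reduced to a single 64-bit mask.

```go
package main

import (
	"errors"
	"fmt"
)

// errEPERM stands in for the kernel's -EPERM in this userspace model.
var errEPERM = errors.New("EPERM")

// capSet is a bitmask with one bit per capability number.
type capSet uint64

// isSubset reports whether every bit set in a is also set in b.
func isSubset(a, b capSet) bool { return a&^b == 0 }

// creds is a hypothetical, simplified stand-in for the kernel's struct cred.
type creds struct {
	inheritable, permitted, bounding capSet
}

// checkCapset applies the same four checks, in the same order, as the
// cap_capset excerpt. inhCapped models cap_inh_is_capped().
func checkCapset(old creds, effective, inheritable, permitted capSet, inhCapped bool) error {
	if inhCapped && !isSubset(inheritable, old.inheritable|old.permitted) {
		return errEPERM // incapable of using this inheritable set
	}
	if !isSubset(inheritable, old.inheritable|old.bounding) {
		return errEPERM // no new pI capabilities outside the bounding set
	}
	if !isSubset(permitted, old.permitted) {
		return errEPERM // new permitted set must not grow
	}
	if !isSubset(effective, permitted) {
		return errEPERM // new effective must be within new permitted
	}
	return nil
}

func main() {
	const capChown, capSetpcap capSet = 1 << 0, 1 << 8
	old := creds{permitted: capChown, bounding: capChown | capSetpcap}

	// Keeping an already permitted capability passes every check.
	fmt.Println(checkCapset(old, capChown, 0, capChown, true))
	// Asking for a capability outside the old permitted set is rejected.
	fmt.Println(checkCapset(old, capSetpcap, 0, capSetpcap, true))
}
```

If runc only ever shrinks its sets, none of these checks should fire, which is part of why the pid check in the syscall entry looks like the more plausible source.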
I'm not sure if we lock to an OS thread when setting capabilities (we should), but the first case will be triggered if we're switched between physical OS threads. It's a bit weird that this happens on ARM; maybe the scheduler acts differently. This only applies if the library we use doesn't set pid = 0 (which implies the current process).
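The thread-switch scenario can be illustrated with a tiny model of the pid check from the capset excerpt above. This is a sketch under assumptions: capsetPidCheck and the thread IDs are made up for illustration, and callerTid plays the role of task_pid_vnr(current), i.e. the ID of the thread that actually issues the syscall.

```go
package main

import (
	"errors"
	"fmt"
)

// errEPERM stands in for the kernel's -EPERM in this userspace model.
var errEPERM = errors.New("EPERM")

// capsetPidCheck models the header check in the capset syscall:
// header->pid must be 0 or equal to the ID of the calling thread.
func capsetPidCheck(headerPid, callerTid int) error {
	if headerPid != 0 && headerPid != callerTid {
		return errEPERM
	}
	return nil
}

func main() {
	const threadA, threadB = 1000, 1001 // hypothetical thread IDs

	// ID recorded on thread A and syscall issued from thread A: accepted.
	fmt.Println(capsetPidCheck(threadA, threadA))
	// ID recorded on thread A, but the goroutine now runs on thread B: rejected.
	fmt.Println(capsetPidCheck(threadA, threadB))
	// header->pid == 0 is accepted no matter which thread makes the call.
	fmt.Println(capsetPidCheck(0, threadB))
}
```

In this model the failure depends only on which thread happens to run the call, which would fit an error that shows up intermittently.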
I got the same error at the same spot when I tried to enable user namespaces on RHEL7, and that's because we lack an upstream patch that makes capabilities userns-aware.
Your problem is likely not the same, but it also looks like a kernel bug, so giving the kernel version would help.
@cyphar I think this line will also call cap_capget and cap_task_prctl -> cap_prctl_drop in the kernel:
https://github.com/opencontainers/runc/blob/master/libcontainer/capabilities_linux.go#L68
So it's also possible that the line returning EPERM is:
static int cap_prctl_drop(unsigned long cap)
{
    struct cred *new;

    if (!ns_capable(current_user_ns(), CAP_SETPCAP))
        return -EPERM; // <----- This line
    if (!cap_valid(cap))
        return -EINVAL;

    new = prepare_creds();
    if (!new)
        return -ENOMEM;
    cap_lower(new->cap_bset, cap);
    return commit_creds(new);
}
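One rough way to test this theory from userspace is to look at whether CAP_SETPCAP (capability number 8) is in the effective set right before the bounding set is dropped. The sketch below parses the CapEff line of /proc/self/status; hasCap and capEffFromStatus are hypothetical helper names, and reading CapEff only approximates the ns_capable() check the kernel performs.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSetpcap is the number of CAP_SETPCAP in linux/capability.h.
const capSetpcap = 8

// hasCap reports whether capability number bit is set in a hex mask as
// printed in /proc/<pid>/status (for example "0000003fffffffff").
func hasCap(hexMask string, bit uint) (bool, error) {
	mask, err := strconv.ParseUint(strings.TrimSpace(hexMask), 16, 64)
	if err != nil {
		return false, err
	}
	return mask&(1<<bit) != 0, nil
}

// capEffFromStatus extracts the CapEff value from the text of a status file.
func capEffFromStatus(status string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "CapEff:")), true
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/status") // Linux only
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	capEff, ok := capEffFromStatus(string(data))
	if !ok {
		fmt.Println("no CapEff line found")
		return
	}
	has, err := hasCap(capEff, capSetpcap)
	fmt.Println("CapEff:", capEff, "CAP_SETPCAP effective:", has, "err:", err)
}
```

Logging this just before the failing call would show whether the effective set had already lost CAP_SETPCAP at that point.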
According to my logs, it seems to be coming from capset:
https://github.com/opencontainers/runc/blob/master/Godeps/_workspace/src/github.com/syndtr/gocapability/capability/capability_linux.go#L445
We do seem to lock to an OS thread in https://github.com/opencontainers/runc/blob/master/start.go#L93. It may be happening on ARM just because our ARM boxes are much slower, so we're more likely to get descheduled. The test case doesn't have userns enabled.
root@arm-8:~/runc# uname -a
Linux arm-8 4.3.3-docker-1 #1 SMP Wed Jan 20 13:31:30 UTC 2016 armv7l armv7l armv7l GNU/Linux
root@arm-8:~/runc# zgrep CONFIG_SECURITY /proc/config.gz
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
We have a weird error coming from: https://github.com/opencontainers/runc/blob/master/libcontainer/capabilities_linux.go#L68
on ARM systems, but it's not consistent. We get an EPERM from this about once every 100-1000 containers. Does anyone have any ideas?