opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

Cap Set Apply EPERM on ARM systems #684

Closed crosbymichael closed 8 years ago

crosbymichael commented 8 years ago

We have a weird error coming from: https://github.com/opencontainers/runc/blob/master/libcontainer/capabilities_linux.go#L68

on ARM systems, but it's not consistent. We get an EPERM from this roughly once every 100-1000 containers. Does anyone have any ideas?

crosbymichael commented 8 years ago

cc @tonistiigi

cyphar commented 8 years ago

Just reading through the kernel, there are only two places where this could come from:

SYSCALL_DEFINE2(capset, cap_user_header_t, header, const cap_user_data_t, data) 
{
    /* ... */

    if (get_user(pid, &header->pid))                                            
        return -EFAULT;

    /* may only affect current now */                                           
    if (pid != 0 && pid != task_pid_vnr(current))                               
        return -EPERM;

    /* ... */
}

Which is always applicable, and this:

int cap_capset(struct cred *new,                                                
           const struct cred *old,                                              
           const kernel_cap_t *effective,                                       
           const kernel_cap_t *inheritable,                                     
           const kernel_cap_t *permitted)                                       
{                                                                               
    if (cap_inh_is_capped() &&                                                  
        !cap_issubset(*inheritable,                                             
              cap_combine(old->cap_inheritable,                                 
                      old->cap_permitted)))                                     
        /* incapable of using this inheritable set */                           
        return -EPERM;                                                          

    if (!cap_issubset(*inheritable,                                             
              cap_combine(old->cap_inheritable,                                 
                      old->cap_bset)))                                          
        /* no new pI capabilities outside bounding set */                       
        return -EPERM;                                                          

    /* verify restrictions on target's new Permitted set */                     
    if (!cap_issubset(*permitted, old->cap_permitted))                          
        return -EPERM;                                                          

    /* verify the _new_Effective_ is a subset of the _new_Permitted_ */         
    if (!cap_issubset(*effective, *permitted))                                  
        return -EPERM;

    /* ... */
}

Which is only applicable if you have the security module enabled.
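To make the four checks in cap_capset easier to reason about, here is a minimal Go model of them. It is only a sketch: the names `capSet`, `creds`, `checkCapset` and `errPerm` are made up for illustration, capability sets are reduced to a single 64-bit mask, and the result of cap_inh_is_capped() is passed in as a plain boolean rather than derived from the caller's credentials.

```go
package main

import (
	"errors"
	"fmt"
)

// errPerm stands in for the kernel's -EPERM in this model.
var errPerm = errors.New("EPERM")

// capSet is a bitmask with one bit per capability number (hypothetical type).
type capSet uint64

// isSubset reports whether every bit of a is also set in b,
// which is what cap_issubset(a, b) checks.
func isSubset(a, b capSet) bool { return a&^b == 0 }

// creds holds the caller's current ("old") capability sets.
type creds struct {
	inheritable, permitted, bounding capSet
}

// checkCapset mirrors the four subset checks from the quoted cap_capset.
// inhCapped models cap_inh_is_capped() and is an input here (assumption).
func checkCapset(old creds, inhCapped bool, effective, inheritable, permitted capSet) error {
	// New inheritable must fit in old inheritable | old permitted when capped.
	if inhCapped && !isSubset(inheritable, old.inheritable|old.permitted) {
		return errPerm
	}
	// No new inheritable capabilities outside the bounding set.
	if !isSubset(inheritable, old.inheritable|old.bounding) {
		return errPerm
	}
	// New permitted may only shrink.
	if !isSubset(permitted, old.permitted) {
		return errPerm
	}
	// New effective must be within new permitted.
	if !isSubset(effective, permitted) {
		return errPerm
	}
	return nil
}

func main() {
	// CAP_CHOWN is capability 0 and CAP_SETPCAP is capability 8.
	const capChown, capSetpcap capSet = 1 << 0, 1 << 8
	old := creds{permitted: capChown, bounding: capChown | capSetpcap}

	// Keeping an already permitted capability passes all four checks.
	fmt.Println(checkCapset(old, true, capChown, 0, capChown))
	// Trying to add CAP_SETPCAP to permitted fails the third check.
	fmt.Println(checkCapset(old, true, capSetpcap, 0, capChown|capSetpcap))
}
```

In this model every failure depends only on the requested sets and the caller's current sets, so a request that passes once should keep passing as long as neither changes.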

I'm not sure if we lock to an OS thread when setting capabilities (we should), but the first case will be triggered if we're switched between physical OS threads. It's a bit weird that this happens on ARM; maybe the scheduler acts differently there. This only applies if the library we use doesn't set pid = 0 (which implies the current process).
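As a rough illustration of that first case, the sketch below models the header pid check in Go and shows the standard `runtime.LockOSThread` call that keeps a goroutine on one OS thread. The helper names `pidAccepted` and `onLockedThread` and the thread ids are hypothetical; this is not the code runc or gocapability uses, just the idea under those assumptions.

```go
package main

import (
	"fmt"
	"runtime"
)

// pidAccepted models the quoted capset header check: a header pid of 0,
// or one equal to the calling thread's id, passes; anything else is EPERM.
func pidAccepted(headerPid, callerTid int) bool {
	return headerPid == 0 || headerPid == callerTid
}

// onLockedThread runs fn with the calling goroutine pinned to its current
// OS thread, so the thread id cannot change while fn is running.
func onLockedThread(fn func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	fn()
}

func main() {
	// Hypothetical ids: the goroutine recorded one thread id when it built
	// the header, then was rescheduled onto another thread before capset.
	const tidWhenHeaderBuilt, tidAtSyscall = 1001, 1002

	onLockedThread(func() {
		// pid = 0 always refers to the caller, so it passes on any thread.
		fmt.Println(pidAccepted(0, tidAtSyscall))
		// A stale, explicit id fails once the goroutine has moved.
		fmt.Println(pidAccepted(tidWhenHeaderBuilt, tidAtSyscall))
	})
}
```

Either measure alone would avoid the mismatch in this model: with pid = 0 the thread no longer matters, and with the thread locked an explicit id stays valid.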

hqhq commented 8 years ago

I got the same error at the same spot when I tried to enable user namespaces on RHEL7, and that was because we lack an upstream patch that makes capabilities userns-aware.

Your problem is likely not the same, but it also looks like a kernel bug, so giving the kernel version would help.

hqhq commented 8 years ago

@cyphar I think this line, https://github.com/opencontainers/runc/blob/master/libcontainer/capabilities_linux.go#L68, will also call cap_capget and cap_task_prctl -> cap_prctl_drop in the kernel.

So it's also possible that the line returning EPERM is:

static int cap_prctl_drop(unsigned long cap)
{
    struct cred *new;

    if (!ns_capable(current_user_ns(), CAP_SETPCAP))
        return -EPERM;       // <----- This line
    if (!cap_valid(cap))
        return -EINVAL;

    new = prepare_creds();
    if (!new)
        return -ENOMEM;
    cap_lower(new->cap_bset, cap);
    return commit_creds(new);
}

tonistiigi commented 8 years ago

According to my logs, it seems to be coming from capset: https://github.com/opencontainers/runc/blob/master/Godeps/_workspace/src/github.com/syndtr/gocapability/capability/capability_linux.go#L445

We do seem to lock to an OS thread in https://github.com/opencontainers/runc/blob/master/start.go#L93. It may be happening on ARM just because our ARM boxes are much slower, so we're more likely to get unscheduled. The test case doesn't have userns enabled.

root@arm-8:~/runc# uname -a
Linux arm-8 4.3.3-docker-1 #1 SMP Wed Jan 20 13:31:30 UTC 2016 armv7l armv7l armv7l GNU/Linux
root@arm-8:~/runc# zgrep  CONFIG_SECURITY /proc/config.gz
# CONFIG_SECURITY_DMESG_RESTRICT is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set