opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.9k stars 2.11k forks source link

libct/cg: openat2 optimization not working across container boundaries #3026

Open MikeDombo opened 3 years ago

MikeDombo commented 3 years ago

Hi, I'm using libcontainer to create my container and I found a regression when Ubuntu updated from kernel 5.4 to kernel 5.8 on Thursday/Friday. I'm currently seeing process_linux.go:385: applying cgroup configuration for process caused: openat2 /sys/fs/cgroup/cpuset/<container id>/cpuset.cpus: no such file or directory. On kernel 5.4 using the exact same code it work fine. I edited libcontainer to do a directory listing and I can verify that the cpuset.cpus "file" does exist as it should, however it is unable to be opened by libcontainer. I am able to run cat on the path without error from outside the container.

Any pointers would be incredibly helpful, thank you very much.

cyphar commented 3 years ago

It seems that the issue is that openat2 is available after the update (openat2 was added in Linux 5.6). However this code definitely works on both old and new kernels, so it's a bit puzzling why it's failing for you. Does your code mess around with open files at all? With openat2, we cache a file descriptor to /sys/fs/cgroup -- so if some of your code is dup-ing over that file descriptor with another file descriptor, you're going to end up with the cgroup2 code trying to open /some/random/file/cpuset/container-id/cpuset.cpus which will either fail or cause security issues.

(As an aside @kolyshkin I think that the error message in the openat2 code should be changed to better describe the path being opened -- readlink of the fd would be a good idea. I'll write a patch for that.)

cyphar commented 3 years ago

Hi, it seems that you've opened an issue that is in relation to using github.com/opencontainers/runc/libcontainer directly from your own Go code. While we may do our best to help you solve your issue, we do not support this. In addition, our Semantic Versioning version does not include the Go API in its scope -- meaning we are free to break the Go APIs in point releases.

The reason for this policy is that runc has fairly complicated behaviour that is shared between the runc command-line code and the internal libcontainer libraries, as well as a fairly complicated lifecycle that is very easy to not configure correctly. In addition, we have several security hardenings within runc that could be broken inadvertently by code using libcontainer and not being aware of this.

libcontainer being a public-accessible library is a historical accident and in the future we may put it entirely inside an internal package to stop it from being imported entirely. If you can use runc, or the go-runc library (which wraps the runc binary) please do so. If you cannot, please open an issue describing why you cannot use runc and we will see if it's possible to expand runc so that you can use runc directly.

NOTE: This is a saved reply. Sorry if it reads as a cookie-cutter response, it was written to explain the situation around libcontainer. If you are already aware of this, and are fine accepting the risks of using libcontainer directly, you can ignore this message.
MikeDombo commented 3 years ago

Hi @cyphar thank you very much for both of the replies.

So to test the hypothesis I changed prepareOpenat2 to always return an error so that we'd just use good ol' open and that actually solved the problem!

I don't believe we're messing around with anything, duping or otherwise, aside from running the container root in an overlayfs which is maybe related.

cyphar commented 3 years ago

What happens if you apply this patch?

diff --git a/libcontainer/cgroups/fscommon/open.go b/libcontainer/cgroups/fscommon/open.go
index e95876a21769..2ef1e15cfae8 100644
--- a/libcontainer/cgroups/fscommon/open.go
+++ b/libcontainer/cgroups/fscommon/open.go
@@ -1,6 +1,7 @@
 package fscommon

 import (
+       "fmt"
        "os"
        "strings"
        "sync"
@@ -86,7 +87,13 @@ func OpenFile(dir, file string, flags int) (*os.File, error) {
                        Mode:    uint64(mode),
                })
        if err != nil {
-               return nil, &os.PathError{Op: "openat2", Path: dir + "/" + file, Err: err}
+               procpath := fmt.Sprintf("/proc/self/fd/%d", cgroupFd)
+               realdir, _ := os.Readlink(procpath)
+               path := dir + "/" + file
+               if realdir != dir {
+                       path = fmt.Sprintf("[%s=%d!=%s]/%s", procpath, realdir, dir, file)
+               }
+               return nil, &os.PathError{Op: "openat2", Path: path, Err: err}
        }

        return os.NewFile(uintptr(fd), cgroupfsPrefix+relname), nil
MikeDombo commented 3 years ago

[/proc/self/fd/30=/!=/sys/fs/cgroup/cpuset/9ac5cec4-16ac-4fdf-4f7f-51aa982a9b7f]/cpuset.cpus: no such file or directory

So apparently what should be the cgroup path is actually /

cyphar commented 3 years ago

Hmm, so it seems that fd 30 is being swapped with a handle to / or something like that? You can try to add some similar readlink-based debugging code to prepareOpenat2 to see whether the original handle is correct.

MikeDombo commented 3 years ago

Thank you so much Aleksa, I don't want to take up more of your valuable time with this. Your pointer to look at the openat2 call certainly saved hours of work, so thank you for that! I'll keep looking into it, but worst case we just use the fallback path without openat2 which seems like it should be fine.

I'd suspect the FD is getting messed up either in entering the mount namespace or mounting the overlayfs.

kolyshkin commented 3 years ago

I'd suspect the FD is getting messed up either in entering the mount namespace or mounting the overlayfs.

Apparently we did not have such a use case in mind. runc works with cgroup files before entering any namespaces.

I guess we can check on error if cgroupFd still resolves to /sys/fs/cgroup, and reopen it if not, but that seems overly complicated. I actually wrote this but it looks borderline ugly.

@MikeDombo If the source code is available, I will be happy to help resolve this.

MikeDombo commented 3 years ago

Thank you for looking Kir. Sadly our usage isn't open source at this time. Hopefully when we do open source it we'll have moved to using runc properly instead of going directly to libcontainer.