opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

nsexec: moving as much as we can to Go #3951

Open cyphar opened 1 year ago

cyphar commented 1 year ago

Here is a list of things that nsexec does today, and a description of possible ideas and challenges to moving them to Go:


Agree with @cyphar -- if we can do it in Go, we should do it in Go.

Overall I very much hope we'll eventually be able to do all of it in Go. For example, with cgroupfd support in the kernel (since v5.7) and in the Go stdlib (since 1.20), we can enter cgroups much more easily.

Originally posted by @kolyshkin in https://github.com/opencontainers/runc/issues/3943#issuecomment-1659392866
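As a rough illustration of the cgroupfd point above, the following minimal sketch uses the `UseCgroupFD` and `CgroupFD` fields that Go 1.20 added to `syscall.SysProcAttr` on Linux, which ask the kernel to place the child into a cgroup at clone time (`CLONE_INTO_CGROUP`). The helper names `cgroupSysProcAttr` and `startInCgroup` and the path `/sys/fs/cgroup/example` are hypothetical and chosen only for this example; this is not how runc does it today.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// cgroupSysProcAttr (hypothetical helper) builds process attributes that ask
// the kernel to create the child directly inside the cgroup referred to by
// cgroupFD. Requires Linux 5.7+ and Go 1.20+.
func cgroupSysProcAttr(cgroupFD int) *syscall.SysProcAttr {
	return &syscall.SysProcAttr{
		UseCgroupFD: true,
		CgroupFD:    cgroupFD,
	}
}

// startInCgroup (hypothetical helper) opens a cgroup v2 directory and starts
// cmd inside that cgroup. The directory fd only has to stay open until
// Start returns, because the kernel consumes it while creating the child.
func startInCgroup(cmd *exec.Cmd, cgroupDir string) error {
	dir, err := os.Open(cgroupDir)
	if err != nil {
		return fmt.Errorf("open cgroup dir: %w", err)
	}
	defer dir.Close()

	cmd.SysProcAttr = cgroupSysProcAttr(int(dir.Fd()))
	return cmd.Start()
}

func main() {
	// Assumption: this cgroup directory exists and is writable by the caller.
	cmd := exec.Command("true")
	if err := startInCgroup(cmd, "/sys/fs/cgroup/example"); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	_ = cmd.Wait()
}
```

With this approach the child never runs in the parent's cgroup, so no separate step of writing its PID to `cgroup.procs` is needed.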

cyphar commented 1 year ago

@kolyshkin Never say never, and I would love to remove all the C code from our codebase, but I'm not sure that we will be able to do that even on newer kernels and with the newest stdlib (at the very least, I don't think Go has handling for newuidmap, though this could of course be added). I can come up with a list of things to do in a separate issue if you want to have a chat about the problem. For one thing, I think that (for performance and security reasons) we almost certainly want to implement the runc userns creation for mount_setattr(2) in CGO as a (slightly unsafe) fork. CLONE_INTO_CGROUP is something we might want, but as I mentioned in #3931, cgroupv2 doesn't migrate memory usage when a process moves between cgroups, so if we use CLONE_INTO_CGROUP we will also need to move the ensure_cloned_binary() logic out of runc init. We can always implement it in Go, though, so this is probably not that big of a deal.

Originally posted by @cyphar in https://github.com/opencontainers/runc/issues/3943#issuecomment-1659409419

lifubang commented 5 months ago

I have similar work in progress locally, and I think most of it is done; once I have finished refactoring my code, I'll open a PR. I have moved all of stage-0 and stage-2 to Go code, leaving only stage-1 in C. While working on this, I found some limitations in Go:

  1. After we unshare/setns into a new pid ns in cgo, the program crashes when it enters a Go routine from cgo;
  2. After we use setns to join the init process's mount ns, the program crashes when it enters a Go routine from cgo.

For the second problem, I have opened an issue with Go to discuss it: https://github.com/golang/go/issues/67653. For the first problem, I can't find the root cause, so for now I have moved the pid ns setup into a Go routine, but that seems ugly. More suggestions are welcome.
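For readers unfamiliar with why namespace calls are awkward in Go, here is a minimal sketch of one common pattern for per-thread state changes: run the call on a goroutine that is locked to its OS thread and never unlocked, so the runtime terminates that thread instead of reusing it. The helper name `runOnDedicatedThread` is hypothetical, and this is not necessarily the approach taken in the work described above, nor does it address the cgo crashes themselves.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// runOnDedicatedThread (hypothetical helper) runs fn on a new goroutine that
// is locked to its OS thread. The thread is deliberately never unlocked: when
// a goroutine exits while still locked, the Go runtime terminates the thread,
// so any per-thread state changed by fn (for example by unshare(2)) cannot
// leak to unrelated goroutines scheduled later.
func runOnDedicatedThread(fn func() error) error {
	errCh := make(chan error, 1)
	go func() {
		runtime.LockOSThread()
		// No UnlockOSThread on purpose; see the comment above.
		errCh <- fn()
	}()
	return <-errCh
}

func main() {
	err := runOnDedicatedThread(func() error {
		// Illustration only: unshare the mount namespace for this thread.
		// This needs CAP_SYS_ADMIN and is expected to fail otherwise.
		return syscall.Unshare(syscall.CLONE_NEWNS)
	})
	if err != nil {
		fmt.Println("unshare failed:", err)
		return
	}
	fmt.Println("unshare succeeded on a dedicated thread")
}
```

Note that the namespace change only applies to the dedicated thread, which is gone once `fn` returns, so any work that must happen inside the new namespace has to be done within `fn` itself.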