opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.89k stars 2.11k forks source link

Improve libcontainer's systemd cgroup driver #2007

Open filbranden opened 5 years ago

filbranden commented 5 years ago

I'm opening this issue to track improvements on systemd cgroup driver I and some others are working on.

The main motivations to improve the systemd cgroup driver are:

The steps I have in mind for this series of refactorings is:

  1. Set systemd unit properties rather than writing to the cgroup tree. PR #1991 is my WIP towards that goal. The idea is to extend the API, so it allows encoding systemd property settings that reflect the configuration, and then migrate all controllers to set systemd properties rather than write directly to the cgroup subtree.
  1. Support working under an unified hierarchy. I have a branch which does most of that (assuming step 1 has been addressed, I'm taking some shortcuts there.) Setting properties goes through systemd, so paths don't matter there. Reading stats goes through the tree, but it's easy to detect which hierarchy is in use and to adjust paths to match the correct one.

  2. Create sub-cgroups of a delegated systemd scope, to pass it to containers. This follows the recommendations from systemd, which are also based on the cgroupv2 implementation (particularly the workings of cgroup.subtree_control), so having libcontainer support this delegation protocol would definitely be advantageous.

One large concern here is testing. I was talking to @mrunalp about this and he offered to help. Since running this setup on a container can be challenging, he suggested using an external test runner, possibly monitoring GitHub PRs and reporting changes back to GitHub. We'll be working on this testing and will report back here when we have more details. (We might open separate issues to track testing on systemd as well.)

I'm planning to join the OCI call on Wednesday March 13th (also to discuss related opencontainers/runtime-spec#1005 and opencontainers/runtime-spec#1002), so I'm happy to talk more about this libcontainer systemd cgroup driver improvement proposal.

filbranden commented 5 years ago

The testing introduced in #2014, which runs on Travis-CI, which happens to use Ubuntu Xenial, which includes systemd v229 (much older than the one I was testing with) made it clear that it's hard to have libcontainer make a heavier use of systemd properties in StartTransientUnit while maintaining compatibility with older versions of systemd, which is an objective to be preserved.

So I'm changing my approach slightly here. Instead of refactoring the current code, I'll write a new implementation that works in unified cgroup hierarchy only and then use that code only when unified cgroup hierarchy is detected, falling back to the current code in case the hybrid or legacy hierarchy is in use.

The issue with backward compatibility of properties is still a relevant issue, so I opened issue systemd/systemd#12460 requesting a better solution for that. True, that will only be available in systemd v243 or v244, but since no mainstream Linux distributions have switched to using unified hierarchy by default yet and by the time they do they'll probably have a systemd version that includes that code, we should be OK (in any case, the libcontainer code will implicitly check that a systemd version that includes that feature will be running, so while "unified cgroup hierarchy" is the trigger, there are other components that need to be in place for this all to work correctly.)

This also helps answer the question of having all the features supported in the cgroup2 mode (such as cpuset and freezer cgroup), since we will only activate the new implementation once all the components are in place (kernel 5.2 for freezer support in cgroup2, systemd v244 or maybe v245 to include the D-Bus RPCs to expose cpuset and freezer, etc.) Again, using "unified cgroup hierarchy" as the trigger, but knowing there are more dependencies that need to be in place for that to work.

So I'll close #1991 for now, since the final approach will be different. I opened #2047 to push the constructor of the systemd cgroup driver into the systemd module itself, that way when there are two separate implementations of it (one for "legacy" and another for "unified"), that point can serve as detection to make the decision.

Thanks! Filipe

AkihiroSuda commented 5 years ago

What's the next step?

AkihiroSuda commented 5 years ago

@filbranden For unified mode, I think we only care really recent version of systemd. Maybe #1991 can be reopened?