hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Make use of systemd delegate cgroup when possible. #18211

Open shoenig opened 1 year ago

shoenig commented 1 year ago

Per https://systemd.io/CGROUP_DELEGATION/ (at the bottom)

🚫 Never create your own cgroups below arbitrary cgroups systemd manages, i.e. cgroups you haven’t set Delegate= in. Specifically: 🔥 don’t create your own cgroups below the root cgroup 🔥.

Currently Nomad does exactly this - it creates the nomad.slice cgroup under the root cgroup regardless of whether systemd is in use. We should modify our Linux packaging to set Delegate= in the systemd unit file so that we are in line with the expected usage of systemd.
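As a rough illustration of the packaging change, the check below parses a unit file and reports whether `Delegate=` is turned on in its `[Service]` section. The unit text and the `delegation_enabled` helper are hypothetical, made up for this sketch; they are not the unit Nomad ships or an existing API.

```python
import configparser

# Hypothetical unit file for illustration; not the unit Nomad actually ships.
EXAMPLE_UNIT = """\
[Unit]
Description=Nomad

[Service]
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
Delegate=yes
"""

# systemd's spellings of a false boolean.
_FALSE_VALUES = {"0", "no", "false", "off"}


def delegation_enabled(unit_text: str) -> bool:
    """Return True if [Service] sets Delegate= to anything but a false boolean."""
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep key case; systemd keys are case-sensitive
    parser.read_string(unit_text)
    value = parser.get("Service", "Delegate", fallback=None)
    if value is None:
        return False  # Delegate= absent: no delegation by default
    # A true boolean or a controller list (e.g. "cpu memory") both enable it.
    return value.strip().lower() not in _FALSE_VALUES


print(delegation_enabled(EXAMPLE_UNIT))
```

This only handles simple unit files; drop-ins and repeated sections would need more care.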

However, we'll need to continue supporting the mode of operation we have today - not all Linux operating systems use systemd (and thus have no delegate mechanism), and not all users use our Linux packaging. We'll also want to update our production documentation to make recommendations for such users.
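A minimal sketch of how the two modes might coexist, assuming cgroups v2 mounted at `/sys/fs/cgroup`: when the agent has been told it runs in a delegated cgroup, workloads go below the agent's own cgroup; otherwise the current `nomad.slice` location under the root is kept. The `delegated` flag and both function names are hypothetical, and the sketch takes the contents of `/proc/self/cgroup` as a string rather than reading the file.

```python
from pathlib import PurePosixPath
from typing import Optional

CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")  # assumed cgroups v2 mount point


def own_cgroup_v2_path(proc_self_cgroup: str) -> Optional[str]:
    """Extract the cgroups v2 path from /proc/self/cgroup contents.

    The v2 entry has hierarchy ID 0 and an empty controller list,
    e.g. "0::/system.slice/nomad.service".
    """
    for line in proc_self_cgroup.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def workload_parent(proc_self_cgroup: str, delegated: bool) -> PurePosixPath:
    """Pick the directory under which workload cgroups would be created."""
    own = own_cgroup_v2_path(proc_self_cgroup)
    if delegated and own not in (None, "/"):
        # Delegated: stay inside the subtree systemd handed to us.
        return CGROUP_ROOT / own.lstrip("/")
    # Fallback: today's behaviour, a nomad.slice directory under the root.
    return CGROUP_ROOT / "nomad.slice"


print(workload_parent("0::/system.slice/nomad.service\n", delegated=True))
print(workload_parent("0::/system.slice/nomad.service\n", delegated=False))
```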

tgross commented 3 months ago

Another challenge with setting delegation is that we have a nomad.slice and slices aren't supported for delegation; we'd need to create a scope below nomad.slice and delegate that.
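The constraint can be written as a check on the unit name suffix: per the systemd delegation document, only service and scope units can be delegated, never slices. The helper and the `nomad-workloads.scope` name below are hypothetical, for illustration only.

```python
# Unit types that systemd allows delegation on.
DELEGATABLE_SUFFIXES = (".service", ".scope")


def can_delegate(unit_name: str) -> bool:
    """Return True if the unit type (by name suffix) supports Delegate=."""
    return unit_name.endswith(DELEGATABLE_SUFFIXES)


for name in ("nomad.slice", "nomad.service", "nomad-workloads.scope"):
    print(name, can_delegate(name))
```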

tgross commented 3 months ago

This was the bit from that document:

Let’s stress one thing: delegation is available on scope and service units only. It’s expressly not available on slice units. Why? Because slice units are our inner nodes of the cgroup trees and we freely attach services and scopes to them. If we’d allow delegation on slice units then this would mean that both systemd and your own manager would create/delete cgroups below the slice unit and that conflicts with the single-writer rule.

In an experiment I'm hacking on, it turns out this doesn't really matter, because we're not creating a "slice unit"; we're just creating our own cgroup directory that happens to be called "slice". If we did want to create a slice unit in the package (which we will if we want to allow less-privileged Nomad agents), we'd instead want to have 2 Delegate= fields pointing to the shared.slice and reserved.slice not-really-slices below that. I've smoke-tested this so far, but it needs further investigation.