System processes/daemons

maxpain commented 2 years ago

Hi. Is it possible to isolate my pods (using exclusive CPU cores) from the system (host) processes/daemons that aren't managed by Kubernetes, so that no host process can interrupt my pod?

Levovar commented 2 years ago

yep! one of the main drivers behind creating Pooler was to be able to do exactly that, for both exclusive and shared types of usage. isolation goes both ways though: you need to tell system processes to leave subset A-M alone, and you need to tell your Pods to leave subset N-Z alone (you also don't want your poll-mode driver cores to interrupt system daemons)

the latter is self-explanatory: you define the CPU pools as described in the Pooler documentation. for the former there are multiple techniques; I personally prefer using cpusets here as well. you can create a distinct cgroup hierarchy for these daemons/processes to your liking; on a systemd-managed system this would probably mean "slice"s (or you can just leave them in their default slice if your distro has one). then you can define a distinct cpuset at the top of your hierarchy, either manually or via some operating system abstraction (libcgroup, CPUAffinity in systemd, and probably other tools on distros I don't have deep experience with). note that the solution heavily depends on your host OS distro and might need some scripting, but it is nevertheless entirely doable
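On a systemd-managed host, one way to put a cpuset at the top of the system hierarchy is the `AllowedCPUs=` setting on `system.slice` (a cgroup v2 cpuset control; `CPUAffinity=`, mentioned above, sets plain scheduler affinity instead). A minimal sketch that only renders such a drop-in; the CPU set `{15, 31}` is a hypothetical choice, and none of this applies to Talos, which has no systemd.

```python
def format_cpu_list(cpus):
    """Render CPU ids as a kernel/systemd style list, e.g. {0, 1, 2, 5} -> "0-2,5"."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu  # extend the current contiguous range
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def slice_dropin(cpus):
    """Text of a systemd drop-in that confines a slice to `cpus`."""
    return "[Slice]\nAllowedCPUs=" + format_cpu_list(cpus) + "\n"


if __name__ == "__main__":
    # Hypothetical: keep system daemons on the two SMT threads of the last core.
    print(slice_dropin({15, 31}))
```

The output could be saved as e.g. `/etc/systemd/system/system.slice.d/50-cpuset.conf` followed by `systemctl daemon-reload`; `systemctl set-property system.slice AllowedCPUs=15,31` does the same at runtime.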

maxpain commented 2 years ago

I use Talos Linux (cool thing, btw). This distro doesn't have systemd; all the system processes run in containerd. How can I pin these system processes to specific CPU cores? Or at least tell the scheduler not to run them on the same cores as my isolated pods?

maxpain commented 2 years ago

Can I just use the “isolcpus” kernel argument?

TimoLindqvist commented 2 years ago

The 'isolcpus' is a valid option too.

maxpain commented 2 years ago

The problem with "isolcpus" is that these CPUs, when not exclusively allocated to Guaranteed pods, are unavailable to other BestEffort and Burstable pods. So it doesn't fit my needs :(
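If you do try isolcpus, the kernel exposes the resulting set in `/sys/devices/system/cpu/isolated` as a CPU list such as `2-5,8`. A minimal sketch to read it back, assuming a Linux host; the less common grouped-range syntax accepted on the kernel command line is not handled.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "2-5,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are isolated
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(sysfs=Path("/sys/devices/system/cpu")):
    """CPUs removed from general scheduling by isolcpus (empty set if none)."""
    path = sysfs / "isolated"
    if not path.exists():
        return set()
    return parse_cpu_list(path.read_text())
```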

maxpain commented 2 years ago

I have this use case: we host game servers in Kubernetes clusters. They are single-threaded. In one pod, we have one container with a game server, and the other is a sidecar container with some helper processes.

We also have a few DaemonSets for maintenance (logging (promtail), monitoring (kube-prometheus-stack), updating game server files, uploading game replays to S3, and so on).

Every container with a game server (actually a single Linux thread) should be allocated one dedicated CPU thread and pinned to it (to avoid context switches and CPU cache misses, so that we get consistent latency and FPS without any jitter).

I want behavior like this. For example, I have one server with a Ryzen 9 5950X (16 cores / 32 threads).

During peak hours, we have 30 game servers, and all of them allocate CPU threads exclusively (one game server per CPU thread), so all the other sidecar containers, DaemonSets, and system processes/daemons (including the kubelet, etc.) should run on the last CPU core and never be scheduled onto the first 15 CPU cores.
During off-peak hours (few players online), we have, for example, 10 game servers that allocate 10 CPU threads (5 CPU cores). The other 11 CPU cores should be available to those system processes/daemons, DaemonSets, etc.
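The peak/off-peak arithmetic above can be checked with a tiny helper. It assumes 16 cores with 2 SMT threads each, and that the exclusively allocated threads are packed onto as few physical cores as possible (both siblings of a core used before moving on); whether an allocator actually packs this way is an assumption of the sketch.

```python
def free_cores(n_servers, cores=16, threads_per_core=2):
    """Whole physical cores left for non-game workloads, assuming the
    exclusive threads are packed onto as few cores as possible."""
    total_threads = cores * threads_per_core
    if not 0 <= n_servers <= total_threads:
        raise ValueError("more game servers than hardware threads")
    free_threads = total_threads - n_servers
    return free_threads // threads_per_core


if __name__ == "__main__":
    for n in (30, 10):
        print(n, "game servers ->", free_cores(n), "free cores")
```

This reproduces the numbers in the comment: 30 servers leave 1 core, 10 servers leave 11.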

Any ideas on how to achieve this behavior?

Levovar commented 2 years ago

without going into much detail, I think your requirements are better served by CPU Manager than by Pooler (1: you want dynamic pools, 2: you don't seem to mind sharing cores between system threads and utility workloads)

you should configure your kubelet to run CPU Manager with the static policy, and have your game server Pods satisfy the criteria for exclusive allocation (Guaranteed QoS, whole integer CPU requests). you should also configure the mandatory minimum set of system CPUs via --reserved-cpus: this ensures those cores will never be allocated to game server Pods
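In kubelet config-file form, the static policy plus the reserved set might look like the fields below: `cpuManagerPolicy` and `reservedSystemCPUs` are the `KubeletConfiguration` counterparts of `--cpu-manager-policy` and `--reserved-cpus`. The list `"15,31"` is a hypothetical pick for the last core of a 16-core SMT machine (check the sibling numbering with `lscpu -e` on the real host), and how these fields reach the kubelet depends on the distro.

```python
import json


def kubelet_cpu_config(reserved_cpus):
    """KubeletConfiguration fields for the static CPU Manager policy.

    reserved_cpus is a kernel-style CPU list string, e.g. "15,31".
    """
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        # Containers of Guaranteed Pods with integer CPU requests get exclusive CPUs.
        "cpuManagerPolicy": "static",
        # Config-file equivalent of --reserved-cpus: never handed out exclusively.
        "reservedSystemCPUs": reserved_cpus,
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this can be pasted into a kubelet config file.
    print(json.dumps(kubelet_cpu_config("15,31"), indent=2))
```

Switching an existing node to the static policy usually also means draining it, removing `/var/lib/kubelet/cpu_manager_state`, and restarting the kubelet.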

in addition to this, you need to confine your system processes yourself as described above; CPU Manager won't help you with that part (neither would Pooler, for that matter). I know Talos, and this is exactly why I don't like it: a sleek and secure design is fine until you want to do something complex like this. so you kinda need to figure that part out yourself. based on your description, I would rather try setting the desired system cpuset in the cpuset cgroup of all the different system processes than using isolcpus. possible problems with this approach: 1: you need to do this manually, and 2: you need to redo it after every system restart, unless Talos provides some kind of "service" API to control the cpuset of the cgroups it creates for the system processes
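Before writing any cpuset, it helps to see which cgroups the host's system processes actually live in; on Talos these are whatever cgroups Talos creates, so this sketch inspects them rather than assuming paths. It reads `/proc/<pid>/cgroup` (the cgroup v2 entry has the form `0::<path>`) and groups PIDs by path, assuming a cgroup v2 (unified) Linux host.

```python
from collections import defaultdict
from pathlib import Path


def cgroup_v2_path(proc_cgroup_text):
    """Return the unified (v2) path from /proc/<pid>/cgroup contents, or None.

    Each line is "hierarchy-ID:controller-list:path"; the v2 entry is "0::<path>".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy == "0" and controllers == "":
            return path
    return None


def pids_by_cgroup(proc=Path("/proc")):
    """Group running PIDs by their cgroup v2 path."""
    groups = defaultdict(list)
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            path = cgroup_v2_path((entry / "cgroup").read_text())
        except OSError:
            continue  # the process exited or is not readable
        if path is not None:
            groups[path].append(int(entry.name))
    return dict(groups)


if __name__ == "__main__":
    for path, pids in sorted(pids_by_cgroup().items()):
        print(f"{path}: {len(pids)} processes")
```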

maxpain commented 2 years ago

> without going into much detail, I think your requirements are better served by CPU Manager than by Pooler (1: you want dynamic pools, 2: you don't seem to mind sharing cores between system threads and utility workloads)
>
> you should configure your kubelet to run CPU Manager with the static policy, and have your game server Pods satisfy the criteria for exclusive allocation (Guaranteed QoS, whole integer CPU requests)

My pods have two containers (the main one for the game server and a sidecar for helper processes), and I don't want them to be on the same CPU thread. And these pods can't have Guaranteed QoS, because the main container has resources like this:

resources:
    requests:
        cpu: 1
        memory: 1Gi
    limits:
        cpu: 1
        memory: 1Gi

And the sidecar container like this:

resources:
    requests:
        cpu: 10m
        memory: 64Mi

so the QoS class of the Pod is Burstable, and CPU Manager just ignores it.

Levovar commented 2 years ago

at some point you need to make concessions and accept how the ecosystem works, though, if you want your requirements satisfied :) you can easily put this Pod into Guaranteed: all you need to do is add cpu: 10m and memory: 64Mi to the limits portion of your second container (or increase both values to a suitable number). once your Pod is in Guaranteed, your first container gets an exclusive CPU, and they won't share anymore
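The QoS rule being discussed can be written down as a small function. This is a simplified sketch of the documented classification (it looks only at `cpu` and `memory` of regular containers, ignores init containers, and uses a deliberately minimal quantity parser). Applied to the two containers above it yields Burstable; adding matching limits to the sidecar yields Guaranteed.

```python
# Multipliers for common memory suffixes (a subset of what Kubernetes accepts).
MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
                   "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def normalize(resource, quantity):
    """Convert a quantity to millicores (cpu) or bytes (memory)."""
    q = str(quantity)
    if resource == "cpu":
        return int(q[:-1]) if q.endswith("m") else round(float(q) * 1000)
    for suffix, factor in MEMORY_SUFFIXES.items():
        if q.endswith(suffix):
            return round(float(q[: -len(suffix)]) * factor)
    return int(q)


def qos_class(containers):
    """Simplified Pod QoS class from each container's requests/limits dicts."""
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        for res in ("cpu", "memory"):
            if res in requests or res in limits:
                any_set = True
            if res not in limits:
                guaranteed = False  # every container needs cpu and memory limits
            elif res in requests and normalize(res, requests[res]) != normalize(res, limits[res]):
                guaranteed = False  # a request, if given, must equal the limit
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```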

maxpain commented 2 years ago

> after every system restart, unless Talos provides some kind of "service" API to control the cpuset of the cgroups it creates for the system processes

I can do this using DaemonSets. It's a common approach, by the way.

maxpain commented 2 years ago

> at some point you need to make concessions and accept how the ecosystem works, though, if you want your requirements satisfied :) you can easily put this Pod into Guaranteed: all you need to do is add cpu: 10m and memory: 64Mi to the limits portion of your second container (or increase both values to a suitable number). once your Pod is in Guaranteed, your first container gets an exclusive CPU, and they won't share anymore

Yes, but the problem is that the CPU request should be a whole number of CPU threads, like 1000m, 2000m, and so on.

In my case it will be 1010m (1000m + 10m), so CPU Manager ignores it.

Levovar commented 2 years ago

nope, that's not a criterion for being in Guaranteed QoS; it is only a criterion for exclusive CPU allocation. your Pod is not in Guaranteed because your second container doesn't satisfy the requirement, that is, to have both limits and requests defined for both CPU and memory, with the same values

maxpain commented 2 years ago

> it is only a criterion for exclusive CPU allocation

Yes! This is why I can't use the built-in CPU Manager and am looking at projects like Intel CPU Manager and Nokia CPU Pooler.

Actually, I want to pin one specific thread of a process in the first container to an exclusive CPU thread.

Levovar commented 2 years ago

you can, once you understand the intricacies :) resource management is done at the container level in K8s, not at the Pod level. you are just hamstringing yourself right now, because you are not satisfying one of the criteria for your container1's exclusive CPU allocation, which is that the Pod needs to be in Guaranteed QoS
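Because allocation happens per container, the exclusive-CPU decision under the static policy can be sketched per container: the Pod must be Guaranteed, and the individual container's CPU request must be a whole number of CPUs. The function name and input shape are illustrative, not kubelet code.

```python
def cpu_millis(quantity):
    """CPU quantity ("1", "2000m", "10m", 1) in millicores."""
    q = str(quantity)
    return int(q[:-1]) if q.endswith("m") else round(float(q) * 1000)


def exclusive_cpus(pod_qos, cpu_requests):
    """Map container name -> number of exclusive CPUs (0 = shared pool).

    cpu_requests: {container name: CPU request} for one Pod.
    """
    result = {}
    for name, request in cpu_requests.items():
        millis = cpu_millis(request)
        if pod_qos == "Guaranteed" and millis > 0 and millis % 1000 == 0:
            result[name] = millis // 1000
        else:
            result[name] = 0  # runs in the shared (default) cpuset
    return result
```

With a Guaranteed Pod, `{"game": "1", "sidecar": "10m"}` gives the game container one exclusive CPU while the sidecar stays in the shared pool, so the Pod-wide 1010m total never matters.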

Levovar commented 2 years ago

(yes it is stupid, but it is what it is)

maxpain commented 2 years ago

Thank you! I will try your recommendations. I think I can make a DaemonSet with a script that runs every 10 seconds and pins all the system processes to specific cores (using cpusets, as I understand it).

Regarding the --reserved-cpus option: if I understand correctly, I can't run my utility pods on these CPUs, right?
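A possible core for such a DaemonSet script, assuming cgroup v2 mounted at `/sys/fs/cgroup` inside a privileged Pod that sees the host's cgroup filesystem, with the cpuset controller already enabled for the target subtree (otherwise no `cpuset.cpus` files exist). The target path `"system"` and the list `"15,31"` are hypothetical; the target must not cover the kubepods hierarchy, or the kubelet's own cpusets would be overwritten. Files already holding the desired value are skipped, so running it every 10 seconds is cheap.

```python
import time
from pathlib import Path


def confine(cgroup_root, target, cpus):
    """Set cpuset.cpus to `cpus` in every cgroup under cgroup_root/target.

    Only files whose current value differs are rewritten, so repeated runs
    are no-ops once everything is in place. Returns the changed cgroups.
    """
    changed = []
    base = Path(cgroup_root) / target.strip("/")
    for cpuset_file in sorted(base.rglob("cpuset.cpus")):
        if cpuset_file.read_text().strip() != cpus:
            cpuset_file.write_text(cpus)
            changed.append(cpuset_file.parent)
    return changed


def run_forever(target="system", cpus="15,31", interval=10.0,
                cgroup_root="/sys/fs/cgroup"):
    """Hypothetical entry point: re-apply the cpuset periodically, e.g. to
    catch cgroups recreated after a service restart or a reboot."""
    while True:
        for path in confine(cgroup_root, target, cpus):
            print("re-pinned", path, flush=True)
        time.sleep(interval)
```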

Levovar commented 2 years ago

unfortunately (or in your case, I guess, fortunately?) you can. the only thing --reserved-cpus ensures is that these system CPUs are never assigned as exclusive CPUs to an exclusive request (in your case, your game server Pods). but the "default" cpuset shared by all the non-exclusive containers is not affected by this parameter, and always includes all the CPU cores of the system (minus the exclusively allocated threads)
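What this means for the shared pool can be expressed as a set computation: the default cpuset is every CPU on the node minus those currently handed out exclusively, while the reserved CPUs are only excluded from exclusive assignment and therefore stay in the shared pool. A sketch of that rule with hypothetical inputs, not kubelet code.

```python
def shared_pool(all_cpus, reserved, exclusive_by_container):
    """Default (shared) cpuset under the static policy.

    exclusive_by_container: {container: set of exclusively assigned CPUs}.
    Reserved CPUs are never assigned exclusively, but they are NOT removed
    from the shared pool.
    """
    taken = set().union(*exclusive_by_container.values())
    if taken & set(reserved):
        raise ValueError("reserved CPUs must never be assigned exclusively")
    return set(all_cpus) - taken
```

For example, with 32 threads, reserved `{15, 31}` and 10 game servers on CPUs 0-9, the shared pool is CPUs 10-31, reserved ones included.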

maxpain commented 2 years ago

Wow, this is what I need, actually :D

Levovar commented 2 years ago

yeah, in my field this is kind of detrimental (consider this case: a simple container with no limit can eat up all the CPU time needed for things like interrupt processing, I/O handling, etc.), but I'm happy it works for somebody at least :)

maxpain commented 2 years ago

The problem is that I need to place only the main thread of a game server on an exclusive CPU. The other sub-threads should be placed on the infra CPU cores.

Levovar commented 2 years ago

the only solution for that is re-architecting your process structure and placing your "sub-threads" into a different container as discussed above

if you can't or don't want to do that, then you have to accept some kind of drawback in any case:

CPU Manager - 1: you ask for two exclusive CPUs per game server and taskset your sub-threads to the second CPU. guaranteed latency, but you waste a core per server
CPU Manager - 2: try playing with priorities and the like. no wasted CPU, but only best-effort latency for the game server, as everything runs on infra cores
CPU-Pooler: combined exclusive+shared pool allocation to the same container, with sub-threads scheduled onto the shared resources. your exclusive cores are not dynamically allocated (wasted CPU during your off-peak hours), and your shared threads share CPU resources with each other but not with system processes
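The first CPU Manager option could look roughly like this from inside the game server process, assuming the container received two exclusive CPUs: keep the main thread on one and pin helper threads to the other. A container with exclusive CPUs only sees those CPUs in its cpuset, which is why helpers cannot be moved to infra cores from inside. On Linux, `sched_setaffinity` with pid 0 applies to the calling thread only (`taskset -p -c` does the same from outside, given a thread id). Python is used purely for illustration; a real game server would do this natively, e.g. with `pthread_setaffinity_np`.

```python
import os
import threading


def split_affinity(allowed):
    """Reserve the lowest CPU for the main thread; give the rest to helpers."""
    cpus = sorted(allowed)
    if len(cpus) < 2:
        raise ValueError("need at least two CPUs in the container's cpuset")
    return {cpus[0]}, set(cpus[1:])


def start_helper(cpus, work):
    """Start a thread that pins only itself to `cpus`, then runs `work`."""
    def target():
        os.sched_setaffinity(0, cpus)  # pid 0 = the calling thread on Linux
        work()

    thread = threading.Thread(target=target)
    thread.start()
    return thread


def pin_main_thread(cpus):
    """Pin the calling (main) thread. Helpers re-pin themselves on start,
    so the order of calls does not matter."""
    os.sched_setaffinity(0, cpus)
```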

kind of a pick your poison situation

nokia / CPU-Pooler, issue #76