nokia / CPU-Pooler

A Device Plugin for Kubernetes, which exposes the CPU cores as consumable Devices to the Kubernetes scheduler.
BSD 3-Clause "New" or "Revised" License

Introducing CPU socket alignment capability by adding support for K8s TopologyManager #34

Closed Levovar closed 4 years ago

Levovar commented 5 years ago

Solves https://github.com/nokia/CPU-Pooler/issues/24.

And here we are, finally coming full circle. The whole reason for "exploiting" the DPAPI from the very beginning now bears fruit: with a mere ~100 lines of code we introduce native CPU socket alignment capability to CPU-Pooler, simply by implementing the TopologyManager-related DPAPI changes introduced with K8s 1.16.

We only report the socket ID for devices belonging to the exclusive pool. The socket information is parsed from the output of the `lscpu -p=cpu,node -J` command.
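
In a nutshell it looks something like the following sketch (not the exact PR code; the import is the v1beta1 DPAPI, and for brevity it parses the plain parsable lscpu output instead of the JSON form):

```go
package topology

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// parseCPUNodes turns `lscpu -p=cpu,node` output into a coreID -> NUMA node map.
func parseCPUNodes(out string) (map[int]int64, error) {
	m := make(map[int]int64)
	for _, line := range strings.Split(out, "\n") {
		if line == "" || strings.HasPrefix(line, "#") {
			continue // lscpu prefixes its parsable output with comment lines
		}
		fields := strings.Split(line, ",")
		if len(fields) < 2 {
			return nil, fmt.Errorf("unexpected lscpu line: %q", line)
		}
		cpu, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, err
		}
		node, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			return nil, err
		}
		m[cpu] = node
	}
	return m, nil
}

// readCPUNodes shells out to lscpu; parsing is kept separate so it can be
// unit tested with canned output.
func readCPUNodes() (map[int]int64, error) {
	out, err := exec.Command("lscpu", "-p=cpu,node").Output()
	if err != nil {
		return nil, err
	}
	return parseCPUNodes(string(out))
}

// exclusiveDevice advertises one exclusive-pool core as a Device whose
// Topology field carries the NUMA node it lives on (DPAPI v1beta1, K8s 1.17+).
func exclusiveDevice(coreID int, numa map[int]int64) *pluginapi.Device {
	return &pluginapi.Device{
		ID:     strconv.Itoa(coreID),
		Health: pluginapi.Healthy,
		Topology: &pluginapi.TopologyInfo{
			Nodes: []*pluginapi.NUMANode{{ID: numa[coreID]}},
		},
	}
}
```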

Levovar commented 5 years ago

I can probably only test it next week, because I'm on holiday from Wednesday to Monday.

Levovar commented 4 years ago

@TimoLindqvist out of WIP because it works! :) On a 2 socket system with the PF on node1:

```
[root@compute-1 cloudadmin]# cat /sys/class/net/ens2f1/device/numa_node
0
```

and with CPUs distributed between nodes:

```
[cloudadmin@compute-1 ~]$ lscpu -p=cpu,node
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0
10,0
11,0
12,0
13,0
14,1
15,1
16,1
17,1
18,1
19,1
20,1
21,1
22,1
23,1
24,1
25,1
26,1
27,1
28,0
29,0
30,0
31,0
32,0
33,0
34,0
35,0
36,0
37,0
38,0
39,0
40,0
41,0
42,1
43,1
44,1
45,1
46,1
47,1
48,1
49,1
50,1
51,1
52,1
53,1
54,1
55,1
```

I instantiated the following Pod a couple of times:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-pod
  labels:
    env: test
  annotations:
    danm.k8s.io/interfaces: |
      [
        {"clusterNetwork":"default", "ip":"dynamic"},
        {"clusterNetwork":"sriov", "ip":"none"}
      ]
spec:
  containers:
```

And the 5 exclusive CPU cores always come from the first NUMA node:

```
[cloudadmin@compute-1 ~]$ docker exec -ti b53fe83a4a5c sh
/ # cat /proc/1/status | grep -i cpus_allowed
Cpus_allowed:       000140,00001c00
Cpus_allowed_list:  10-12,38,40
```

We need Kubernetes 1.17 for this to work (I tested with release candidate 1), but from our side we are NUMA-capable :)
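
For reference, the kubelet side also has to opt in; a sketch of the relevant KubeletConfiguration fragment, assuming the single-numa-node policy is wanted:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Topology Manager is still alpha in 1.17, so the gate must be enabled
  TopologyManager: true
# the policy defaults to "none", so alignment has to be requested explicitly
topologyManagerPolicy: single-numa-node
```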

Levovar commented 4 years ago

README updated. The PR is final from my side.

TimoLindqvist commented 4 years ago

Looks basically ok.

Should we have some unit tests? I mean, testing different topologies with real hardware is a bit problematic, but we could capture the topology info as lscpu output and create unit tests for those?
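
Something like this could work against canned `lscpu -p=cpu,node` output, assuming a parseCPUNodes helper that takes the raw text (as in the sketch above):

```go
package topology

import (
	"reflect"
	"testing"
)

// Table-driven test over canned lscpu output, so no real multi-socket
// hardware is needed to exercise different topologies.
func TestParseCPUNodes(t *testing.T) {
	tests := []struct {
		name  string
		lscpu string
		want  map[int]int64
	}{
		{
			name:  "single socket",
			lscpu: "# The following is the parsable format\n0,0\n1,0\n",
			want:  map[int]int64{0: 0, 1: 0},
		},
		{
			name:  "two sockets, split ranges",
			lscpu: "0,0\n1,0\n2,1\n3,1\n",
			want:  map[int]int64{0: 0, 1: 0, 2: 1, 3: 1},
		},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := parseCPUNodes(tc.lscpu)
			if err != nil {
				t.Fatalf("parseCPUNodes: %v", err)
			}
			if !reflect.DeepEqual(got, tc.want) {
				t.Errorf("got %v, want %v", got, tc.want)
			}
		})
	}
}
```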

Could we support earlier Kubernetes versions (at least 1.16) together with the latest (1.17), which supports Topology Manager? With an older version we of course don't have the NUMA alignment functionality.

Levovar commented 4 years ago

@TimoLindqvist I updated the README, please re-check!

UT: we are planning to at some point, but I don't have capacity for it now, and Tamás is still doing product work.

K8s compatibility: no, unfortunately it cannot work with 1.16, because it needs the enhancement I asked the community for on top of the alpha :) (https://github.com/kubernetes/kubernetes/pull/83492). Based on the release notes I checked, it is only part of 1.17.

TimoLindqvist commented 4 years ago

Ok, this is ready to be merged then. I'm just thinking about support for older k8s versions, in case bug fixes or some other features are needed while 1.17 is not an option yet. A separate branch then?

Levovar commented 4 years ago

I mean, Pooler itself works with all K8s versions; only your exclusive CPU pools won't be topology aligned. I don't see a reason for branching, the implementation IMO is backward compatible.

Levovar commented 4 years ago

To clarify: I had a 4-node setup, and I only updated the worker to K8s 1.17 :) my other 3 nodes were 1.16. The DP works perfectly on all nodes! It's just that prior to 1.17, Topology Manager won't even invoke topology alignment, because Pods asking for CPU-Pooler managed resources will never belong to the Guaranteed QoS class, as they don't ask for default and exclusive CPU resources at the same time.
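
For illustration, such a Pod's containers only set requests/limits on the pooled extended resource (resource name here is illustrative, following the nokia.k8s.io convention); since the QoS class is computed from native cpu/memory alone, the Pod can never be Guaranteed:

```yaml
# Only a CPU-Pooler managed (extended) resource is requested; with no
# native cpu/memory requests==limits the Pod is not Guaranteed, so the
# pre-1.17 Topology Manager skips it entirely.
resources:
  requests:
    nokia.k8s.io/exclusive_caas: "5"
  limits:
    nokia.k8s.io/exclusive_caas: "5"
```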