kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

A dedicated_cores pod does not have an exclusive CPU #608

Open flpanbin opened 4 weeks ago

flpanbin commented 4 weeks ago

What happened?

I created a dedicated_cores pod, but it does not have an exclusive CPU.

dedicated_cores_pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    "katalyst.kubewharf.io/qos_level": dedicated_cores
    "katalyst.kubewharf.io/memory_enhancement": '{
      "numa_binding": "true",
      "numa_exclusive": "true"
    }'
  name: numa-dedicated-normal-pod
  namespace: default
spec:
  containers:
    - name: stress
      image: joedval/stress:latest
      command:
        - stress
        - -c
        - "1"
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
        limits:
          cpu: "1"
          memory: 1Gi
  schedulerName: katalyst-scheduler

Check the cpuset of the pod:

root@ubuntu:~# ./get_cpuset.sh numa-dedicated-normal-pod
Wed 05 Jun 2024 03:03:08 AM UTC
0-47
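
(get_cpuset.sh is a local helper, not part of Katalyst; a minimal sketch of what it might do, assuming cgroup v1 and the kubepods cpuset hierarchy shown later in this thread:)

```
#!/usr/bin/env bash
# Hypothetical reconstruction of get_cpuset.sh: print the cpuset.cpus of
# every container cgroup belonging to the pod named in $1 (cgroup v1 paths).
POD_NAME="$1"
POD_UID=$(kubectl get pod "$POD_NAME" -o jsonpath='{.metadata.uid}')
date
cat /sys/fs/cgroup/cpuset/kubepods/pod"${POD_UID}"/*/cpuset.cpus | sort -u
```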

What did you expect to happen?

The dedicated_cores pod should be allocated an exclusive CPU core.

How can we reproduce it (as minimally and precisely as possible)?

Create a dedicated_cores pod using dedicated_cores_pod.yaml, as shown above.

Software version

```
root@ubuntu:~/katalyst/examples# helm list -A
NAME                 NAMESPACE         REVISION  UPDATED                                  STATUS    CHART                          APP VERSION
katalyst-colocation  katalyst-system   1         2024-05-24 09:28:44.44903291 +0000 UTC   deployed  katalyst-colocation-orm-0.5.0  v0.5.0
malachite            malachite-system  1         2024-05-24 09:16:19.208333849 +0000 UTC  deployed  malachite-0.1.0                0.1.0
```
WangZzzhe commented 4 weeks ago

@flpanbin Which mode is this running in (QRM or ORM)? Also, could you share the node's NUMA information?

"katalyst.kubewharf.io/memory_enhancement": '{
      "numa_binding": "true",
      "numa_exclusive": "true"
    }'

With numa_exclusive = true, the pod will exclusively occupy the entire NUMA node.

flpanbin commented 4 weeks ago

> @flpanbin Which mode is this running in (QRM or ORM)? Also, could you share the node's NUMA information?
>
> "katalyst.kubewharf.io/memory_enhancement": '{
>       "numa_binding": "true",
>       "numa_exclusive": "true"
>     }'
>
> With numa_exclusive = true, the pod will exclusively occupy the entire NUMA node.

It is running in QRM mode. The node's NUMA information:

root@ubuntu:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 32145 MB
node 0 free: 30247 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 32248 MB
node 1 free: 30001 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
flpanbin commented 4 weeks ago

@WangZzzhe I checked the logs on the corresponding node and found an error: err: rpc error: code = Unknown desc = hint is empty. I'm not sure whether this is related.

I0606 01:28:02.100303       1 manager.go:417] [ORM] addContainer, pod: numa-dedicated-normal-pod, container: stress
W0606 01:28:02.100337       1 manager.go:488] [ORM] pod: default/numa-dedicated-normal-pod; container: stress allocate resource: cpu without numa nodes affinity
I0606 01:28:02.101041       1 policy.go:683] "[katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).Allocate] called" podNamespace="default" podName="numa-dedicated-normal-pod" containerName="stress" podType="" podRole="" containerType="MAIN" qosLevel="dedicated_cores" numCPUsInt=1 numCPUsFloat64=1 isDebugPod=false
E0606 01:28:02.101136       1 policy_allocation_handlers.go:304] "[katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).dedicatedCoresWithNUMABindingAllocationHandler] unable to allocate CPUs" err="hint is empty" podNamespace="default" podName="numa-dedicated-normal-pod" containerName="stress" numCPUsInt=1 numCPUsFloat64=1
E0606 01:28:02.101727       1 manager.go:501] [ORM] addContainer allocate fail, pod numa-dedicated-normal-pod, container stress, err: rpc error: code = Unknown desc = hint is empty
E0606 01:28:02.101786       1 manager.go:605] [ORM] re addContainer fail, pod numa-dedicated-normal-pod container stress, err: [ORM] addContainer allocate fail, pod numa-dedicated-normal-pod, container stress, err: rpc error: code = Unknown desc = hint is empty
I0606 01:28:02.102672       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/.3396218561\": CREATE"
I0606 01:28:02.102733       1 plugin_watcher.go:174] "Ignoring file (starts with '.')" path=".3396218561"
I0606 01:28:02.105100       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/kubelet_qrm_checkpoint\": CREATE"
I0606 01:28:02.105244       1 plugin_watcher.go:184] "Ignoring non socket file" path="kubelet_qrm_checkpoint"
I0606 01:28:02.216433       1 provisioner.go:84] [malachite] heartbeat
I0606 01:28:02.217117       1 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: katalyst-agent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.6.202.153:10250/stats/summary?timeout=10s'
I0606 01:28:02.218075       1 round_trippers.go:553] GET https://10.6.202.153:10250/stats/summary?timeout=10s 403 Forbidden in 0 milliseconds
I0606 01:28:02.218100       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 0 ms Duration 0 ms
I0606 01:28:02.218192       1 round_trippers.go:577] Response Headers:
I0606 01:28:02.218217       1 round_trippers.go:580]     Content-Type: text/plain; charset=utf-8
I0606 01:28:02.218231       1 round_trippers.go:580]     Content-Length: 114
I0606 01:28:02.218242       1 round_trippers.go:580]     Date: Thu, 06 Jun 2024 01:28:02 GMT
I0606 01:28:02.218272       1 request.go:1154] Response Body: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)
E0606 01:28:02.218319       1 provisioner.go:65] failed to update stats/summary from kubelet: "failed to get kubelet config for summary api, error: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)"
I0606 01:28:02.234127       1 pod.go:206] get metric mem.usage.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 3
I0606 01:28:02.234210       1 pod.go:206] get metric cpu.load.1min.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 2
I0606 01:28:02.234227       1 pod.go:206] get metric cpu.usage.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 1
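
Side note: the 403 from the kubelet summary API above is an RBAC problem separate from the allocation issue. A quick way to check what the agent's service account may do, using standard kubectl impersonation:

```
# Can the katalyst-agent service account read node stats from the API server?
kubectl auth can-i get nodes --subresource=stats \
  --as=system:serviceaccount:katalyst-system:katalyst-agent
```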
flpanbin commented 4 weeks ago

@WangZzzhe The agent's arguments:

template:
    metadata:
      annotations:
        katalyst.kubewharf.io/qos_level: system_cores
      creationTimestamp: null
      labels:
        app: katalyst-agent
        app.kubernetes.io/instance: katalyst-colocation
        app.kubernetes.io/name: katalyst-agent
    spec:
      containers:
      - args:
        - --plugin-registration-dir=/var/lib/katalyst/plugin-socks
        - --checkpoint-manager-directory=/var/lib/katalyst/plugin-checkpoint
        - --locking-file=/tmp/katalyst_colocation_katalyst_agent_lock
        - --node-name=$(MY_NODE_NAME)
        - --node-address=$(MY_NODE_ADDRESS)
        - --agents=*
        - --cpu-resource-plugin-advisor=true
        - --enable-cpu-pressure-eviction=true
        - --enable-kubelet-secure-port=true
        - --enable-reclaim=true
        - --enable-report-topology-policy=true
        - --eviction-plugins=*
        - --memory-resource-plugin-advisor=true
        - --orm-devices-provider=kubelet
        - --orm-kubelet-pod-resources-endpoints=/var/lib/kubelet/pod-resources/kubelet.sock
        - --orm-resource-names-map=resource.katalyst.kubewharf.io/reclaimed_millicpu=cpu,resource.katalyst.kubewharf.io/reclaimed_memory=memory
        - --pod-resources-server-endpoint=/var/lib/katalyst/pod-resources/kubelet.sock
        - --qrm-socket-dirs=/var/lib/katalyst/plugin-socks
        - --topology-policy-name=none
        - --v=9
        command:
        - katalyst-agent

System version information:

root@ubuntu:~# uname -a
Linux ubuntu 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
WangZzzhe commented 4 weeks ago

@flpanbin Change --topology-policy-name=none to --topology-policy-name=best-effort and try again. Under the none policy, no resource allocation is performed.
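
For reference, one way to apply this change, assuming the agent runs as the DaemonSet named after the helm release shown earlier:

```
# Edit the agent DaemonSet and swap the flag; the agent pods will restart
kubectl -n katalyst-system edit daemonset katalyst-colocation-katalyst-agent
#   change: --topology-policy-name=none
#   to:     --topology-policy-name=best-effort
```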

flpanbin commented 4 weeks ago

> @flpanbin Change --topology-policy-name=none to --topology-policy-name=best-effort and try again. Under the none policy, no resource allocation is performed.

@WangZzzhe I set --topology-policy-name=best-effort, but it still doesn't work:

root@ubuntu:~/katalyst# ps -ef | grep katalyst-agent
root     2423499 2423425 15 Jun05 ?        01:31:28 katalyst-metric --leader-elect-resource-name=katalyst-colocation-katalyst-metric --leader-elect-resource-namespace=katalyst-system --collector-pod-selector=app=katalyst-agent
root     2604106 2604047  3 02:59 ?        00:00:05 katalyst-agent --plugin-registration-dir=/var/lib/katalyst/plugin-socks --checkpoint-manager-directory=/var/lib/katalyst/plugin-checkpoint --locking-file=/tmp/katalyst_colocation_katalyst_agent_lock --node-name=node1 --node-address=10.6.202.152 --agents=* --cpu-resource-plugin-advisor=true --enable-cpu-pressure-eviction=true --enable-kubelet-secure-port=true --enable-reclaim=true --enable-report-topology-policy=true --eviction-plugins=* --memory-resource-plugin-advisor=true --orm-devices-provider=kubelet --orm-kubelet-pod-resources-endpoints=/var/lib/kubelet/pod-resources/kubelet.sock --orm-resource-names-map=resource.katalyst.kubewharf.io/reclaimed_millicpu=cpu,resource.katalyst.kubewharf.io/reclaimed_memory=memory --pod-resources-server-endpoint=/var/lib/katalyst/pod-resources/kubelet.sock --qrm-socket-dirs=/var/lib/katalyst/plugin-socks --topology-policy-name=best-effort --v=9

[image]

flpanbin commented 4 weeks ago

@WangZzzhe Do you have any ideas on how to troubleshoot this? What additional information should I provide to locate the problem? I installed Kubewharf enhanced kubernetes following the documentation, and installed the related components with the helm command from the docs: helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation

root@ubuntu:~/katalyst/examples# kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
10.6.202.151   Ready    control-plane   13d   v1.24.6-kubewharf.8
node1          Ready    <none>          13d   v1.24.6-kubewharf.8
node2          Ready    <none>          13d   v1.24.6-kubewharf.8
root@ubuntu:~/katalyst/examples# containerd -v
containerd github.com/containerd/containerd v1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d
WangZzzhe commented 4 weeks ago

@flpanbin Judging from the katalyst-agent startup arguments, it is running in ORM mode, so the qosResourceManager in the kubelet is not taking effect. [image] From the KCNR information, this pod has been allocated successfully and exclusively occupies the 24 cores of one NUMA node.

flpanbin commented 4 weeks ago

> @flpanbin Judging from the katalyst-agent startup arguments, it is running in ORM mode, so the qosResourceManager in the kubelet is not taking effect. [image] From the KCNR information, this pod has been allocated successfully and exclusively occupies the 24 cores of one NUMA node.

But then why does the container's cpuset still show 0-47? It should have been allocated 24 cores:

root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/cpuset.cpus
0-47
root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/80c10c4e45cbe275bafac9cf8b74ef72e6c0b9565d33389fce86f6e7c3737843/cpuset.cpus
0-47
root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/604fc985574b2ff117630b699dd4e9ac77c95d316b6ece904385694db0090268/cpuset.cpus
0-47
WangZzzhe commented 4 weeks ago

@flpanbin

  1. Check whether the /var/lib/katalyst/qrm_advisor/cpu_plugin_state file contains allocation information for the dedicated pod (a shell sketch of all three checks follows this list).
  2. If it does, check the logs for error logs from the related flow at https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/orm/manager.go#L517
  3. If there is no allocation information, check the logs for allocation or error logs from the related flow at https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy_allocation_handlers.go#L275
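
A minimal shell sketch of these three checks, assuming the DaemonSet pod name seen earlier in this thread:

```
# 1. Does the advisor checkpoint contain the dedicated pod's allocation?
grep -n "dedicated" /var/lib/katalyst/qrm_advisor/cpu_plugin_state

# 2./3. Search the agent logs for the flows referenced above
kubectl -n katalyst-system logs katalyst-colocation-katalyst-agent-h6gj8 \
  | grep -E "syncContainer|dedicatedCoresWithNUMABindingAllocationHandler"
```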
flpanbin commented 4 weeks ago

@WangZzzhe

I don't see any error logs from syncContainer, but the logs do show the allocation result is 0-47:

cpu for pod: default/dedicated-normal-pod2, container: stress is {CpusetCpus false true 48 0-47 map[] map[] nil {} 0}

I0606 06:28:10.191773       1 manager.go:536] [ORM] reconcile...
I0606 06:28:10.193289       1 policy.go:406] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).GetResourcesAllocation] called
I0606 06:28:10.194287       1 manager.go:550] [ORM] skip getResourceAllocation of resource: resource.katalyst.kubewharf.io/net_bandwidth, because plugin needn't reconciling
I0606 06:28:10.194349       1 manager.go:519] [ORM] syncContainer, pod: katalyst-colocation-katalyst-agent-h6gj8, container: katalyst-agent
I0606 06:28:10.194375       1 manager.go:522] got pod katalyst-colocation-katalyst-agent-h6gj8 container katalyst-agent resources nil
I0606 06:28:10.194450       1 manager.go:519] [ORM] syncContainer, pod: katalyst-colocation-katalyst-metric-85c47ff4bf-7lw4g, container: katalyst-metric
I0606 06:28:10.194472       1 manager.go:522] got pod katalyst-colocation-katalyst-metric-85c47ff4bf-7lw4g container katalyst-metric resources nil
I0606 06:28:10.194550       1 manager.go:630] [ORM] allocation information for resources memory - accompanying resource: memory for pod: default/dedicated-normal-pod2, container: stress is {CpusetMems false true 6.7522060288e+10 0-1 map[] map[] nil {} 0}
I0606 06:28:10.194723       1 manager.go:630] [ORM] allocation information for resources cpu - accompanying resource: cpu for pod: default/dedicated-normal-pod2, container: stress is {CpusetCpus false true 48 0-47 map[] map[] nil {} 0}
I0606 06:28:10.194822       1 manager.go:519] [ORM] syncContainer, pod: dedicated-normal-pod2, container: stress
I0606 06:28:10.195592       1 manager.go:519] [ORM] syncContainer, pod: malachite-fvp5p, container: malachite
I0606 06:28:10.195679       1 manager.go:522] got pod malachite-fvp5p container malachite resources nil
I0606 06:28:10.195707       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_millicpu to cpu
I0606 06:28:10.195784       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_memory to memory
I0606 06:28:10.195804       1 manager.go:630] [ORM] allocation information for resources memory - accompanying resource: memory for pod: default/reclaimed-large-pod-node1, container: stress is {CpusetMems false true 0 0-1 map[] map[] nil {} 0}
I0606 06:28:10.195919       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_millicpu to cpu
I0606 06:28:10.195988       1 manager.go:630] [ORM] allocation information for resources cpu - accompanying resource: cpu for pod: default/reclaimed-large-pod-node1, container: stress is {CpusetCpus false true 40 4-23,28-47 map[] map[] nil {} 0}
I0606 06:28:10.196018       1 manager.go:519] [ORM] syncContainer, pod: reclaimed-large-pod-node1, container: stress
I0606 06:28:10.197415       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/.2613201109\": CREATE"
I0606 06:28:10.197480       1 plugin_watcher.go:174] "Ignoring file (starts with '.')" path=".2613201109"
I0606 06:28:10.199087       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/kubelet_qrm_checkpoint\": CREATE"
I0606 06:28:10.199115       1 plugin_watcher.go:184] "Ignoring non socket file" path="kubelet_qrm_checkpoint"
I0606 06:28:10.287012       1 manager.go:312] genericSync
I0606 06:28:10.288051       1 manager.go:387] "GetReportContent" costs="928.548µs" pluginName="headroom-reporter-plugin"
I0606 06:28:10.288224       1 manager.go:387] "GetReportContent" costs="121.926µs" pluginName="system-reporter-plugin"
I0606 06:28:10.288510       1 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: katalyst-agent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.6.202.152:10250/pods?timeout=10s'
flpanbin commented 4 weeks ago

@WangZzzhe

The /var/lib/katalyst/qrm_advisor/cpu_plugin_state file shows the pod was allocated multiple times; the final allocation result is `"allocation_result": "0-47"`:

{
  "policyName": "dynamic",
  "machineState": {
    "0": {
      "default_cpuset": "",
      "allocated_cpuset": "0-23",
      "pod_entries": {
        "754dc697-7658-47cc-8faa-0280c2931925": {
          "stress": {
            "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
            "pod_namespace": "default",
            "pod_name": "dedicated-normal-pod2",
            "container_name": "stress",
            "container_type": "MAIN",
            "owner_pool_name": "dedicated",
            "allocation_result": "0-23",
            "original_allocation_result": "0-23",
            "topology_aware_assignments": {
              "0": "0-23"
            },
            "original_topology_aware_assignments": {
              "0": "0-23"
            },
            "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores",
              "numa_binding": "true",
--
            "original_topology_aware_assignments": {
              "0": "0-19"
            },
            "init_timestamp": "2024-06-06 05:58:40.67405812 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "qosLevel": "reclaimed_cores",
            "request_quantity": 42000
          }
        }
      }
    },
    "1": {
      "default_cpuset": "",
      "allocated_cpuset": "24-47",
      "pod_entries": {
        "754dc697-7658-47cc-8faa-0280c2931925": {
          "stress": {
            "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
            "pod_namespace": "default",
            "pod_name": "dedicated-normal-pod2",
            "container_name": "stress",
            "container_type": "MAIN",
            "owner_pool_name": "dedicated",
            "allocation_result": "24-47",
            "original_allocation_result": "24-47",
            "topology_aware_assignments": {
              "1": "24-47"
            },
            "original_topology_aware_assignments": {
              "1": "24-47"
            },
            "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores",
              "numa_binding": "true",
--
              "1": "24-43"
            },
            "original_topology_aware_assignments": {
              "1": "24-43"
            },
            "init_timestamp": "2024-06-06 05:58:40.67405812 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "qosLevel": "reclaimed_cores",
            "request_quantity": 42000
          }
        }
      }
    }
  },
  "pod_entries": {
    "754dc697-7658-47cc-8faa-0280c2931925": {
      "stress": {
        "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
        "pod_namespace": "default",
        "pod_name": "dedicated-normal-pod2",
        "container_name": "stress",
        "container_type": "MAIN",
        "owner_pool_name": "dedicated",
        "allocation_result": "0-47",
        "original_allocation_result": "0-47",
        "topology_aware_assignments": {
          "0": "0-23",
          "1": "24-47"
        },
        "original_topology_aware_assignments": {
          "0": "0-23",
          "1": "24-47"
        },
        "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
        "labels": {
          "katalyst.kubewharf.io/qos_level": "dedicated_cores"
        },
        "annotations": {
...
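
For completeness, one way to summarize the per-container allocation results in this checkpoint (a sketch assuming the file is plain JSON and jq is available):

```
# Print "<pod>/<container>: <allocation_result>" for every top-level pod entry
jq -r '.pod_entries | to_entries[] | .value | to_entries[]
       | "\(.value.pod_name)/\(.key): \(.value.allocation_result)"' \
  /var/lib/katalyst/qrm_advisor/cpu_plugin_state
```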