kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

A dedicated_cores pod does not have an exclusive CPU #608

Open flpanbin opened 4 weeks ago

flpanbin commented 4 weeks ago

What happened?

I created a dedicated_cores pod, but it does not have an exclusive CPU.

dedicated_cores_pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    "katalyst.kubewharf.io/qos_level": dedicated_cores
    "katalyst.kubewharf.io/memory_enhancement": '{
      "numa_binding": "true",
      "numa_exclusive": "true"
    }'
  name: numa-dedicated-normal-pod
  namespace: default
spec:
  containers:
    - name: stress
      image: joedval/stress:latest
      command:
        - stress
        - -c
        - "1"
      imagePullPolicy: IfNotPresent
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
        limits:
          cpu: "1"
          memory: 1Gi
  schedulerName: katalyst-scheduler

Check the cpuset of the pod:

root@ubuntu:~# ./get_cpuset.sh numa-dedicated-normal-pod
Wed 05 Jun 2024 03:03:08 AM UTC
0-47
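
(get_cpuset.sh is a local helper, not part of Katalyst; a minimal sketch of what it might do, assuming cgroup v1 and the kubepods cpuset hierarchy shown later in this thread:)

```
#!/usr/bin/env bash
# Hypothetical reconstruction of get_cpuset.sh: print the cpuset.cpus of
# every container cgroup belonging to the pod named in $1 (cgroup v1 paths).
POD_NAME="$1"
POD_UID=$(kubectl get pod "$POD_NAME" -o jsonpath='{.metadata.uid}')
date
cat /sys/fs/cgroup/cpuset/kubepods/pod"${POD_UID}"/*/cpuset.cpus | sort -u
```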

What did you expect to happen?

The dedicated_cores pod should be allocated an exclusive CPU core.

How can we reproduce it (as minimally and precisely as possible)?

Create a dedicated_cores pod using dedicated_cores_pod.yaml, as shown above.

Software version

```
root@ubuntu:~/katalyst/examples# helm list -A
NAME                 NAMESPACE         REVISION  UPDATED                                  STATUS    CHART                          APP VERSION
katalyst-colocation  katalyst-system   1         2024-05-24 09:28:44.44903291 +0000 UTC   deployed  katalyst-colocation-orm-0.5.0  v0.5.0
malachite            malachite-system  1         2024-05-24 09:16:19.208333849 +0000 UTC  deployed  malachite-0.1.0                0.1.0
```
WangZzzhe commented 4 weeks ago

@flpanbin Which mode is this running in (QRM or ORM)? Also, could you share the node's NUMA information?

"katalyst.kubewharf.io/memory_enhancement": '{
      "numa_binding": "true",
      "numa_exclusive": "true"
    }'

With numa_exclusive = true, the pod will exclusively occupy the entire NUMA node.

flpanbin commented 4 weeks ago

> @flpanbin Which mode is this running in (QRM or ORM)? Also, could you share the node's NUMA information?
>
> "katalyst.kubewharf.io/memory_enhancement": '{
>       "numa_binding": "true",
>       "numa_exclusive": "true"
>     }'
>
> With numa_exclusive = true, the pod will exclusively occupy the entire NUMA node.

It is running in QRM mode. The node's NUMA information:

root@ubuntu:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 32145 MB
node 0 free: 30247 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 32248 MB
node 1 free: 30001 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
flpanbin commented 4 weeks ago

@WangZzzhe I checked the logs on the corresponding node and found an error: err: rpc error: code = Unknown desc = hint is empty. I'm not sure whether this is related.

I0606 01:28:02.100303       1 manager.go:417] [ORM] addContainer, pod: numa-dedicated-normal-pod, container: stress
W0606 01:28:02.100337       1 manager.go:488] [ORM] pod: default/numa-dedicated-normal-pod; container: stress allocate resource: cpu without numa nodes affinity
I0606 01:28:02.101041       1 policy.go:683] "[katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).Allocate] called" podNamespace="default" podName="numa-dedicated-normal-pod" containerName="stress" podType="" podRole="" containerType="MAIN" qosLevel="dedicated_cores" numCPUsInt=1 numCPUsFloat64=1 isDebugPod=false
E0606 01:28:02.101136       1 policy_allocation_handlers.go:304] "[katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).dedicatedCoresWithNUMABindingAllocationHandler] unable to allocate CPUs" err="hint is empty" podNamespace="default" podName="numa-dedicated-normal-pod" containerName="stress" numCPUsInt=1 numCPUsFloat64=1
E0606 01:28:02.101727       1 manager.go:501] [ORM] addContainer allocate fail, pod numa-dedicated-normal-pod, container stress, err: rpc error: code = Unknown desc = hint is empty
E0606 01:28:02.101786       1 manager.go:605] [ORM] re addContainer fail, pod numa-dedicated-normal-pod container stress, err: [ORM] addContainer allocate fail, pod numa-dedicated-normal-pod, container stress, err: rpc error: code = Unknown desc = hint is empty
I0606 01:28:02.102672       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/.3396218561\": CREATE"
I0606 01:28:02.102733       1 plugin_watcher.go:174] "Ignoring file (starts with '.')" path=".3396218561"
I0606 01:28:02.105100       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/kubelet_qrm_checkpoint\": CREATE"
I0606 01:28:02.105244       1 plugin_watcher.go:184] "Ignoring non socket file" path="kubelet_qrm_checkpoint"
I0606 01:28:02.216433       1 provisioner.go:84] [malachite] heartbeat
I0606 01:28:02.217117       1 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: katalyst-agent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.6.202.153:10250/stats/summary?timeout=10s'
I0606 01:28:02.218075       1 round_trippers.go:553] GET https://10.6.202.153:10250/stats/summary?timeout=10s 403 Forbidden in 0 milliseconds
I0606 01:28:02.218100       1 round_trippers.go:570] HTTP Statistics: GetConnection 0 ms ServerProcessing 0 ms Duration 0 ms
I0606 01:28:02.218192       1 round_trippers.go:577] Response Headers:
I0606 01:28:02.218217       1 round_trippers.go:580]     Content-Type: text/plain; charset=utf-8
I0606 01:28:02.218231       1 round_trippers.go:580]     Content-Length: 114
I0606 01:28:02.218242       1 round_trippers.go:580]     Date: Thu, 06 Jun 2024 01:28:02 GMT
I0606 01:28:02.218272       1 request.go:1154] Response Body: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)
E0606 01:28:02.218319       1 provisioner.go:65] failed to update stats/summary from kubelet: "failed to get kubelet config for summary api, error: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)"
I0606 01:28:02.234127       1 pod.go:206] get metric mem.usage.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 3
I0606 01:28:02.234210       1 pod.go:206] get metric cpu.load.1min.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 2
I0606 01:28:02.234227       1 pod.go:206] get metric cpu.usage.container for pod numa-dedicated-normal-pod, collect time 2024-06-06 01:28:01 +0000 UTC, left len 1
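
Side note: the 403 from the kubelet summary API above is an RBAC problem separate from the allocation issue. A quick way to check what the agent's service account may do, using standard kubectl impersonation:

```
# Can the katalyst-agent service account read node stats from the API server?
kubectl auth can-i get nodes --subresource=stats \
  --as=system:serviceaccount:katalyst-system:katalyst-agent
```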
flpanbin commented 4 weeks ago

@WangZzzhe The agent's arguments:

template:
    metadata:
      annotations:
        katalyst.kubewharf.io/qos_level: system_cores
      creationTimestamp: null
      labels:
        app: katalyst-agent
        app.kubernetes.io/instance: katalyst-colocation
        app.kubernetes.io/name: katalyst-agent
    spec:
      containers:
      - args:
        - --plugin-registration-dir=/var/lib/katalyst/plugin-socks
        - --checkpoint-manager-directory=/var/lib/katalyst/plugin-checkpoint
        - --locking-file=/tmp/katalyst_colocation_katalyst_agent_lock
        - --node-name=$(MY_NODE_NAME)
        - --node-address=$(MY_NODE_ADDRESS)
        - --agents=*
        - --cpu-resource-plugin-advisor=true
        - --enable-cpu-pressure-eviction=true
        - --enable-kubelet-secure-port=true
        - --enable-reclaim=true
        - --enable-report-topology-policy=true
        - --eviction-plugins=*
        - --memory-resource-plugin-advisor=true
        - --orm-devices-provider=kubelet
        - --orm-kubelet-pod-resources-endpoints=/var/lib/kubelet/pod-resources/kubelet.sock
        - --orm-resource-names-map=resource.katalyst.kubewharf.io/reclaimed_millicpu=cpu,resource.katalyst.kubewharf.io/reclaimed_memory=memory
        - --pod-resources-server-endpoint=/var/lib/katalyst/pod-resources/kubelet.sock
        - --qrm-socket-dirs=/var/lib/katalyst/plugin-socks
        - --topology-policy-name=none
        - --v=9
        command:
        - katalyst-agent

System version information:

root@ubuntu:~# uname -a
Linux ubuntu 5.4.0-182-generic #202-Ubuntu SMP Fri Apr 26 12:29:36 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
WangZzzhe commented 4 weeks ago

@flpanbin Change --topology-policy-name=none to --topology-policy-name=best-effort and try again. Under the none policy, no resource allocation is performed.
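
For reference, one way to apply this change, assuming the agent runs as the DaemonSet named after the helm release shown earlier:

```
# Edit the agent DaemonSet and swap the flag; the agent pods will restart
kubectl -n katalyst-system edit daemonset katalyst-colocation-katalyst-agent
#   change: --topology-policy-name=none
#   to:     --topology-policy-name=best-effort
```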

flpanbin commented 4 weeks ago

> @flpanbin Change --topology-policy-name=none to --topology-policy-name=best-effort and try again. Under the none policy, no resource allocation is performed.

@WangZzzhe I set --topology-policy-name=best-effort, but it still doesn't work:

root@ubuntu:~/katalyst# ps -ef | grep katalyst-agent
root     2423499 2423425 15 Jun05 ?        01:31:28 katalyst-metric --leader-elect-resource-name=katalyst-colocation-katalyst-metric --leader-elect-resource-namespace=katalyst-system --collector-pod-selector=app=katalyst-agent
root     2604106 2604047  3 02:59 ?        00:00:05 katalyst-agent --plugin-registration-dir=/var/lib/katalyst/plugin-socks --checkpoint-manager-directory=/var/lib/katalyst/plugin-checkpoint --locking-file=/tmp/katalyst_colocation_katalyst_agent_lock --node-name=node1 --node-address=10.6.202.152 --agents=* --cpu-resource-plugin-advisor=true --enable-cpu-pressure-eviction=true --enable-kubelet-secure-port=true --enable-reclaim=true --enable-report-topology-policy=true --eviction-plugins=* --memory-resource-plugin-advisor=true --orm-devices-provider=kubelet --orm-kubelet-pod-resources-endpoints=/var/lib/kubelet/pod-resources/kubelet.sock --orm-resource-names-map=resource.katalyst.kubewharf.io/reclaimed_millicpu=cpu,resource.katalyst.kubewharf.io/reclaimed_memory=memory --pod-resources-server-endpoint=/var/lib/katalyst/pod-resources/kubelet.sock --qrm-socket-dirs=/var/lib/katalyst/plugin-socks --topology-policy-name=best-effort --v=9

[image]

flpanbin commented 4 weeks ago

@WangZzzhe Do you have any ideas on how to troubleshoot this? What additional information should I provide to locate the problem? I installed Kubewharf enhanced kubernetes following the documentation, and installed the related components with the helm command from the docs: helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation

root@ubuntu:~/katalyst/examples# kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
10.6.202.151   Ready    control-plane   13d   v1.24.6-kubewharf.8
node1          Ready    <none>          13d   v1.24.6-kubewharf.8
node2          Ready    <none>          13d   v1.24.6-kubewharf.8
root@ubuntu:~/katalyst/examples# containerd -v
containerd github.com/containerd/containerd v1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d
WangZzzhe commented 4 weeks ago

@flpanbin Judging from the katalyst-agent startup arguments, it is running in ORM mode, so the qosResourceManager in the kubelet is not taking effect. [image] From the KCNR information, this pod has been allocated successfully and exclusively occupies the 24 cores of one NUMA node.

flpanbin commented 4 weeks ago

> @flpanbin Judging from the katalyst-agent startup arguments, it is running in ORM mode, so the qosResourceManager in the kubelet is not taking effect. [image] From the KCNR information, this pod has been allocated successfully and exclusively occupies the 24 cores of one NUMA node.

But then why does the container's cpuset still show 0-47? It should have been allocated 24 cores:

root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/cpuset.cpus
0-47
root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/80c10c4e45cbe275bafac9cf8b74ef72e6c0b9565d33389fce86f6e7c3737843/cpuset.cpus
0-47
root@ubuntu:~/katalyst# cat /sys/fs/cgroup/cpuset/kubepods/podb077c70f-6103-43f9-ba77-64d67ec736ba/604fc985574b2ff117630b699dd4e9ac77c95d316b6ece904385694db0090268/cpuset.cpus
0-47
WangZzzhe commented 4 weeks ago

@flpanbin

  1. Check whether the /var/lib/katalyst/qrm_advisor/cpu_plugin_state file contains allocation information for the dedicated pod (a shell sketch of all three checks follows this list).
  2. If it does, check the logs for error logs from the related flow at https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/orm/manager.go#L517
  3. If there is no allocation information, check the logs for allocation or error logs from the related flow at https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy_allocation_handlers.go#L275
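
A minimal shell sketch of these three checks, assuming the DaemonSet pod name seen earlier in this thread:

```
# 1. Does the advisor checkpoint contain the dedicated pod's allocation?
grep -n "dedicated" /var/lib/katalyst/qrm_advisor/cpu_plugin_state

# 2./3. Search the agent logs for the flows referenced above
kubectl -n katalyst-system logs katalyst-colocation-katalyst-agent-h6gj8 \
  | grep -E "syncContainer|dedicatedCoresWithNUMABindingAllocationHandler"
```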
flpanbin commented 4 weeks ago

@WangZzzhe

I don't see any error logs from syncContainer, but the logs do show the allocation result is 0-47:

cpu for pod: default/dedicated-normal-pod2, container: stress is {CpusetCpus false true 48 0-47 map[] map[] nil {} 0}

I0606 06:28:10.191773       1 manager.go:536] [ORM] reconcile...
I0606 06:28:10.193289       1 policy.go:406] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).GetResourcesAllocation] called
I0606 06:28:10.194287       1 manager.go:550] [ORM] skip getResourceAllocation of resource: resource.katalyst.kubewharf.io/net_bandwidth, because plugin needn't reconciling
I0606 06:28:10.194349       1 manager.go:519] [ORM] syncContainer, pod: katalyst-colocation-katalyst-agent-h6gj8, container: katalyst-agent
I0606 06:28:10.194375       1 manager.go:522] got pod katalyst-colocation-katalyst-agent-h6gj8 container katalyst-agent resources nil
I0606 06:28:10.194450       1 manager.go:519] [ORM] syncContainer, pod: katalyst-colocation-katalyst-metric-85c47ff4bf-7lw4g, container: katalyst-metric
I0606 06:28:10.194472       1 manager.go:522] got pod katalyst-colocation-katalyst-metric-85c47ff4bf-7lw4g container katalyst-metric resources nil
I0606 06:28:10.194550       1 manager.go:630] [ORM] allocation information for resources memory - accompanying resource: memory for pod: default/dedicated-normal-pod2, container: stress is {CpusetMems false true 6.7522060288e+10 0-1 map[] map[] nil {} 0}
I0606 06:28:10.194723       1 manager.go:630] [ORM] allocation information for resources cpu - accompanying resource: cpu for pod: default/dedicated-normal-pod2, container: stress is {CpusetCpus false true 48 0-47 map[] map[] nil {} 0}
I0606 06:28:10.194822       1 manager.go:519] [ORM] syncContainer, pod: dedicated-normal-pod2, container: stress
I0606 06:28:10.195592       1 manager.go:519] [ORM] syncContainer, pod: malachite-fvp5p, container: malachite
I0606 06:28:10.195679       1 manager.go:522] got pod malachite-fvp5p container malachite resources nil
I0606 06:28:10.195707       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_millicpu to cpu
I0606 06:28:10.195784       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_memory to memory
I0606 06:28:10.195804       1 manager.go:630] [ORM] allocation information for resources memory - accompanying resource: memory for pod: default/reclaimed-large-pod-node1, container: stress is {CpusetMems false true 0 0-1 map[] map[] nil {} 0}
I0606 06:28:10.195919       1 manager.go:654] [ORM] map resource name: resource.katalyst.kubewharf.io/reclaimed_millicpu to cpu
I0606 06:28:10.195988       1 manager.go:630] [ORM] allocation information for resources cpu - accompanying resource: cpu for pod: default/reclaimed-large-pod-node1, container: stress is {CpusetCpus false true 40 4-23,28-47 map[] map[] nil {} 0}
I0606 06:28:10.196018       1 manager.go:519] [ORM] syncContainer, pod: reclaimed-large-pod-node1, container: stress
I0606 06:28:10.197415       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/.2613201109\": CREATE"
I0606 06:28:10.197480       1 plugin_watcher.go:174] "Ignoring file (starts with '.')" path=".2613201109"
I0606 06:28:10.199087       1 plugin_watcher.go:160] "Handling create event" event="\"/var/lib/katalyst/plugin-socks/kubelet_qrm_checkpoint\": CREATE"
I0606 06:28:10.199115       1 plugin_watcher.go:184] "Ignoring non socket file" path="kubelet_qrm_checkpoint"
I0606 06:28:10.287012       1 manager.go:312] genericSync
I0606 06:28:10.288051       1 manager.go:387] "GetReportContent" costs="928.548µs" pluginName="headroom-reporter-plugin"
I0606 06:28:10.288224       1 manager.go:387] "GetReportContent" costs="121.926µs" pluginName="system-reporter-plugin"
I0606 06:28:10.288510       1 round_trippers.go:466] curl -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: katalyst-agent/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer <masked>" 'https://10.6.202.152:10250/pods?timeout=10s'
flpanbin commented 4 weeks ago

@WangZzzhe

The /var/lib/katalyst/qrm_advisor/cpu_plugin_state file shows the pod was allocated multiple times; the final allocation result is `"allocation_result": "0-47"`:

{
  "policyName": "dynamic",
  "machineState": {
    "0": {
      "default_cpuset": "",
      "allocated_cpuset": "0-23",
      "pod_entries": {
        "754dc697-7658-47cc-8faa-0280c2931925": {
          "stress": {
            "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
            "pod_namespace": "default",
            "pod_name": "dedicated-normal-pod2",
            "container_name": "stress",
            "container_type": "MAIN",
            "owner_pool_name": "dedicated",
            "allocation_result": "0-23",
            "original_allocation_result": "0-23",
            "topology_aware_assignments": {
              "0": "0-23"
            },
            "original_topology_aware_assignments": {
              "0": "0-23"
            },
            "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores",
              "numa_binding": "true",
--
            "original_topology_aware_assignments": {
              "0": "0-19"
            },
            "init_timestamp": "2024-06-06 05:58:40.67405812 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "qosLevel": "reclaimed_cores",
            "request_quantity": 42000
          }
        }
      }
    },
    "1": {
      "default_cpuset": "",
      "allocated_cpuset": "24-47",
      "pod_entries": {
        "754dc697-7658-47cc-8faa-0280c2931925": {
          "stress": {
            "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
            "pod_namespace": "default",
            "pod_name": "dedicated-normal-pod2",
            "container_name": "stress",
            "container_type": "MAIN",
            "owner_pool_name": "dedicated",
            "allocation_result": "24-47",
            "original_allocation_result": "24-47",
            "topology_aware_assignments": {
              "1": "24-47"
            },
            "original_topology_aware_assignments": {
              "1": "24-47"
            },
            "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "dedicated_cores",
              "numa_binding": "true",
--
              "1": "24-43"
            },
            "original_topology_aware_assignments": {
              "1": "24-43"
            },
            "init_timestamp": "2024-06-06 05:58:40.67405812 +0000 UTC",
            "labels": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "annotations": {
              "katalyst.kubewharf.io/qos_level": "reclaimed_cores"
            },
            "qosLevel": "reclaimed_cores",
            "request_quantity": 42000
          }
        }
      }
    }
  },
  "pod_entries": {
    "754dc697-7658-47cc-8faa-0280c2931925": {
      "stress": {
        "pod_uid": "754dc697-7658-47cc-8faa-0280c2931925",
        "pod_namespace": "default",
        "pod_name": "dedicated-normal-pod2",
        "container_name": "stress",
        "container_type": "MAIN",
        "owner_pool_name": "dedicated",
        "allocation_result": "0-47",
        "original_allocation_result": "0-47",
        "topology_aware_assignments": {
          "0": "0-23",
          "1": "24-47"
        },
        "original_topology_aware_assignments": {
          "0": "0-23",
          "1": "24-47"
        },
        "init_timestamp": "2024-06-06 05:09:27.902883009 +0000 UTC",
        "labels": {
          "katalyst.kubewharf.io/qos_level": "dedicated_cores"
        },
        "annotations": {
...
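
For completeness, one way to summarize the per-container allocation results in this checkpoint (a sketch assuming the file is plain JSON and jq is available):

```
# Print "<pod>/<container>: <allocation_result>" for every top-level pod entry
jq -r '.pod_entries | to_entries[] | .value | to_entries[]
       | "\(.value.pod_name)/\(.key): \(.value.allocation_result)"' \
  /var/lib/katalyst/qrm_advisor/cpu_plugin_state
```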