kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

The node's dynamic overcommit ratio rises instead of falling after CPU consumption increases #604

Open flpanbin opened 1 month ago

flpanbin commented 1 month ago

What happened?

I tried the dynamic overcommitment feature following the dynamic overcommitment docs, but after creating testpod1 to increase CPU consumption, the CPU overcommit ratio cpu_overcommit_ratio rose instead of falling.

With no pods running, inspect the kcnr of g-master2:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.74
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.15
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135351666
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

After creating testpod1, inspect the kcnr of g-master2 again:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.99
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.41
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135554723
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

[root@g-master1 katalyst]# kubectl get pod -n katalyst-system
NAME                                            READY   STATUS    RESTARTS       AGE
katalyst-controller-747545d674-54d2j            1/1     Running   9 (14h ago)    6d19h
katalyst-webhook-69bdb7d5d6-jnrh5               1/1     Running   0              6d19h
overcommit-katalyst-agent-l2rdx                 1/1     Running   0              6d19h
overcommit-katalyst-agent-sb2bd                 1/1     Running   0              6d19h
overcommit-katalyst-agent-vb5wc                 1/1     Running   0              6d19h
overcommit-katalyst-scheduler-58f64f644-442lb   1/1     Running   16 (14h ago)   6d19h
testpod1                                        1/1     Running   0              12s

katalyst version:

panbin@panbindeMacBook-Pro ~ % helm list -n katalyst-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/panbin/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/panbin/.kube/config
NAME        NAMESPACE       REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
overcommit  katalyst-system 1           2024-05-27 22:01:28.110633 +0800 CST    deployed    katalyst-overcommit-0.5.0   v0.5.0

What did you expect to happen?

After creating testpod1, the node's CPU overcommit ratio katalyst.kubewharf.io/cpu_overcommit_ratio should decrease.

How can we reproduce it (as minimally and precisely as possible)?

Just follow this document: https://gokatalyst.io/docs/user-guide/resource-overcommitment/dynamic-overcommitment/

Software version

```console
$ version
# paste output here
```
pendoragon commented 1 month ago

@WangZzzhe please take a look

WangZzzhe commented 1 month ago

@flpanbin Could you share some details about the node? 1) the node's total resource requests and load before the test pod was created; 2) the test pod's requests and load

flpanbin commented 1 month ago

> @flpanbin Could you share some details about the node? 1) the node's total resource requests and load before the test pod was created; 2) the test pod's requests and load

Node resource info before creating the pod:

apiVersion: v1
kind: Node
metadata:
  annotations:
    katalyst.kubewharf.io/cpu_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/memory_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/original_allocatable_cpu: "16"
    katalyst.kubewharf.io/original_allocatable_memory: 32676068Ki
    katalyst.kubewharf.io/original_capacity_cpu: "16"
    katalyst.kubewharf.io/original_capacity_memory: 32778468Ki
    katalyst.kubewharf.io/overcommit_allocatable_cpu: 27840m
    katalyst.kubewharf.io/overcommit_allocatable_memory: 38479337676800m
    katalyst.kubewharf.io/overcommit_capacity_cpu: 27840m
    katalyst.kubewharf.io/overcommit_capacity_memory: 38599923916800m
    katalyst.kubewharf.io/realtime_cpu_overcommit_ratio: "1.74"
    katalyst.kubewharf.io/realtime_memory_overcommit_ratio: "1.15"
    ...
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    katalyst.kubewharf.io/overcommit_node_pool: overcommit-demo
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: g-master2
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    ......
  name: g-master2
status:
  addresses:
  - address: g-master2
    type: Hostname
  allocatable:
    cpu: 27840m
    ephemeral-storage: "136351265362"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38479337676800m
    pods: "180"
  capacity:
    cpu: 27840m
    ephemeral-storage: 144483Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38599923916800m
    pods: "180"
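One detail worth noting in the annotations above: the applied overcommit_allocatable_* quantities equal the original values scaled by the realtime ratios (1.74 for CPU, 1.15 for memory), not by the configured 2.5, which suggests — an inference from the numbers, not confirmed here — that the effective ratio is capped by the realtime value. The arithmetic can be checked directly:

```python
# Verify the node-annotation arithmetic shown above.

# CPU: 16 cores * realtime ratio 1.74 = 27.84 cores = 27840m
print(round(16 * 1.74 * 1000))  # 27840 -> overcommit_allocatable_cpu

# Memory: 32676068Ki * realtime ratio 1.15, rendered by Kubernetes in
# millibytes (the trailing "m" in the annotation value):
print(round(32676068 * 1.15 * 1024 * 1000))  # 38479337676800 -> overcommit_allocatable_memory
```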

testpod1.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  namespace: katalyst-system
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - g-master2
  containers:
  - name: testcontainer1
    image: polinux/stress:latest
    command: ["stress"]
    args: ["--cpu", "4", "--timeout", "6000"]
    resources:
      limits:
        cpu: 8
        memory: 8Gi
      requests:
        cpu: 4
        memory: 8Gi
  tolerations:
  - effect: NoSchedule
    key: test
    value: test
    operator: Equal
WangZzzhe commented 1 month ago

@flpanbin
For memory this is expected, since the memory request increased while the load stayed the same. For CPU, the ratio can in theory rise right after the pod is created, before the stress workload ramps up, but once it stabilizes it should end up lower than before. You can raise the log level to 6 and check whether the collected metrics are accurate. https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154 https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158

flpanbin commented 1 month ago

> @flpanbin For memory this is expected, since the memory request increased while the load stayed the same. For CPU, the ratio can in theory rise right after the pod is created, before the stress workload ramps up, but once it stabilizes it should end up lower than before. You can raise the log level to 6 and check whether the collected metrics are accurate. https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154 https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158

Thanks for the quick reply. I'll keep watching the logs, but I have a few questions about your answer:

  1. Why is this expected for memory?
  2. Why can the ratio rise before the stress workload ramps up?
  3. What is the dynamic overcommitment algorithm?
WangZzzhe commented 4 weeks ago

@flpanbin With the load unchanged, increasing the resource requests reduces the node's allocatable resources, so the node has to overcommit more resources to reach the target load. For the exact rule, see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286

flpanbin commented 4 weeks ago

> @flpanbin With the load unchanged, increasing the resource requests reduces the node's allocatable resources, so the node has to overcommit more resources to reach the target load. For the exact rule, see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286

Thanks, I'll dig into it.

flpanbin commented 3 weeks ago

@WangZzzhe I tracked it down. It looks like a metrics-collection problem: the usage fed into the overcommit-ratio calculation is 0.

I0609 01:45:03.275865       1 realtime.go:335] resource cpu request: 11964, allocatable: 16000, usage: 0, targetLoad: 0.6, existLoad: 0.4, overcommitRatio: 2.24775
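For reference, the numbers in this log line are consistent with a simple two-term rule (a reconstruction inferred from this single sample, not taken from the katalyst source; the actual realtime.go code may clamp or default these terms):

```python
def cpu_overcommit_ratio(request, allocatable, usage, target_load, exist_load):
    # Hypothetical reconstruction from the realtime.go:335 log line above;
    # the real implementation may differ.
    return target_load / exist_load + (request - usage) / allocatable

# Values from the log: request=11964, allocatable=16000, usage=0,
# targetLoad=0.6, existLoad=0.4 -> overcommitRatio: 2.24775
print(round(cpu_overcommit_ratio(11964, 16000, 0, 0.6, 0.4), 5))  # 2.24775
```

If usage is stuck at 0 (with existLoad apparently floored at a default), the request term is the only one that moves, so adding a pod can only push the ratio up — matching the behavior reported in this issue.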

overcommit-katalyst-agent logs:

I0609 03:01:06.734172       1 provisioner.go:84] [malachite] heartbeat
E0609 03:01:06.738246       1 provisioner.go:111] [malachite] malachite is unhealthy: invalid http response status code 500, url: http://localhost:9002/api/v1/system/compute
I0609 03:01:06.738555       1 round_trippers.go:553] GET https://10.6.202.113:10250/stats/summary?timeout=10s 403 Forbidden in 3 milliseconds
E0609 03:01:06.739508       1 provisioner.go:65] failed to update stats/summary from kubelet: "failed to get kubelet config for summary api, error: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)"
I0609 03:01:08.043645       1 realtime.go:155] [overcommitment-aware-realtime] sumUpPodsResources, cpu: 1845m, memory: 3715141632
E0609 03:01:08.043814       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container prometheus, metric cpu.usage.container, err: [MetricStore] empty map
E0609 03:01:08.044067       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container config-reloader, metric cpu.usage.container, err: [MetricStore] empty map
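Separately from the malachite problem, the 403 in the agent log is an RBAC gap: the katalyst-agent service account is denied get on nodes/stats, so the kubelet summary API fallback also fails. A minimal grant would look like the following (a sketch derived only from the identities in the error message; the resource names here are hypothetical, and the chart may already ship an equivalent role):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: katalyst-agent-node-stats   # hypothetical name
rules:
- apiGroups: [""]
  resources: ["nodes/stats"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: katalyst-agent-node-stats   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: katalyst-agent-node-stats
subjects:
- kind: ServiceAccount
  name: katalyst-agent              # from the error message
  namespace: katalyst-system
```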

The malachite logs show errors, so it does not appear to be working properly:

panbin@panbindeMacBook-Pro ~ % kubectl logs  malachite-xk8n9 -n malachite-system -f
2024-06-09T02:03:07.481004862+00:00 - [ERROR] server/src/main.rs:187 [Panic] lib/src/cpu/processor.rs:464: called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }
2024-06-09T02:03:07.489192152+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271581881+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271754576+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:16.338537826+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:16.338612068+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
... (the same pair of errors repeats every ~5 seconds through 02:04:06)
flpanbin commented 3 weeks ago

This may be related to the Linux version. Environment info:

[root@g-master1 ~]# uname -a
Linux g-master1 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@g-master1 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

k8s and containerd versions:

[root@g-master1 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:48:26Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:42:11Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[root@g-master1 ~]# containerd -v
containerd github.com/containerd/containerd v1.7.6 091922f03c2762540fd057fba91260237ff86acb
flpanbin commented 3 weeks ago

I set up another environment using kubewharf enhanced kubernetes, and the dynamic overcommitment feature works correctly there. So it seems there are requirements on the Linux kernel version and the containerd environment? Environment info:

root@ubuntu:~/katalyst# uname -a
Linux ubuntu 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~/katalyst# kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
10.6.202.170   Ready    control-plane   26m   v1.24.6-kubewharf.8

root@ubuntu:~/katalyst# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:56:31Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:51:02Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

root@ubuntu:~/katalyst# containerd -v
containerd github.com/containerd/containerd v1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d
pendoragon commented 3 weeks ago

@flpanbin malachite depends on eBPF, so the 3.10 kernel probably won't work; 4.19+ should be fine.
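A quick way to check a node against that requirement is to compare the kernel release string numerically (a small standalone helper, not part of katalyst):

```python
def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Check a kernel release string such as '5.4.0-125-generic'
    against a minimum major.minor version."""
    parts = release.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

# Release strings from the two environments in this issue:
print(kernel_at_least("3.10.0-1160.el7.x86_64", 4, 19))  # False (CentOS 7)
print(kernel_at_least("5.4.0-125-generic", 4, 19))       # True (Ubuntu)
```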