kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

The node's dynamic overcommit ratio rises instead of falling after CPU consumption increases #604

Open flpanbin opened 1 month ago

flpanbin commented 1 month ago

What happened?

I tried the dynamic overcommitment feature following the dynamic overcommitment docs, but after creating testpod1 to increase CPU consumption, the CPU overcommit ratio cpu_overcommit_ratio rose instead of falling.

With no pods running, inspect the kcnr of g-master2:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.74
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.15
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135351666
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

After creating testpod1, inspect the kcnr of g-master2 again:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.99
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.41
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135554723
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

[root@g-master1 katalyst]# kubectl get pod -n katalyst-system
NAME                                            READY   STATUS    RESTARTS       AGE
katalyst-controller-747545d674-54d2j            1/1     Running   9 (14h ago)    6d19h
katalyst-webhook-69bdb7d5d6-jnrh5               1/1     Running   0              6d19h
overcommit-katalyst-agent-l2rdx                 1/1     Running   0              6d19h
overcommit-katalyst-agent-sb2bd                 1/1     Running   0              6d19h
overcommit-katalyst-agent-vb5wc                 1/1     Running   0              6d19h
overcommit-katalyst-scheduler-58f64f644-442lb   1/1     Running   16 (14h ago)   6d19h
testpod1                                        1/1     Running   0              12s

katalyst version:

panbin@panbindeMacBook-Pro ~ % helm list -n katalyst-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/panbin/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/panbin/.kube/config
NAME        NAMESPACE       REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
overcommit  katalyst-system 1           2024-05-27 22:01:28.110633 +0800 CST    deployed    katalyst-overcommit-0.5.0   v0.5.0

What did you expect to happen?

After creating testpod1, the node's CPU overcommit ratio katalyst.kubewharf.io/cpu_overcommit_ratio should decrease.

How can we reproduce it (as minimally and precisely as possible)?

Just follow this document: https://gokatalyst.io/docs/user-guide/resource-overcommitment/dynamic-overcommitment/

Software version

```console
$ version
# paste output here
```
pendoragon commented 1 month ago

@WangZzzhe please take a look

WangZzzhe commented 1 month ago

@flpanbin Could you share some details about the node? 1) the node's total resource requests and load before the test pod was created; 2) the test pod's requests and load

flpanbin commented 1 month ago

> @flpanbin Could you share some details about the node? 1) the node's total resource requests and load before the test pod was created; 2) the test pod's requests and load

Node resource info before creating the pod:

apiVersion: v1
kind: Node
metadata:
  annotations:
    katalyst.kubewharf.io/cpu_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/memory_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/original_allocatable_cpu: "16"
    katalyst.kubewharf.io/original_allocatable_memory: 32676068Ki
    katalyst.kubewharf.io/original_capacity_cpu: "16"
    katalyst.kubewharf.io/original_capacity_memory: 32778468Ki
    katalyst.kubewharf.io/overcommit_allocatable_cpu: 27840m
    katalyst.kubewharf.io/overcommit_allocatable_memory: 38479337676800m
    katalyst.kubewharf.io/overcommit_capacity_cpu: 27840m
    katalyst.kubewharf.io/overcommit_capacity_memory: 38599923916800m
    katalyst.kubewharf.io/realtime_cpu_overcommit_ratio: "1.74"
    katalyst.kubewharf.io/realtime_memory_overcommit_ratio: "1.15"
    ...
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    katalyst.kubewharf.io/overcommit_node_pool: overcommit-demo
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: g-master2
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    ......
  name: g-master2
status:
  addresses:
  - address: g-master2
    type: Hostname
  allocatable:
    cpu: 27840m
    ephemeral-storage: "136351265362"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38479337676800m
    pods: "180"
  capacity:
    cpu: 27840m
    ephemeral-storage: 144483Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38599923916800m
    pods: "180"
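One detail worth noting in the annotations above: the applied overcommit_allocatable_* quantities equal the original values scaled by the realtime ratios (1.74 for CPU, 1.15 for memory), not by the configured 2.5, which suggests — an inference from the numbers, not confirmed here — that the effective ratio is capped by the realtime value. The arithmetic can be checked directly:

```python
# Verify the node-annotation arithmetic shown above.

# CPU: 16 cores * realtime ratio 1.74 = 27.84 cores = 27840m
print(round(16 * 1.74 * 1000))  # 27840 -> overcommit_allocatable_cpu

# Memory: 32676068Ki * realtime ratio 1.15, rendered by Kubernetes in
# millibytes (the trailing "m" in the annotation value):
print(round(32676068 * 1.15 * 1024 * 1000))  # 38479337676800 -> overcommit_allocatable_memory
```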

testpod1.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  namespace: katalyst-system
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - g-master2
  containers:
  - name: testcontainer1
    image: polinux/stress:latest
    command: ["stress"]
    args: ["--cpu", "4", "--timeout", "6000"]
    resources:
      limits:
        cpu: 8
        memory: 8Gi
      requests:
        cpu: 4
        memory: 8Gi
  tolerations:
  - effect: NoSchedule
    key: test
    value: test
    operator: Equal
WangZzzhe commented 1 month ago

@flpanbin
For memory this is expected, since the memory request increased while the load stayed the same. For CPU, the ratio can in theory rise right after the pod is created, before the stress workload ramps up, but once it stabilizes it should end up lower than before. You can raise the log level to 6 and check whether the collected metrics are accurate. https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154 https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158

flpanbin commented 1 month ago

> @flpanbin For memory this is expected, since the memory request increased while the load stayed the same. For CPU, the ratio can in theory rise right after the pod is created, before the stress workload ramps up, but once it stabilizes it should end up lower than before. You can raise the log level to 6 and check whether the collected metrics are accurate. https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154 https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158

Thanks for the quick reply. I'll keep watching the logs, but I have a few questions about your answer:

  1. Why is this expected for memory?
  2. Why can the ratio rise before the stress workload ramps up?
  3. What is the dynamic overcommitment algorithm?
WangZzzhe commented 4 weeks ago

@flpanbin With the load unchanged, increasing the resource requests reduces the node's allocatable resources, so the node has to overcommit more resources to reach the target load. For the exact rule, see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286

flpanbin commented 4 weeks ago

> @flpanbin With the load unchanged, increasing the resource requests reduces the node's allocatable resources, so the node has to overcommit more resources to reach the target load. For the exact rule, see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286

Thanks, I'll dig into it.

flpanbin commented 3 weeks ago

@WangZzzhe I tracked it down. It looks like a metrics-collection problem: the usage fed into the overcommit-ratio calculation is 0.

I0609 01:45:03.275865       1 realtime.go:335] resource cpu request: 11964, allocatable: 16000, usage: 0, targetLoad: 0.6, existLoad: 0.4, overcommitRatio: 2.24775
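For reference, the numbers in this log line are consistent with a simple two-term rule (a reconstruction inferred from this single sample, not taken from the katalyst source; the actual realtime.go code may clamp or default these terms):

```python
def cpu_overcommit_ratio(request, allocatable, usage, target_load, exist_load):
    # Hypothetical reconstruction from the realtime.go:335 log line above;
    # the real implementation may differ.
    return target_load / exist_load + (request - usage) / allocatable

# Values from the log: request=11964, allocatable=16000, usage=0,
# targetLoad=0.6, existLoad=0.4 -> overcommitRatio: 2.24775
print(round(cpu_overcommit_ratio(11964, 16000, 0, 0.6, 0.4), 5))  # 2.24775
```

If usage is stuck at 0 (with existLoad apparently floored at a default), the request term is the only one that moves, so adding a pod can only push the ratio up — matching the behavior reported in this issue.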

overcommit-katalyst-agent logs:

I0609 03:01:06.734172       1 provisioner.go:84] [malachite] heartbeat
E0609 03:01:06.738246       1 provisioner.go:111] [malachite] malachite is unhealthy: invalid http response status code 500, url: http://localhost:9002/api/v1/system/compute
I0609 03:01:06.738555       1 round_trippers.go:553] GET https://10.6.202.113:10250/stats/summary?timeout=10s 403 Forbidden in 3 milliseconds
E0609 03:01:06.739508       1 provisioner.go:65] failed to update stats/summary from kubelet: "failed to get kubelet config for summary api, error: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)"
I0609 03:01:08.043645       1 realtime.go:155] [overcommitment-aware-realtime] sumUpPodsResources, cpu: 1845m, memory: 3715141632
E0609 03:01:08.043814       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container prometheus, metric cpu.usage.container, err: [MetricStore] empty map
E0609 03:01:08.044067       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container config-reloader, metric cpu.usage.container, err: [MetricStore] empty map
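Separately from the malachite problem, the 403 in the agent log is an RBAC gap: the katalyst-agent service account is denied get on nodes/stats, so the kubelet summary API fallback also fails. A minimal grant would look like the following (a sketch derived only from the identities in the error message; the resource names here are hypothetical, and the chart may already ship an equivalent role):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: katalyst-agent-node-stats   # hypothetical name
rules:
- apiGroups: [""]
  resources: ["nodes/stats"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: katalyst-agent-node-stats   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: katalyst-agent-node-stats
subjects:
- kind: ServiceAccount
  name: katalyst-agent              # from the error message
  namespace: katalyst-system
```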

The malachite logs show errors, so it does not appear to be working properly:

panbin@panbindeMacBook-Pro ~ % kubectl logs  malachite-xk8n9 -n malachite-system -f
2024-06-09T02:03:07.481004862+00:00 - [ERROR] server/src/main.rs:187 [Panic] lib/src/cpu/processor.rs:464: called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }
2024-06-09T02:03:07.489192152+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271581881+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271754576+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:16.338537826+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:16.338612068+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
... (the same pair of errors repeats every ~5 seconds through 02:04:06)
flpanbin commented 3 weeks ago

This may be related to the Linux version. Environment info:

[root@g-master1 ~]# uname -a
Linux g-master1 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@g-master1 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

k8s and containerd versions:

[root@g-master1 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:48:26Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:42:11Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[root@g-master1 ~]# containerd -v
containerd github.com/containerd/containerd v1.7.6 091922f03c2762540fd057fba91260237ff86acb
flpanbin commented 3 weeks ago

I set up another environment using kubewharf enhanced kubernetes, and the dynamic overcommitment feature works correctly there. So it seems there are requirements on the Linux kernel version and the containerd environment? Environment info:

root@ubuntu:~/katalyst# uname -a
Linux ubuntu 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~/katalyst# kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
10.6.202.170   Ready    control-plane   26m   v1.24.6-kubewharf.8

root@ubuntu:~/katalyst# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:56:31Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:51:02Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

root@ubuntu:~/katalyst# containerd -v
containerd github.com/containerd/containerd v1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d
pendoragon commented 3 weeks ago

@flpanbin malachite depends on eBPF, so the 3.10 kernel probably won't work; 4.19+ should be fine.
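A quick way to check a node against that requirement is to compare the kernel release string numerically (a small standalone helper, not part of katalyst):

```python
def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Check a kernel release string such as '5.4.0-125-generic'
    against a minimum major.minor version."""
    parts = release.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

# Release strings from the two environments in this issue:
print(kernel_at_least("3.10.0-1160.el7.x86_64", 4, 19))  # False (CentOS 7)
print(kernel_at_least("5.4.0-125-generic", 4, 19))       # True (Ubuntu)
```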