kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize the overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.
Apache License 2.0

CPU allocatable value in CustomNodeResource status is not updated #594

Closed flpanbin closed 1 month ago

flpanbin commented 1 month ago

What happened?

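I installed and deployed katalyst by following the documentation at https://gokatalyst.io/docs/getting-started/colocation-quick-start/ and then created the shared-normal-pod application. The value of resource.katalyst.kubewharf.io/reclaimed_millicpu in the kcnr was the same before and after the application was created.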

root@ubuntu:~/katalyst/examples# kubectl get nodes -owide
NAME           STATUS   ROLES           AGE   VERSION               INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
10.6.202.151   Ready    control-plane   19h   v1.24.6-kubewharf.8   10.6.202.151   <none>        Ubuntu 20.04.5 LTS   5.4.0-125-generic   containerd://1.4.12
node1          Ready    <none>          19h   v1.24.6-kubewharf.8   10.6.202.152   <none>        Ubuntu 20.04.5 LTS   5.4.0-125-generic   containerd://1.4.12
node2          Ready    <none>          19h   v1.24.6-kubewharf.8   10.6.202.153   <none>        Ubuntu 20.04.5 LTS   5.4.0-125-generic   containerd://1.4.12

root@ubuntu:~/katalyst/examples# helm list -A
NAME                NAMESPACE           REVISION    UPDATED                                 STATUS      CHART                           APP VERSION
katalyst-colocation katalyst-system     1           2024-05-24 09:28:44.44903291 +0000 UTC  deployed    katalyst-colocation-orm-0.5.0   v0.5.0
malachite           malachite-system    1           2024-05-24 09:16:19.208333849 +0000 UTC deployed    malachite-0.1.0                 0.1.0

Resource usage on node2 shows that 2 CPU cores are indeed in use.


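shared-normal-pod was scheduled to node2, which has 4 cores and 8G of memory. Neither the cpu nor the memory value under status.resources.allocatable in that node's kcnr changed. The information is the same on all nodes.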

root@ubuntu:~/katalyst/examples# kubectl get kcnr node2 -oyaml
apiVersion: node.katalyst.kubewharf.io/v1alpha1
kind: CustomNodeResource
metadata:
  annotations:
    katalyst.kubewharf.io/cpu_overcommit_ratio: "1.00"
    katalyst.kubewharf.io/guaranteed_cpus: "0"
    katalyst.kubewharf.io/memory_overcommit_ratio: "1.00"
    katalyst.kubewharf.io/overcommit_cpu_manager: none
    katalyst.kubewharf.io/overcommit_memory_manager: None
  creationTimestamp: "2024-05-24T02:01:18Z"
  generation: 2
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: node2
    kubernetes.io/os: linux
  name: node2
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Node
    name: node2
    uid: 6a7c0a4b-451a-4c96-a580-c0e792772077
  resourceVersion: "131467"
  uid: 69d3adc7-7709-4e36-aa30-554ad7d6e1be
spec:
  nodeResourceProperties:
  - propertyName: numa
    propertyQuantity: "1"
  - propertyName: nbw
    propertyQuantity: 10k
  - propertyName: cpu
    propertyQuantity: "4"
  - propertyName: memory
    propertyQuantity: 8148204Ki
  - propertyName: cis
    propertyValues:
    - avx2
  - propertyName: topology
    propertyValues:
    - '{"Iface":"ens160","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.153"],"IPV6":null},"NSName":"","NSAbsolutePath":""}'
status:
  resources:
    allocatable:
      resource.katalyst.kubewharf.io/reclaimed_memory: 5Gi
      resource.katalyst.kubewharf.io/reclaimed_millicpu: 4k
    capacity:
      resource.katalyst.kubewharf.io/reclaimed_memory: 5Gi
      resource.katalyst.kubewharf.io/reclaimed_millicpu: 4k
  topologyPolicy: None
  topologyZone:
  - children:
    - attributes:
      - name: katalyst.kubewharf.io/netns_name
        value: ""
      - name: katalyst.kubewharf.io/resource_identifier
        value: ens160
      name: ens160
      resources:
        allocatable:
          resource.katalyst.kubewharf.io/net_bandwidth: 9k
        capacity:
          resource.katalyst.kubewharf.io/net_bandwidth: 9k
      type: NIC
    - name: "0"
      resources:
        allocatable:
          cpu: "4"
          memory: "8343760896"
        capacity:
          cpu: "4"
          memory: "8343760896"
      type: Numa
    name: "0"
    resources: {}
    type: Socket
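
For readers comparing the numbers in the output above: the values use Kubernetes quantity suffixes, so `4k` in reclaimed_millicpu is 4000 millicpu (4 cores) and `5Gi` is 5 × 1024³ bytes. A minimal Python sketch of that conversion follows. The helper name `parse_quantity` is hypothetical, and it only handles the integer values and the handful of suffixes that appear in this output, not the full Kubernetes quantity grammar.

```python
# Hypothetical helper for reading the kcnr values shown above.
# Covers only integer quantities with the suffixes seen in this output.
_SUFFIXES = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "k": 1000,
    "M": 1000 ** 2,
    "G": 1000 ** 3,
}


def parse_quantity(text):
    """Convert a quantity string such as '4k' or '5Gi' to a plain integer."""
    # Try two-character suffixes first so 'Gi' is not mistaken for a shorter one.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(text)


reclaimed_millicpu = parse_quantity("4k")     # millicpu
reclaimed_memory = parse_quantity("5Gi")      # bytes
node_memory = parse_quantity("8148204Ki")     # bytes, from spec.nodeResourceProperties

print(reclaimed_millicpu, reclaimed_memory, node_memory)
```

The `8148204Ki` memory property converts to the same byte count that appears as `memory: "8343760896"` under the Numa zone.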

What did you expect to happen?

The kcnr status values for node2 are updated.

How can we reproduce it (as minimally and precisely as possible)?

Follow the documentation: https://gokatalyst.io/docs/getting-started/colocation-quick-start/

Software version

```console
$ version
# paste output here
```

pendoragon commented 1 month ago

@flpanbin Do you mean that after putting a load on the shared cores pods, the available reclaimed resource on the kcnr did not decrease as expected?

flpanbin commented 1 month ago

@pendoragon Yes. However, after I upgraded the node spec (from 4 cores/8G to 8 cores/16G), the values did update.

WangZzzhe commented 1 month ago

> @pendoragon Yes. However, after I upgraded the node spec (from 4 cores/8G to 8 cores/16G), the values did update.

This is probably because, when the node spec is this small, the reported resources are always affected by the default MinReclaimedResourceForReport (4C5Gi).
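
To illustrate why a reporting floor would hide the change on a 4-core node but not on an 8-core one, here is a rough Python sketch. It is not katalyst's actual code: it assumes the floor is applied as a simple `max()` over a naive headroom of capacity minus shared-pod usage, and the function name and the 2000 millicpu load figure are illustrative. Only the 4-core floor and the 4-core and 8-core node sizes come from this thread.

```python
# Hypothetical model of a minimum-report floor; katalyst's real headroom
# calculation is more involved than capacity minus usage.
MIN_REPORT_MILLICPU = 4000  # the "4C" in the default 4C5Gi mentioned above


def reported_reclaimed_millicpu(capacity, shared_usage, floor=MIN_REPORT_MILLICPU):
    """Return the value that would be reported if a floor is applied with max()."""
    headroom = max(capacity - shared_usage, 0)
    return max(headroom, floor)


# 4-core node: idle and loaded both report the floor, so nothing appears to change.
small_idle = reported_reclaimed_millicpu(4000, 0)
small_loaded = reported_reclaimed_millicpu(4000, 2000)

# 8-core node: the computed headroom stays above the floor, so the change is visible.
large_idle = reported_reclaimed_millicpu(8000, 0)
large_loaded = reported_reclaimed_millicpu(8000, 2000)

print(small_idle, small_loaded, large_idle, large_loaded)
```

Under these assumptions the small node reports the same value with and without load, which matches the unchanged `4k` seen in the kcnr, while the larger node's reported value drops once the pod is running.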

flpanbin commented 1 month ago

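> > @pendoragon Yes. However, after I upgraded the node spec (from 4 cores/8G to 8 cores/16G), the values did update.
>
> This is probably because, when the node spec is this small, the reported resources are always affected by the default MinReclaimedResourceForReport (4C5Gi).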

Got it, thanks!