kubernetes-sigs / node-feature-discovery

Node feature discovery for Kubernetes
Apache License 2.0

Allocatable CPU not updating #1312

Closed: cl-rf closed this issue 1 year ago

cl-rf commented 1 year ago

What happened: Deploying a pod with a cpu request/limit does not change the allocatable value in the NodeResourceTopology object.

What you expected to happen: I expect the value to decrease based on the request/limit of the pod.

How to reproduce it (as minimally and precisely as possible): run `kubectl get noderesourcetopology.topology.node.k8s.io -o yaml`

```yaml
apiVersion: v1
items:
- apiVersion: topology.node.k8s.io/v1alpha2
  attributes:
  - name: topologyManagerPolicy
    value: single-numa-node
  - name: topologyManagerScope
    value: container
  kind: NodeResourceTopology
  metadata:
    creationTimestamp: "2023-08-26T19:04:07Z"
    generation: 3242
    name: ip-1.1.1.1-west-1.compute.internal
    resourceVersion: "67906"
    uid: 3f1f8a64-edbf-459a-a51b-12404b1f40d4
  topologyPolicies:
  - SingleNUMANodeContainerLevel
  zones:
  - costs:
    - name: node-0
      value: 10
    - name: node-1
      value: 21
    name: node-0
    resources:
    - allocatable: "32"
      available: "32"
      capacity: "36"
      name: cpu
    - allocatable: "98327445504"
      available: "98327445504"
      capacity: "100474929152"
      name: memory
    type: Node
  - costs:
    - name: node-0
      value: 21
    - name: node-1
      value: 10
    name: node-1
    resources:
    - allocatable: "36"
      available: "36"
      capacity: "36"
      name: cpu
    - allocatable: "99261173760"
      available: "99261173760"
      capacity: "101408657408"
      name: memory
    type: Node
kind: List
metadata:
  resourceVersion: ""
```

Deploy a test.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment1
spec:
  selector:
      matchLabels:
        name: test
  template:
    metadata:
      labels:
        name: test
    spec:
      containers:
      - name: test-deployment-1-container-1
        image: quay.io/fromani/numalign
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh", "-c"]
        args: [ "while true; do numalign; sleep 100000; done;" ]
        resources:
          limits:
            cpu: 20
          requests:
            cpu: 20
```

The available and allocatable values do not decrease.

```yaml
apiVersion: v1
items:
- apiVersion: topology.node.k8s.io/v1alpha2
  attributes:
  - name: topologyManagerPolicy
    value: single-numa-node
  - name: topologyManagerScope
    value: container
  kind: NodeResourceTopology
  metadata:
    creationTimestamp: "2023-08-26T19:04:07Z"
    generation: 3386
    name: ip-1.1.1.1-west-1.compute.internal
    resourceVersion: "70926"
    uid: 3f1f8a64-edbf-459a-a51b-12404b1f40d4
  topologyPolicies:
  - SingleNUMANodeContainerLevel
  zones:
  - costs:
    - name: node-0
      value: 21
    - name: node-1
      value: 10
    name: node-1
    resources:
    - allocatable: "99261173760"
      available: "99261173760"
      capacity: "101408657408"
      name: memory
    - allocatable: "36"
      available: "36"
      capacity: "36"
      name: cpu
    type: Node
  - costs:
    - name: node-0
      value: 10
    - name: node-1
      value: 21
    name: node-0
    resources:
    - allocatable: "98327445504"
      available: "98327445504"
      capacity: "100474929152"
      name: memory
    - allocatable: "32"
      available: "32"
      capacity: "36"
      name: cpu
    type: Node
kind: List
metadata:
  resourceVersion: ""
```
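
For a quicker side-by-side comparison, one way to pull out just the per-zone CPU figures is kubectl's JSONPath filter support (a sketch; the output formatting is illustrative):

```sh
# Print each zone's name together with its available CPU count,
# assuming the v1alpha2 NodeResourceTopology layout shown above.
kubectl get noderesourcetopology.topology.node.k8s.io \
  -o jsonpath='{range .items[*].zones[*]}{.name}: {.resources[?(@.name=="cpu")].available}{"\n"}{end}'
```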

Anything else we need to know?:

```sh
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.13.3
kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/topologyupdater?ref=v0.13.3
```
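
To rule out the updater itself, it may help to confirm the nfd-topology-updater pods are running on the node in question (a minimal sketch, assuming the default node-feature-discovery namespace used by these overlays):

```sh
# List the NFD pods and the nodes they run on
kubectl -n node-feature-discovery get pods -o wide
```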

Environment:

cl-rf commented 1 year ago

It looks like this happens when Deployments are used instead of just a single Pod definition.

marquiz commented 1 year ago

ping @PiotrProkop @ffromani

@cl-rf did pod(s) from your deployment get scheduled on the node you're looking at?

ffromani commented 1 year ago

Hi, the meaning of the fields (we should probably annotate the CRD spec more) is:

  1. Capacity: what the machine reports in total, e.g. from hardware
  2. Allocatable: capacity - reserved. If there are no resources reserved for the system, it will equal capacity
  3. Available: Allocatable - consumed, the amount of resources free for allocation

Thus "allocatable" should never change unless the machine/cluster config changes first, which is an event we expect to happen rarely. "available" should instead change depending on the running workload. And it seems to me that in the example above it is changing?
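
As a concrete illustration with the node-0 numbers from the report above (the reserved and consumed figures are inferred for the example, not taken from the actual node config):

```yaml
# zone node-0, resource cpu
capacity: "36"     # total CPUs the machine reports
allocatable: "32"  # capacity - reserved (4 CPUs reserved for the system here)
available: "32"    # allocatable - consumed; would drop to "12" once a pod
                   # with 20 exclusively-assigned CPUs lands on this zone
```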

HTH,

cl-rf commented 1 year ago

It is only one node. If I set requests/limits for both cpu and memory, it reports fine, versus setting only a cpu requirement.

Working

```yaml
        resources:
          limits:
            cpu: 20
            memory: "256M"
          requests:
            cpu: 20
            memory: "256M"
```

vs

```yaml
        resources:
          limits:
            cpu: 20
          requests:
            cpu: 20
```

noderesourcetopology.topology.node.k8s.io then reports correctly, with 20 fewer CPUs available:

```yaml
  - costs:
    - name: node-0
      value: 10
    - name: node-1
      value: 21
    name: node-0
    resources:
    - allocatable: "36"
      available: "16"
      capacity: "36"
      name: cpu
```

ffromani commented 1 year ago

Yes, the accounting is done only for exclusively-assigned resources (CPUs in this case), which is possible only for guaranteed QoS pods. This may be surprising, but it is due to how Kubernetes works; we can't do it differently. Is there anything else not working as expected? Because it seems to me this part could perhaps be documented better, but it works as intended.
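
For reference, a minimal sketch of a pod spec that gets exclusive CPUs under the static CPU manager policy: requests must equal limits for every resource (including memory), making the pod Guaranteed QoS, and the cpu value must be an integer. Setting only cpu, as in the earlier deployment, leaves the pod Burstable, so no exclusive assignment (and no accounting change) happens. The pod name here is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example  # hypothetical name
spec:
  containers:
  - name: app
    image: quay.io/fromani/numalign
    resources:
      # requests == limits for ALL resources -> Guaranteed QoS;
      # integer cpu -> eligible for exclusive CPU assignment
      limits:
        cpu: 20
        memory: "256M"
      requests:
        cpu: 20
        memory: "256M"
```

The resulting QoS class can be checked with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`.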

cl-rf commented 1 year ago

Yeah, I guess a note in the documentation would help.

ffromani commented 1 year ago

Fair enough, I'll review the docs and perhaps post a PR to clarify further in the coming days.