kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Karpenter behaviour with Disruption Budget #1350

Closed thangle-grabtaxi closed 2 months ago

thangle-grabtaxi commented 3 months ago

Description

Observed Behavior: When we make changes to a Karpenter node pool, specifically an AMI change, with a disruption budget of 100%, we expect Karpenter to rotate all nodes at once due to Drift. However, it only rotates 1 to 4 nodes at a time.

The disruption budget is set to 100%, which works out to around 35-40 nodes.


During the allowed disruption period of 10 minutes, Karpenter rotates the instances in sequence, 1 to 4 nodes per batch.


This causes frequent restarts for our applications. We set a 100% disruption budget within a short time frame (10 mins) specifically so that all nodes would be rotated together, meaning only one restart for our applications.

Expected Behavior: Karpenter rotates all nodes at once.

Reproduction Steps (Please include YAML):

    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: <NODE_POOL_NAME>
    spec:
      template:
        metadata:
          labels:
            node_group_name: <NODE_POOL_NAME>
        spec:
          nodeClassRef:
            name: <NODE_CLASS_NAME>
          requirements:
            - key: "karpenter.k8s.aws/instance-hypervisor"
              operator: In
              values: ["nitro"]
            - key: "karpenter.sh/capacity-type"
              operator: In
              values: ["on-demand"]
            - key: kubernetes.io/os
              operator: NotIn
              values: ["windows"]
            - key: kubernetes.io/arch
              operator: In
              values: ["arm64", "amd64"]
            - key: "karpenter.k8s.aws/instance-generation"
              operator: Gt
              values: ["3"]
      disruption:
        consolidationPolicy: WhenUnderutilized
        expireAfter: Never
        budgets:
          # Disruption Window: 10 minutes from 3pm to 3.10pm SGT (7am to 7.10am UTC)
          # Disruption Impact: 100%
          # Non Disruption Window: From 3.10pm to 3pm SGT (7.10am to 7am UTC)
          - nodes: "0"
            schedule: "10 7 * * *"
            duration: "23h50m"
          - nodes: "100%"
      limits:
        cpu: "10000"
        memory: 10000Gi

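For clarity on how the two budgets above interact: the `nodes: "0"` budget blocks voluntary disruption for 23h50m starting at 07:10 UTC, leaving a daily 10-minute window (07:00 to 07:10 UTC) in which only the `nodes: "100%"` budget applies. A minimal sketch of that logic (a hypothetical helper for this specific NodePool, not Karpenter code):

```python
from datetime import datetime, time, timedelta

# Hypothetical helper: returns the effective disruption budget ("0" or "100%")
# for the NodePool above, assuming schedule "10 7 * * *" with a 23h50m duration.
def active_budget(now_utc: datetime) -> str:
    # Start of the most recent blocking window (today's or yesterday's 07:10 UTC)
    start = now_utc.replace(hour=7, minute=10, second=0, microsecond=0)
    if now_utc < start:
        start -= timedelta(days=1)
    block_len = timedelta(hours=23, minutes=50)
    if start <= now_utc < start + block_len:
        return "0"        # scheduled budget active: no voluntary disruption
    return "100%"         # 07:00-07:10 UTC: only the unscheduled budget applies
```

Times inside the 23h50m span return `"0"`, and the remaining 10 minutes each day return `"100%"`, matching the comments in the YAML above.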
Versions:

k8s-ci-robot commented 3 months ago

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
njtran commented 3 months ago

A disruption budget prescribes the maximum amount of disruption allowed, but there are other safeguards in place while drifting nodes. We ensure that replacement nodes are online and healthy before disrupting drifted nodes, which naturally limits the total number of nodes that can be disrupted at once. I'd recommend using the observed rate of disruption as a way to understand how long you'd actually want your budgets to be at 100% to get a full roll of your nodes.
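The effect described here can be illustrated with a toy model (illustrative only, not Karpenter's implementation): the budget caps how many nodes *may* be disrupted concurrently, but since each drained node first needs a Ready replacement, throughput is bounded by node startup time and batch size rather than by the budget percentage:

```python
# Toy model of a drift rollout. All parameters are assumptions for
# illustration; batch_size stands in for Karpenter's other safeguards.
def nodes_rolled(total_nodes: int, budget_pct: int,
                 window_min: int, replace_min: int,
                 batch_size: int) -> int:
    budget = max(1, total_nodes * budget_pct // 100)  # max concurrent disruptions
    concurrency = min(budget, batch_size)             # safeguards cap the batch
    cycles = window_min // replace_min                # replace cycles per window
    return min(total_nodes, concurrency * cycles)

# e.g. 38 nodes, 100% budget, a 10-minute window, ~3 min for a replacement to
# become Ready, batches of 4 (as observed) -> only ~12 nodes roll per window.
```

Under these assumed numbers, a full roll of 38 nodes would need roughly a 30-minute window at the observed batch rate, which is the kind of back-of-the-envelope sizing suggested above.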

github-actions[bot] commented 2 months ago

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.