kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0
566 stars 188 forks source link

Does not consider daemonset resources for node pool if the daemonset matches the nodepool using a node Affininty #1337

Open myaser opened 3 months ago

myaser commented 3 months ago

Description

Observed Behavior: I craeted a daemonset that has the following nodeAffinity

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia

observing karpenter logs and experimenting with scheduling pods of different sizes, I could find that karpenter's calculations for daemonset resources excludes this pod

I could confirm this by checking the code.

here it is reading only the first of the affinities relying on an outer loop to remove the first affinity and continue with the next one

but, this is not happening for daemonset calculation as shown here

to validate my findings, I flipped the affinities order and the calculations were corrected

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: karpenter.k8s.aws/instance-gpu-manufacturer
                operator: In
                values:
                - nvidia
            - matchExpressions:
              - key: zalando.org/nvidia-gpu
                operator: Exists

Expected Behavior: all affinities should be considered for calculating daemonset resources

Reproduction Steps (Please include YAML):

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: datalab-workloads
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    metadata:
    spec:
      kubelet:
        clusterDNS:
        - 10.0.1.100
        cpuCFSQuota: false
        kubeReserved:
          cpu: 100m
          memory: 282Mi
        maxPods: 32
        systemReserved:
          cpu: 100m
          memory: 164Mi
      nodeClassRef:
        name: datalab-workloads
      requirements:
      - key: node.kubernetes.io/instance-type
        operator: In
        values:
        - g4dn.xlarge
        - g4dn.4xlarge
        - g4dn.12xlarge
        - g4dn.16xlarge
        - g4dn.metal
        - g5.xlarge
        - g5.4xlarge
        - g5.16xlarge
        - g5.24xlarge
        - g5.48xlarge
        - g3s.xlarge
        - g4dn.2xlarge
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - arm64
        - amd64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a
        - eu-central-1b
        - eu-central-1c
      startupTaints:
      - effect: NoSchedule
        key: zalando.org/node-not-ready
      taints:
      - effect: NoSchedule
        key: dedicated
        value: datalab-workloads

Versions:

jigisha620 commented 3 months ago

Hi @myaser, I was able to track this down. It definitely is a bug since we are not considering all affinities for calculating daemonset resources. Do you have a workaround for your use-case that can get you unblocked on this?

myaser commented 3 months ago

Hi @myaser, I was able to track this down. It definitely is a bug since we are not considering all affinities for calculating daemonset resources. Do you have a workaround for your use-case that can get you unblocked on this?

for my specific case, I just flipped the order of the affinities. since the other affinity is meant for cluster autoscaler.

since we are in a middle of migration we run both CA (legacy) and Karpenter together.

jonathan-innis commented 3 months ago

/triage accepted