kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.

Empty t3.xlarge spot instance not getting deleted #1690

Open indra0007 opened 1 month ago

indra0007 commented 1 month ago

Description

Observed Behavior: Empty t3.xlarge spot instance not getting deleted

Expected Behavior: Empty Nodes should be deleted

Reproduction Steps (Please include YAML):

  1. Install the NodePool below (see the note on its disruption budget after the manifest):
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: noncriticalinfra
spec:
  disruption:
    budgets:
      - duration: 164h
        nodes: '0'
        reasons:
          - Underutilized
          - Drifted
        schedule: 0 4 * * 0
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        service-layer/role: noncriticalinfra
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
            - spot
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
            - t
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
            - '2'
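
A note on the disruption budget in this NodePool: the cron `schedule: 0 4 * * 0` fires Sundays at 04:00, and `duration: 164h` keeps the budget active for roughly 6 days and 20 hours, so `nodes: '0'` blocks Underutilized and Drifted disruptions for nearly the entire week. A quick sanity check of the configuration as the API server stored it (assuming kubectl is pointed at the affected cluster):

    # Print the NodePool's disruption block exactly as stored
    kubectl get nodepool noncriticalinfra -o jsonpath='{.spec.disruption}{"\n"}'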
  2. Install the deployment below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1 
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        service-layer/role: noncriticalinfra
      containers:
      - name: nginx
        image: nginx:latest  
        ports:
        - containerPort: 80 
        resources:
          requests:
            cpu: "64m"
            memory: 64Mi
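
After applying this deployment, the pod should sit Pending until Karpenter provisions a node for it; a quick way to watch it get scheduled (assuming the default namespace):

    # Watch the nginx pod until it is assigned a node
    kubectl get pods -l app=nginx -o wide -w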
  3. Install the second deployment below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx2
  labels:
    app: nginx2
spec:
  replicas: 3  
  selector:
    matchLabels:
      app: nginx2
  template:
    metadata:
      labels:
        app: nginx2
    spec:
      nodeSelector:
        service-layer/role: noncriticalinfra
      containers:
      - name: nginx
        image: nginx:latest  
        ports:
        - containerPort: 80  
        resources:
          requests:
            cpu: "512m"
            memory: 256Mi

Once you have performed all three steps above, Karpenter will create two nodes (a t3.medium and a t3.xlarge).
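
To confirm what Karpenter provisioned, something like the following lists the NodeClaims and the nodes carrying the pool's template label:

    # Show NodeClaims with their resolved instance types
    kubectl get nodeclaims -o wide
    # Show the nodes created for this NodePool
    kubectl get nodes -l service-layer/role=noncriticalinfra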

  4. Delete both deployments:
    kubectl delete deploy nginx;
    kubectl delete deploy nginx2;

    Once both deployments are deleted, both nodes should ideally be removed one by one, since both are now empty. Surprisingly, the t3.medium instance is deleted immediately, as expected, but the t3.xlarge is not. It is very surprising that the two behave differently, given that they belong to the same NodePool and should behave exactly the same way.
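
When one empty node is reclaimed and the other lingers, a reasonable first step is to check the controller's disruption logging; a sketch, assuming Karpenter runs as the `karpenter` deployment in the `kube-system` namespace (adjust for your install, and replace the placeholder name):

    # Grep the controller logs for disruption/consolidation activity
    kubectl logs -n kube-system deploy/karpenter | grep -iE 'disrupt|consolidat'
    # Inspect events and conditions on the lingering NodeClaim
    kubectl describe nodeclaim <nodeclaim-name>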

No logs are emitted from Karpenter whatsoever, as if Karpenter knows nothing about that t3.xlarge node. There are no relevant events on the NodeClaim either. Below is the set of events from that NodeClaim:

  Type    Reason             Age   From       Message
  ----    ------             ----  ----       -------
  Normal  Launched           31m   karpenter  Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
  Normal  DisruptionBlocked  31m   karpenter  Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim
  Normal  Registered         30m   karpenter  Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered
  Normal  Initialized        30m   karpenter  Status condition transitioned, Type: Initialized, Status: Unknown -> True, Reason: Initialized
  Normal  Ready              30m   karpenter  Status condition transitioned, Type: Ready, Status: Unknown -> True, Reason: Ready
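
The events above show only the launch-time condition transitions. To see whether Karpenter ever marked the NodeClaim as a consolidation candidate (e.g. a `Consolidatable` condition, assuming v1 semantics), its current status conditions can be dumped directly (the name is a placeholder):

    # Dump the NodeClaim's status conditions as raw JSON
    kubectl get nodeclaim <nodeclaim-name> -o jsonpath='{.status.conditions}{"\n"}'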

Please let me know what's going on here.

Versions: Image version: 1.0.0@sha256:dd095cdcf857c3812f2084a7b20294932f461b0bff912acf58d592faa032fbef

k8s-ci-robot commented 1 month ago

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
indra0007 commented 1 month ago

It might be related to https://github.com/aws/karpenter-provider-aws/issues/6593. Also, FYI, the node in question is running nothing but a few DaemonSet pods, so I consider it to be empty.
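
Since Karpenter's emptiness check is expected to ignore DaemonSet pods, one way to verify that claim is to list every pod on the stuck node together with its owning controller (node name is a placeholder):

    # List all pods on the node with their owner kind, to confirm only
    # DaemonSet-owned pods remain
    kubectl get pods -A --field-selector spec.nodeName=<node-name> \
      -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'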