kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Node is "NotReady" and waiting at "Terminating" for hours #1573

Open · ibalat opened this issue 2 months ago

ibalat commented 2 months ago

Description

Observed Behavior:

{"level":"INFO","time":"2024-08-14T12:13:23.794Z","logger":"controller","message":"pod xxxx has a preferred Anti-Affinity which can prevent consolidation","commit":"490ef94","controller":"provisioner"}

[  423.353932] [  21815]  1001 21815  1314351    45882   770048        0          1000 java
[  423.361183] [  22145] 65532 22145   475493    12653   364544        0          1000 controller
[  423.368709] [  22199]  1001 22199   914462    84514   987136        0          1000 java
[  423.376073] [  33276]     0 33276   295992      601   188416        0          -998 runc
[  423.383344] [  33288]     0 33288     3094       12    45056        0          -998 exe
[  423.390531] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1344160e_dca0_4e9d_be15_ea0b63efb5b2.slice/cri-containerd-496edffa072b6d7835989a0dfbce3c30711a32903c757baf4fcd460c9479f3a8.scope,task=java,pid=22199,uid=1001
[  423.412634] Out of memory: Killed process 22199 (java) total-vm:3657848kB, anon-rss:338056kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:964kB oom_score_adj:1000
[  425.563371] oom_reaper: reaped process 22199 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        2024-08-14T13:38:15+00:00

(screenshots attached)

Expected Behavior:

Reproduction Steps (Please include YAML): I don't have any idea; it occurs periodically.

Versions:

k8s-ci-robot commented 2 months ago

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
ibalat commented 2 months ago

omg, 6 hours later the pods are still stuck in "Terminating" and the node is still "NotReady".

(screenshot attached)

btw, the instance is an m5.large. I also captured new instance console (stdout) logs:

[ 8080.945657] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/config/mysql/1 supports timestamps until 2038 (0x7fffffff)
[ 8080.956982] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/template-sql/mysql/2 supports timestamps until 2038 (0x7fffffff)
[ 8080.970168] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/template-sql/mysql/3 supports timestamps until 2038 (0x7fffffff)
[ 8080.981712] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/etl-sql/mysql/4 supports timestamps until 2038 (0x7fffffff)
[ 8080.993163] xfs filesystem being remounted at /var/lib/kubelet/pods/92d74e36-0bbb-40bb-9d92-d9daa4994369/volume-subpaths/prefera-sql/mysql/5 supports timestamps until 2038 (0x7fffffff)
[ 8112.949302] pci 0000:00:1d.0: [1d0f:8061] type 00 class 0x010802
[ 8112.952794] pci 0000:00:1d.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8112.956559] pci 0000:00:1d.0: enabling Extended Tags
[ 8112.960301] pci 0000:00:1d.0: BAR 0: assigned [mem 0xc0114000-0xc0117fff]
[ 8112.964132] nvme nvme3: pci function 0000:00:1d.0
[ 8112.967238] nvme 0000:00:1d.0: enabling device (0000 -> 0002)
[ 8112.972352] PCI Interrupt Link [LNKA] enabled at IRQ 11
[ 8112.980317] nvme nvme3: 2/0/0 default/read/poll queues
[ 8113.229053] pci 0000:00:1c.0: [1d0f:8061] type 00 class 0x010802
[ 8113.232693] pci 0000:00:1c.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8113.236424] pci 0000:00:1c.0: enabling Extended Tags
[ 8113.240326] pci 0000:00:1c.0: BAR 0: assigned [mem 0xc0118000-0xc011bfff]
[ 8113.244141] nvme nvme4: pci function 0000:00:1c.0
[ 8113.247190] nvme 0000:00:1c.0: enabling device (0000 -> 0002)
[ 8113.256918] nvme nvme4: 2/0/0 default/read/poll queues
[ 8113.573770] EXT4-fs (nvme3n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8114.159309] IPv6: ADDRCONF(NETDEV_CHANGE): enia89b8c83c9a: link becomes ready
[ 8114.163261] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8114.319775] xfs filesystem being remounted at /var/lib/kubelet/pods/177f12fc-a42d-464f-bdf0-ad1f53080f8b/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8114.734723] EXT4-fs (nvme4n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8114.972359] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8115.074504] xfs filesystem being remounted at /var/lib/kubelet/pods/ad839521-93a0-4010-8b3f-0980d2375063/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8119.176023] pci 0000:00:1b.0: [1d0f:8061] type 00 class 0x010802
[ 8119.179548] pci 0000:00:1b.0: reg 0x10: [mem 0x00000000-0x00003fff]
[ 8119.183285] pci 0000:00:1b.0: enabling Extended Tags
[ 8119.187010] pci 0000:00:1b.0: BAR 0: assigned [mem 0xc011c000-0xc011ffff]
[ 8119.190838] nvme nvme5: pci function 0000:00:1b.0
[ 8119.193879] nvme 0000:00:1b.0: enabling device (0000 -> 0002)
[ 8119.203356] nvme nvme5: 2/0/0 default/read/poll queues
[ 8120.146390] EXT4-fs (nvme5n1): mounted filesystem with ordered data mode. Opts: (null)
[ 8120.658980] IPv6: ADDRCONF(NETDEV_CHANGE): eni61bfec53e4d: link becomes ready
[ 8120.662926] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 8121.038972] xfs filesystem being remounted at /var/lib/kubelet/pods/9d3cd637-f153-4200-a302-04b9e60a273c/volume-subpaths/scripts/kafka/2 supports timestamps until 2038 (0x7fffffff)
[ 8299.855030] systemd-journald[537510]: File /var/log/journal/ec23eae178c2480d1224169d16678fc2/system.journal corrupted or uncleanly shut down, renaming and replacing.
sftim commented 2 months ago

If you're willing to try Karpenter 1.0 (newly released), you might see better behavior or diagnostics. I'd give it a go, honestly.

ibalat commented 2 months ago

@sftim thanks for the suggestion, I'll try it. But why doesn't Karpenter or Kubernetes intervene in this situation? 18 hours have passed and the nodes are still NotReady and the pods are still Terminating. Is there any parameter to force-terminate NotReady nodes? The ttlAfterNotRegistered parameter is deprecated, and my consolidateAfter: 5m config does not help in this situation :/

(screenshot attached)
jigisha620 commented 2 months ago

Hi @ibalat, from the information you have shared, it seems like the node registered but never got initialized. Karpenter handles registration failures by waiting 15 minutes for the node to register; if it doesn't, we go ahead and delete the nodeClaim. However, nodes that Karpenter never initializes at all are still an open issue, tracked in https://github.com/kubernetes-sigs/karpenter/issues/750, where we are hoping to start by introducing a static TTL for initialization to kill off nodes that never go Ready on startup. Can you describe the nodeClaim for this node and share it? Can you also share the logs from the time this happened so that we can confirm that's the issue?
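For reference, a minimal sketch of gathering that information with kubectl. The namespace and Deployment name below are assumptions (Karpenter installs often use the karpenter or kube-system namespace), and the nodeclaim name is just the one from the manifest pasted later in this thread; adjust both to your install:

kubectl get nodeclaims                                        # list all NodeClaims and their status
kubectl describe nodeclaim main-green-7rncx                   # nodeclaim name is an example from this issue
kubectl logs -n karpenter deployment/karpenter --since-time=2024-08-14T12:00:00Z > karpenter-controller.log   # namespace/deployment are assumptions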

ibalat commented 2 months ago

hi @jigisha620, actually the nodes had initialized: they become "Ready", pods get scheduled, and then after a while (~30-60 minutes later) the node transitions to "NotReady". So they work properly for a while. I tried upgrading to v1.0.0 but the same problem still occurs. I am sharing my nodeclass, nodepool and nodeclaim configs. Btw, do you know why the pods are still waiting in "Terminating" status? Can Kubernetes or Karpenter force delete them after a while? Is there any config for that?

I also found new events that may be related to this issue; their repeat counts are very high.

(screenshot attached)
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: main
spec:
  amiSelectorTerms:
    - alias: al2023@latest
  role: "KarpenterNodeRole"
  subnetSelectorTerms:
    %{~ for subnet in eks_dev_v1_subnet_ids ~}
    - id: "${subnet}"
    %{~ endfor ~}
  securityGroupSelectorTerms:
    - name: "*dev-v1-node*"
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: main-green
spec:
  template:
    metadata:
      labels:
        node-group-name: main-green
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: main
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: [ "r5", "m5", "c6i" ]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      terminationGracePeriod: 5m
      expireAfter: 720h # 30 * 24h = 720h | periodically recycle nodes due to security concerns
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "17843341971500854913"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
  creationTimestamp: "2024-08-15T10:59:10Z"
  finalizers:
  - karpenter.k8s.aws/termination
  generation: 1
  name: main
  resourceVersion: "525655958"
  uid: 742b9052-735a-4078-b2d3-bbfe0cf883e3
spec:
  amiSelectorTerms:
  - alias: al2023@latest
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 1
    httpTokens: required
  role: KarpenterNodeRole
  securityGroupSelectorTerms:
  - name: '*dev-v1-node*'
  subnetSelectorTerms:
  - id: subnet-xx
  - id: subnet-xx
  - id: subnet-xx
status:
  amis:
  - id: ami-0d43f736643876936
    name: amazon-eks-node-al2023-arm64-standard-1.30-v20240807
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - arm64
    - key: karpenter.k8s.aws/instance-gpu-count
      operator: DoesNotExist
    - key: karpenter.k8s.aws/instance-accelerator-count
      operator: DoesNotExist
  - id: ami-0d694ee9037e1f937
    name: amazon-eks-node-al2023-x86_64-standard-1.30-v20240807
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64
    - key: karpenter.k8s.aws/instance-gpu-count
      operator: DoesNotExist
    - key: karpenter.k8s.aws/instance-accelerator-count
      operator: DoesNotExist
  conditions:
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: AMIsReady
    status: "True"
    type: AMIsReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: InstanceProfileReady
    status: "True"
    type: InstanceProfileReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: SecurityGroupsReady
    status: "True"
    type: SecurityGroupsReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: SubnetsReady
    status: "True"
    type: SubnetsReady
  instanceProfile: dev-v1_xx
  securityGroups:
  - id: sg-xx
    name: dev-v1-xx
  - id: sg-xx
    name: dev-v1-xx
  subnets:
  - id: subnet-xx
    zone: eu-west-1c
    zoneID: euw1-az2
  - id: subnet-xx
    zone: eu-west-1a
    zoneID: euw1-az3
  - id: subnet-xx
    zone: eu-west-1b
    zoneID: euw1-az1
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "14203437024067510703"
    karpenter.sh/nodepool-hash-version: v3
  creationTimestamp: "2024-08-15T10:55:03Z"
  generation: 1
  name: main-green
  resourceVersion: "525888522"
  uid: 5866c52d-bb13-479f-b034-822128ebc8f1
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: 1000
  template:
    metadata:
      labels:
        node-group-name: main-green
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: main
      requirements:
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - r5
        - m5
        - c6i
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "2"
      terminationGracePeriod: 5m
status:
  conditions:
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: NodeClassReady
    status: "True"
    type: NodeClassReady
  - lastTransitionTime: "2024-08-15T10:59:11Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T10:55:03Z"
    message: ""
    reason: ValidationSucceeded
    status: "True"
    type: ValidationSucceeded
  resources:
    cpu: "294"
    ephemeral-storage: 417873520Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 695806732Ki
    nodes: "20"
    pods: "2425"
apiVersion: karpenter.sh/v1
kind: NodeClaim
metadata:
  annotations:
    compatibility.karpenter.k8s.aws/cluster-name-tagged: "true"
    compatibility.karpenter.k8s.aws/kubelet-drift-hash: "15379597991425564585"
    karpenter.k8s.aws/ec2nodeclass-hash: "17843341971500854913"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v3
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "14203437024067510703"
    karpenter.sh/nodepool-hash-version: v3
  creationTimestamp: "2024-08-15T12:05:33Z"
  finalizers:
  - karpenter.sh/termination
  generateName: main-green-
  generation: 1
  labels:
    karpenter.k8s.aws/instance-category: c
    karpenter.k8s.aws/instance-cpu: "32"
    karpenter.k8s.aws/instance-cpu-manufacturer: intel
    karpenter.k8s.aws/instance-ebs-bandwidth: "10000"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "true"
    karpenter.k8s.aws/instance-family: c6i
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "65536"
    karpenter.k8s.aws/instance-network-bandwidth: "12500"
    karpenter.k8s.aws/instance-size: 8xlarge
    karpenter.sh/capacity-type: spot
    karpenter.sh/nodepool: main-green
    kubernetes.io/arch: amd64
    kubernetes.io/os: linux
    node-group-name: main-green
    node.kubernetes.io/instance-type: c6i.8xlarge
    topology.k8s.aws/zone-id: euw1-az1
    topology.kubernetes.io/region: eu-west-1
    topology.kubernetes.io/zone: eu-west-1b
  name: main-green-7rncx
  ownerReferences:
  - apiVersion: karpenter.sh/v1
    blockOwnerDeletion: true
    kind: NodePool
    name: main-green
    uid: 5866c52d-bb13-479f-b034-822128ebc8f1
  resourceVersion: "525859504"
  uid: bd1aea84-18be-4d42-9c17-3936137c89a5
spec:
  expireAfter: 720h
  nodeClassRef:
    group: karpenter.k8s.aws
    kind: EC2NodeClass
    name: main
  requirements:
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - c6i.12xlarge
    - c6i.16xlarge
    - c6i.24xlarge
    - c6i.32xlarge
    - c6i.8xlarge
    - c6i.metal
    - m5.12xlarge
    - m5.16xlarge
    - m5.24xlarge
    - m5.4xlarge
    - m5.8xlarge
    - m5.metal
    - r5.12xlarge
    - r5.16xlarge
    - r5.24xlarge
    - r5.4xlarge
    - r5.8xlarge
    - r5.metal
  - key: node-group-name
    operator: In
    values:
    - main-green
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values:
    - "2"
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - main-green
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values:
    - c
    - m
    - r
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - c6i
    - m5
    - r5
  resources:
    requests:
      cpu: 4280m
      memory: 36152Mi
      pods: "67"
  terminationGracePeriod: 5m0s
status:
  allocatable:
    cpu: 31850m
    ephemeral-storage: 17Gi
    memory: 57691Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "84"
  capacity:
    cpu: "32"
    ephemeral-storage: 20Gi
    memory: 60620Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "84"
  conditions:
  - lastTransitionTime: "2024-08-15T12:15:35Z"
    message: ""
    reason: ConsistentStateFound
    status: "True"
    type: ConsistentStateFound
  - lastTransitionTime: "2024-08-15T15:46:53Z"
    message: ""
    reason: Consolidatable
    status: "True"
    type: Consolidatable
  - lastTransitionTime: "2024-08-15T12:06:14Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-08-15T12:05:35Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-08-15T12:06:14Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-08-15T12:06:04Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-0d694ee9037e1f937
  lastPodEventTime: "2024-08-15T15:41:53Z"
  nodeName: ip-10-xx-xx-xx.eu-west-1.compute.internal
  providerID: aws:///eu-west-1b/i-xxxxxx
jigisha620 commented 2 months ago

I think the snippet you shared with "No allowed disruptions for disruption reason" is not the problem here. The nodes you have were already in a NotReady state, so they will not be considered for allowed disruptions. Can you share Karpenter controller logs from the same time?

ibalat commented 2 months ago

Sure. Between 05:58:24 and 06:09:12, 3 nodes became NotReady and I watched it happen live, but there is no related log :( You can see all logs between these times:

{"level":"INFO","time":"2024-08-16T05:58:24.287Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:26.268Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:54.219Z","logger":"controller","message":"pod(s) have a preferred Anti-Affinity which can prevent consolidation",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"found provisionable pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.360Z","logger":"controller","message":"computed 1 unready node(s) will fit 1 pod(s)",
{"level":"INFO","time":"2024-08-16T05:58:54.376Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:56.599Z","logger":"controller","message":"deleted node",
{"level":"INFO","time":"2024-08-16T05:58:56.870Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T05:58:56.902Z","logger":"controller","message":"deleted nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:19.838Z","logger":"controller","message":"registered nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:20.169Z","logger":"controller","message":"registered nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:24.803Z","logger":"controller","message":"pod(s) have a preferred Anti-Affinity which can prevent consolidation",
{"level":"INFO","time":"2024-08-16T05:59:37.493Z","logger":"controller","message":"initialized nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:38.378Z","logger":"controller","message":"initialized nodeclaim",
{"level":"INFO","time":"2024-08-16T05:59:49.497Z","logger":"controller","message":"deleted node",
{"level":"INFO","time":"2024-08-16T05:59:49.706Z","logger":"controller","message":"deleted nodeclaim",
{"level":"INFO","time":"2024-08-16T06:08:45.766Z","logger":"controller","message":"found provisionable pod(s)",
{"level":"INFO","time":"2024-08-16T06:08:45.766Z","logger":"controller","message":"computed new nodeclaim(s) to fit pod(s)",
{"level":"INFO","time":"2024-08-16T06:08:45.777Z","logger":"controller","message":"created nodeclaim",
{"level":"INFO","time":"2024-08-16T06:08:48.176Z","logger":"controller","message":"launched nodeclaim",
{"level":"INFO","time":"2024-08-16T06:09:12.703Z","logger":"controller","message":"registered nodeclaim",
ibalat commented 2 months ago

New update: the node that cannot be deleted (even though terminationGracePeriod: 5m is set and much more time has passed) shows some events; maybe they can help:

(screenshot attached)

The node's nodeclaim has the events below:

(screenshot attached)

The pods on the node are stuck in the "Terminating" state and don't show any event or log in describe output.

After I deleted the nodeclaim manually, the node was deleted (but only after the grace period had passed).

jigisha620 commented 2 months ago

TerminationGracePeriod does not take effect unless a delete has been called against the nodeClaim. In your case the node went to a NotReady state, but nothing initiated its deletion. I was able to reproduce something similar on my end, where my node becomes NotReady because the kubelet stops posting node status; however, the pods got rescheduled onto a different node. That makes me wonder if the pods you are running have a pre-stop hook that's preventing them from terminating?
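As a quick check along those lines (a sketch; the nodeclaim name is the one from the manifest above), you can confirm whether a delete was ever issued against the NodeClaim and, if needed, initiate it manually; terminationGracePeriod only starts counting after that:

kubectl get nodeclaim main-green-7rncx -o jsonpath='{.metadata.deletionTimestamp}'   # empty output means no delete was ever requested
kubectl delete nodeclaim main-green-7rncx                                            # manually initiates deletion, as was done above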

ibalat commented 2 months ago

No pre-stop hook, finalizer, or anything else. They are just waiting, as in the screenshots.

jigisha620 commented 2 months ago

This is not necessarily an issue with Karpenter. To investigate further, we will have to take a look at the kubelet logs to see why the pods remained stuck at Terminating. Since you are using an EKS AMI, you can run a script on your worker node at /etc/eks called log-collector-script, which would help us get the kubelet logs. If you have AWS premium support, you can open a ticket to investigate those logs, or you can send them over and I can try looking into them.
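For reference, a sketch of how that collector is typically run on the node; the exact script path is an assumption based on recent EKS-optimized AMIs and may vary by AMI version:

sudo bash /etc/eks/log-collector-script/eks-log-collector.sh   # bundles kubelet and system logs into a tarball (check the script output for its location)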

ibalat commented 2 months ago

When it happens, I can't log in to the EC2 instance; it doesn't respond. But I could get the console stdout, shown below:

[  423.390531] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1344160e_dca0_4e9d_be15_ea0b63efb5b2.slice/cri-containerd-496edffa072b6d7835989a0dfbce3c30711a32903c757baf4fcd460c9479f3a8.scope,task=java,pid=22199,uid=1001
[  423.412634] Out of memory: Killed process 22199 (java) total-vm:3657848kB, anon-rss:338056kB, file-rss:0kB, shmem-rss:0kB, UID:1001 pgtables:964kB oom_score_adj:1000
[  425.563371] oom_reaper: reaped process 22199 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        2024-08-14T13:38:15+00:00
suraj2410 commented 2 months ago

We are seeing this far too often as well.

JacobHenner commented 2 months ago

@ibalat @suraj2410

What do the disk IOPS, disk idle time, and memory metrics look like for the affected hosts? Could this be the problem described in https://github.com/bottlerocket-os/bottlerocket/issues/4075#issuecomment-2319361813? (applicable to Bottlerocket, but also observed with AL2).
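If it helps narrow this down, a rough way to check for memory or disk pressure from the cluster side (assumes metrics-server is installed; IOPS and disk idle time would still need CloudWatch or a node-level exporter):

kubectl describe node <node-name> | grep -A 6 'Conditions:'   # look for MemoryPressure / DiskPressure / PIDPressure
kubectl top node <node-name>                                  # requires metrics-server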

ibalat commented 2 months ago

I had removed Karpenter and reinstalled Cluster Autoscaler, but I can test it again this week. After the test, I will share the results with you.