Open · dpiddock opened this issue 4 weeks ago
This issue is currently awaiting triage.
If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Can you share your Karpenter configuration?
We install Karpenter with Helm:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/karpenter-workload
              operator: Exists
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
controller:
  resources:
    limits:
      memory: 1Gi
    requests:
      cpu: 0.25
      memory: 1Gi
dnsPolicy: Default
logLevel: info
podAnnotations:
  prometheus.io/port: "8080"
  prometheus.io/scrape: "true"
podDisruptionBudget:
  maxUnavailable: 1
  name: karpenter
priorityClassName: system-cluster-critical
serviceAccount:
  create: false
  name: karpenter-controller
settings:
  clusterEndpoint: https://[...].eks.amazonaws.com
  clusterName: application-cluster
  interruptionQueue: application-cluster-karpenter-interruption-handler
strategy:
  rollingUpdate:
    maxUnavailable: 1
tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/workload
    operator: Equal
    value: karpenter
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
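For context, values like these are applied with the published chart. A minimal sketch, assuming the OCI chart location from the Karpenter docs, a kube-system namespace, and a local values.yaml holding the block above (the namespace and file name are assumptions, not taken from our setup):

  # install/upgrade Karpenter 1.0.6 with the values shown above (sketch only)
  helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
    --version 1.0.6 \
    --namespace kube-system --create-namespace \
    -f values.yaml \
    --wait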
A sample EC2NodeClass:
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: mixed
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - id: ami-1 # amazon-eks-node-1.30-*
    - id: ami-2 # amazon-eks-arm64-node-1.30-*
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        encrypted: true
        volumeSize: 128Gi
        volumeType: gp3
  detailedMonitoring: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: application-cluster-node
  securityGroupSelectorTerms:
    - id: sg-1
    - id: sg-2
  subnetSelectorTerms:
    - id: subnet-a
    - id: subnet-b
    - id: subnet-c
  tags:
    Edition: mixed
    karpenter.sh/discovery: application-cluster
  userData: |
    #!/bin/bash
    KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
    grep -v search /etc/resolv.conf > /etc/kubernetes/kubelet/resolv.conf
    echo "$(jq '.resolvConf="/etc/kubernetes/kubelet/resolv.conf"' $KUBELET_CONFIG)" > $KUBELET_CONFIG
    echo "$(jq '.registryPullQPS=10' $KUBELET_CONFIG)" > $KUBELET_CONFIG
    echo "$(jq '.registryBurst=25' $KUBELET_CONFIG)" > $KUBELET_CONFIG
And the matching NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: mixed
spec:
  disruption:
    budgets:
      - nodes: 10%
      - nodes: "0"
        reasons:
          - Drifted
    consolidateAfter: 0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits:
    cpu: "1500"
  template:
    spec:
      expireAfter: 168h # 1 week
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: mixed
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values:
            - "8"
            - "16"
            - "32"
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
            - "4"
        - key: karpenter.k8s.aws/instance-hypervisor
          operator: In
          values:
            - nitro
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
            - us-east-1c
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
            - spot
      startupTaints:
        - effect: NoExecute
          key: ebs.csi.aws.com/agent-not-ready
        - effect: NoExecute
          key: efs.csi.aws.com/agent-not-ready
      terminationGracePeriod: 4h
  weight: 50
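With this NodePool, the disruption budgets allow at most 10% of nodes to be disrupted at once and block disruption for the Drifted reason entirely, and a node should be forcibly terminated at most 4h (terminationGracePeriod) after its 168h expireAfter is reached. A rough sketch of the commands we use to inspect a node that has passed that point; the node name here is hypothetical:

  kubectl get nodeclaims
  kubectl describe node ip-10-1-2-3.ec2.internal | grep -i -A3 taint
  kubectl get events --field-selector involvedObject.name=ip-10-1-2-3.ec2.internal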
If this feature works properly, we are going to migrate to Karpenter.
Description
Observed Behavior: After upgrading to Karpenter 1.0, we tried to enact a policy to terminate nodes after 7d with a 4h terminationGracePeriod. However, Karpenter still refuses to terminate a pod at the deadline if the PDB does not allow disruption. This leaves us with large instances running just a single workload pod, because Karpenter has already evicted the other workloads and tainted the node karpenter.sh/disrupted:NoSchedule 💸.
Repeated events are generated against the node:
The node is left with 11 DaemonSet pods and 1 pod from a Deployment. The Deployment's PDB is configured to not allow normal termination of the pod.
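For reference, a PDB that blocks normal termination of a single-replica Deployment's pod looks roughly like this; the name, label, and choice of maxUnavailable are illustrative, not our actual manifest:

  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: workload-pdb          # hypothetical name
  spec:
    maxUnavailable: 0           # never permits voluntary eviction of the pod
    selector:
      matchLabels:
        app: workload           # hypothetical label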
Karpenter itself is logging:
Expected Behavior: A node owned by Karpenter reaches expireAfter + terminationGracePeriod (168h + 4h in our case), all pods are removed, and the node is terminated.
I'm not sure if this is actually a documentation bug, but the documentation certainly implies, to my reading, that PDBs are overridden once the grace period expires: terminationGracePeriod
Reproduction Steps (Please include YAML):
Versions:
Chart Version: 1.0.6
Kubernetes Version (kubectl version):

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.