Open indra0007 opened 2 weeks ago
This issue is currently awaiting triage.
If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
This is also a big loss in the availability of my production application 😢
Can we disable this validation check conditionally, based on some external NodePool configuration? We are hoping that if we can disable it, we can then catch the resulting events when nodes are in a hung state and rollout-restart the deployments to unblock them and let the node get deleted.
What if we proceed with a rollout restart policy whenever the current situation doesn't align with the PDB configuration, for all workloads (not only single-replica ones)?
The potential downside is that the pending pods from the restart might trigger the creation of new nodes, which could result in an environment that never stabilizes.
Out of curiosity, I forked this repo, commented out this validation check, and deployed the customised image in my cluster. I found that a node (running an app with a single replica and a PDB) has the set of events below in its corresponding NodeClaim. We can clearly see that node deletion is blocked by a PDB violation. I guess DisruptionTerminating is the key event here. When I did a rollout restart of the existing deployment with one replica and a PDB, the pod was gracefully rescheduled on another node, and after that the node got deleted.
So if this request is accepted, we could watch for the DisruptionTerminating event from our custom controller and restart only those deploy/sts that have one replica and a PDB. For the rest of the deploy/sts, Karpenter would take care of them automatically.
Not a nice solution, but an effective one.
Normal Launched 34m karpenter Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
Normal DisruptionBlocked 34m karpenter Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim
Normal Registered 33m karpenter Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered
Normal Initialized 33m karpenter Status condition transitioned, Type: Initialized, Status: Unknown -> True, Reason: Initialized
Normal Ready 33m karpenter Status condition transitioned, Type: Ready, Status: Unknown -> True, Reason: Ready
Normal DisruptionBlocked 27m (x3 over 31m) karpenter Cannot disrupt NodeClaim: state node is nominated for a pending pod
Normal Unconsolidatable 11m (x2 over 26m) karpenter SpotToSpotConsolidation is disabled, can't replace a spot node with a spot node
Normal DisruptionTerminating 9m14s karpenter Disrupting NodeClaim: Drifted/Replace
Warning FailedConsistencyCheck 4m karpenter can't drain node, PDB "cwe/nginx" is blocking evictions
Normal ConsistentStateFound 4m karpenter Status condition transitioned, Type: ConsistentStateFound, Status: True -> False, Reason: ConsistencyCheckFailed, Message: Consistency Check Failed
Normal DisruptionBlocked 35s (x6 over 10m) karpenter Cannot disrupt NodeClaim: state node is marked for deletion
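The custom-controller idea described above can be sketched roughly as follows. This is only an illustration of the selection logic, not a working controller: the `Workload` type, its fields, and the helper names are hypothetical, and the event reasons are assumed to have already been read from the NodeClaim (for example, from a list like the one above).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical, simplified view of a Deployment or StatefulSet."""
    kind: str
    name: str
    replicas: int
    has_pdb: bool


def is_being_disrupted(event_reasons):
    """True once a DisruptionTerminating event has been seen for the NodeClaim."""
    return "DisruptionTerminating" in event_reasons


def workloads_to_restart(event_reasons, workloads):
    """Pick only single-replica workloads that also have a PDB.

    Everything else is left for Karpenter's normal drain to handle.
    """
    if not is_being_disrupted(event_reasons):
        return []
    return [w for w in workloads if w.replicas == 1 and w.has_pdb]


# Event reasons in the order they appear in the log above (assumed input).
reasons = ["Launched", "Registered", "Initialized", "Ready",
           "DisruptionBlocked", "Unconsolidatable", "DisruptionTerminating"]
workloads = [
    Workload("Deployment", "nginx", replicas=1, has_pdb=True),
    Workload("Deployment", "api", replicas=3, has_pdb=True),
    Workload("StatefulSet", "cache", replicas=1, has_pdb=False),
]
selected = workloads_to_restart(reasons, workloads)
print([w.name for w in selected])
```

A real controller would still need to watch events and issue the restart itself; the sketch only shows which workloads would be picked once DisruptionTerminating is seen.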
/assign
I will bring this issue to the community meeting and follow up on next steps.
Hey, can you control this on the pod level by setting maxUnavailable = 0 and maxSurge = 1, so that K8s creates a new pod first before removing the previous one?
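For reference, the settings suggested here live under a Deployment's rolling update strategy, and a rollout restart is normally triggered by changing the `kubectl.kubernetes.io/restartedAt` annotation on the pod template. The sketch below only builds the two patch bodies as plain dictionaries; the function names are hypothetical and nothing is sent to a cluster. These settings govern rollouts, so they would matter for the rollout-restart approach rather than for evictions during a node drain.

```python
import json
from datetime import datetime, timezone


def surge_first_strategy_patch():
    """Patch body asking for a new pod to be created before the old one is removed."""
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            }
        }
    }


def restart_patch(now=None):
    """Patch body that changes the pod template, which triggers a new rollout."""
    now = now or datetime.now(timezone.utc)
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "kubectl.kubernetes.io/restartedAt": now.isoformat()
                    }
                }
            }
        }
    }


strategy = surge_first_strategy_patch()
restart = restart_patch()
print(json.dumps(strategy, indent=2))
print(json.dumps(restart, indent=2))
```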
Description
Have zero downtime for applications with a single replica during consolidation/drift of the underlying node
Very important (Blocker for adopting karpenter)
Hi,
We have a cluster with development environments, and each pod is a single replica. During consolidation, Karpenter deletes this single pod, so it goes into Terminating status while the new pod is still in Init status. Of course this causes downtime, as new connections cannot be routed to a Terminating pod.
We'd like to have some option to control how pods are rescheduled during consolidation. I think that after the node is cordoned, ideally we want to do a rollout restart of the pods instead of draining the node.
I found one pull request for this, which was closed at the end of last year.
So can we do something about this? Maybe you can just emit an event from the NodeClaim when it is about to be disrupted; we can catch the event from our own custom controller and do a rollout restart of the existing deployments and statefulsets, which would reschedule the workloads on other nodes, and then eventually Karpenter would automatically taint and delete the existing node. We tried to follow this approach, but what we have seen is that the DisruptionBlocked event is emitted continuously whenever an app with 1 replica and a PDB exist together, irrespective of whether the node is disruptable or not. So we really can't run any logic based on the DisruptionBlocked event, and it's kind of a false alarm for us.
In a nutshell, we need an event for when the node actually could not be disrupted because of the presence of a PDB (not as a validation check like the current DisruptionBlocked event). Maybe toggling the sequence of DisruptionBlocked and Unconsolidatable would help.
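As a small worked example of why a single replica plus a PDB always looks blocked: assuming the PDB uses minAvailable: 1 (the issue does not state the actual PDB spec), the number of voluntary evictions allowed is the healthy pod count minus minAvailable, floored at zero. The function name below is hypothetical.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PDB with minAvailable would permit right now."""
    return max(0, healthy_pods - min_available)


# Single replica with an assumed minAvailable of 1: no eviction is ever allowed.
single = disruptions_allowed(healthy_pods=1, min_available=1)
# Three replicas with the same PDB: two pods can be evicted.
multi = disruptions_allowed(healthy_pods=3, min_available=1)
print(single, multi)  # prints "0 2"
```

Under that assumption, the only way to free the node is to bring up a replacement pod elsewhere first, which is what the rollout restart does.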
Below is the sequence of existing events