kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Add configurable dedupe timeout to NodeFailedToDrain event #1021

Open evq opened 4 months ago

evq commented 4 months ago

Description

What problem are you trying to solve? Hey there, we are currently using spot instances provisioned via Karpenter to run some relatively expensive workloads. For cost-optimization reasons we want to run as few replicas as possible, which in some cases means a single instance. That obviously has major availability trade-offs: by default these workloads would have downtime every time a spot interruption occurs. To mitigate this, we have a PDB with minAvailable set to 1, which blocks the node from being drained prematurely. We then have a custom scaler, triggered by the NodeFailedToDrain event, that temporarily scales up to 2 replicas so the replacement pod can start gracefully before the spot termination occurs.
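
For concreteness, a minimal sketch of that pattern with client-go is below (this is not the poster's actual scaler): watch for events whose reason is NodeFailedToDrain and hold the workload at 2 replicas until no such event has been seen for a quiet period. The namespace, the Deployment name, and the 2m15s quiet period are placeholders.

```go
// Hypothetical scale-up-on-NodeFailedToDrain watcher; names and durations
// are illustrative only.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/retry"
)

func setReplicas(ctx context.Context, client kubernetes.Interface, ns, name string, n int32) error {
	// Retry on conflicts so a concurrent update to the Deployment doesn't drop the change.
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		d, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		d.Spec.Replicas = &n
		_, err = client.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
		return err
	})
}

func main() {
	ctx := context.Background()
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch only events whose reason is NodeFailedToDrain.
	w, err := client.CoreV1().Events("").Watch(ctx, metav1.ListOptions{
		FieldSelector: "reason=NodeFailedToDrain",
	})
	if err != nil {
		panic(err)
	}

	const quietPeriod = 135 * time.Second // ~2m15s, matching the window described above
	var lastSeen time.Time
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case _, ok := <-w.ResultChan():
			if !ok {
				return // watch closed; a real controller would re-establish it
			}
			lastSeen = time.Now()
			fmt.Println("NodeFailedToDrain seen, scaling up")
			_ = setReplicas(ctx, client, "default", "expensive-workload", 2)
		case <-ticker.C:
			if !lastSeen.IsZero() && time.Since(lastSeen) > quietPeriod {
				fmt.Println("quiet period elapsed, scaling back down")
				_ = setReplicas(ctx, client, "default", "expensive-workload", 1)
				lastSeen = time.Time{}
			}
		}
	}
}
```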

This all works quite well; however, one of the knobs on the custom scaler controls how long to wait after the last NodeFailedToDrain event before we scale back down. We currently have this set to around 2m15s, based on the default event dedupe timeout plus the apparent maximum retry time on the disruption queue. It would be nice to lower it further, but doing so seems to require changing the event dedupe time on NodeFailedToDrain. I see that there is the ability to set a per-event override; would it make sense to have some sort of configuration value that controls the dedupe time, either globally or on a per-event basis?
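
For readers unfamiliar with the mechanism being discussed, here is a rough sketch of event dedupe with a timeout; it is illustrative only and not Karpenter's actual events code. The request is essentially to make the timeout below configurable, globally or per event reason.

```go
// Illustrative dedupe: events with the same key are suppressed until
// dedupeTimeout has elapsed since the last emission.
package main

import (
	"fmt"
	"time"
)

type deduper struct {
	timeout  time.Duration
	lastSeen map[string]time.Time
}

func newDeduper(timeout time.Duration) *deduper {
	return &deduper{timeout: timeout, lastSeen: map[string]time.Time{}}
}

// shouldEmit reports whether the event keyed by reason/object has not been
// emitted within the dedupe window, and records the emission time if so.
func (d *deduper) shouldEmit(key string) bool {
	if t, ok := d.lastSeen[key]; ok && time.Since(t) < d.timeout {
		return false
	}
	d.lastSeen[key] = time.Now()
	return true
}

func main() {
	d := newDeduper(2 * time.Minute) // the kind of default window the poster is working around
	for _, key := range []string{"NodeFailedToDrain/node-a", "NodeFailedToDrain/node-a"} {
		fmt.Println(key, "emit:", d.shouldEmit(key))
	}
}
```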

How important is this feature to you? Nice to have

jonathan-innis commented 4 months ago

wondering if it would make sense to have some sort of configuration value which controls the dedupe time

What's the impact of not being able to configure this value? Is it just higher cost since you have two pods running? I would also assume that Karpenter creates a new node with enough space for both pods, rather than just for one, because we simulate capacity for pods on nodes that we know are going away.

jonathan-innis commented 4 months ago

Also, it seems like this issue might be relevant to y'all. If we just waited until the point at which the pod's terminationGracePeriod would require us to start draining in order to finish in time, would that reduce your need to orchestrate scale-up on this event at all? See https://github.com/aws/karpenter-provider-aws/issues/2917
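
A back-of-the-envelope version of that idea, with illustrative numbers (the 2-minute figure is the standard EC2 spot interruption notice; the grace-period value is made up, not taken from the issue):

```go
// If the drain only needs to begin terminationGracePeriodSeconds before the
// spot reclaim, the remaining time is slack during which the pod keeps running.
package main

import (
	"fmt"
	"time"
)

func main() {
	interruptionNotice := 2 * time.Minute // EC2 spot interruption warning
	gracePeriod := 90 * time.Second       // pod's terminationGracePeriodSeconds (example value)
	fmt.Printf("drain can be deferred for up to %s after the notice\n", interruptionNotice-gracePeriod)
}
```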

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten