Open · MauroSoli opened 7 months ago
@MauroSoli Just to be sure, would you mind verifying that `BATCH_MAX_DURATION` and `BATCH_IDLE_DURATION` are being set properly? You can do this by running `kubectl describe pod <karpenter pod>`. Also, there is not much that we can see from the logs that you have shared. Can you share logs with the message `found provisionable pod(s)` so we can see when the actual scheduling of the pods happened?

What is the size of the cluster that you are working with? Is it a test cluster that does not have too many pods/deployments? You have mentioned that the pod is in a pending state for more than 60 sec. Given that the node can take some time to come up, do you expect to see the pods scheduled right after 10 seconds?
> Just to be sure, would you mind verifying that `BATCH_MAX_DURATION` and `BATCH_IDLE_DURATION` are being set properly? You can do this by running `kubectl describe pod`.

```console
$ kubectl describe pod karpenter-678d69d4d5-6rpgw -n karpenter | grep BATCH_
BATCH_MAX_DURATION:   90s
BATCH_IDLE_DURATION:  10s
```
> Also there is not much that we can see from the logs that you have shared. Can you share logs with the message `found provisionable pod(s)` so we can see when the actual scheduling of the pods happened?

The logs I've shared already contain the line you are looking for:

```json
{"level":"INFO","time":"2024-04-08T17:45:31.936Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"17dd42b","pods":"default/pause1, default/pause2, default/pause3, default/pause4, default/pause5 and 5 other(s)","duration":"84.72464ms"}
```
> What is the size of the cluster that you are working with? Is it a test cluster that does not have too many pods/deployments?

We use Karpenter only to manage some specific workloads, such as builds and cron jobs. The cluster-autoscaler manages all other workloads.
> You have mentioned that the pod is in pending state for more than 60 sec. Given that the node can take some time to come up, do you expect to see the pods scheduled right after 10 seconds?

That's not what I meant. After 60 seconds, while the pods were still pending, Karpenter had not yet scheduled a new NodeClaim; that instead happened only after 90 seconds.
Let's say you scheduled a single pod and no new pod arrives within 10 seconds; Karpenter will then begin scheduling a new NodeClaim after those 10 seconds. However, if a pod comes up before 10 seconds have passed, the batching window will be extended, up to the maxDuration. From the logs that you have shared it seems like there were 10 pods, and in that case Karpenter would wait until the `batchMaxDuration` of 90s before it begins scheduling a NodeClaim.
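To make those semantics concrete, here is a minimal Go sketch (illustrative only, not Karpenter's actual source) of a batching window in which every new pod resets the idle countdown while a single deadline caps the whole window at the max duration:

```go
package main

import (
	"fmt"
	"time"
)

// batch drains pod-arrival events from `pods` into a single batch.
// The window closes after `idle` passes with no new pod, or after
// `max` has elapsed in total, whichever comes first.
func batch(pods <-chan string, idle, max time.Duration) []string {
	batched := []string{<-pods} // the first pod opens the window
	deadline := time.After(max) // hard cap on the whole window
	for {
		select {
		case p := <-pods:
			// A new pod arrived: the next loop iteration creates a
			// fresh idle timer, i.e. the idle countdown is reset.
			batched = append(batched, p)
		case <-time.After(idle): // no pod for `idle`: close the window
			return batched
		case <-deadline: // window hit `max`: close it regardless
			return batched
		}
	}
}

func main() {
	pods := make(chan string, 10)
	go func() {
		for i := 1; i <= 10; i++ {
			pods <- fmt.Sprintf("pause%d", i) // hypothetical pod names
			time.Sleep(2 * time.Second)       // pods arriving every 2s, as in the repro below
		}
	}()
	start := time.Now()
	b := batch(pods, 10*time.Second, 300*time.Second)
	fmt.Printf("batched %d pod(s) after %s\n", len(b), time.Since(start).Round(time.Second))
}
```

With pods arriving every 2 seconds and an idle duration of 10 seconds, this sketch closes the window roughly 10 seconds after the tenth pod, at about t=28s.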
I've been able to reproduce this using the latest Karpenter v0.35.4 and the latest EKS v1.29.1-eks-508b6b3, and it seems to me there could be some misunderstanding between what is stated above, what is stated in the documentation, and the actual Karpenter behavior.

If I set `BATCH_IDLE_DURATION` and `BATCH_MAX_DURATION` high and far enough apart (say 10 seconds and 300 seconds) and run a script like `for I in $(seq 10); do kubectl run <single_pod>; sleep 2; done`, the following happens: the batch window gets immediately extended to `BATCH_MAX_DURATION` as soon as the creation of the second pod interrupts the first `BATCH_IDLE_DURATION` interval.
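For contrast, here is a sketch that is consistent with the symptom described above (again illustrative only, not Karpenter's code): a drop-in replacement for `batch` in the earlier sketch, in which the idle timer is started once and abandoned as soon as a second pod arrives, so the window can then only close at the max duration.

```go
// batchObserved mimics the reported behavior: the idle timer is never
// reset, and it is disabled entirely once a second pod arrives, so the
// window then runs all the way to `max` (~300s with the repro timings
// above, instead of ~28s).
func batchObserved(pods <-chan string, idle, max time.Duration) []string {
	batched := []string{<-pods}   // the first pod opens the window
	deadline := time.After(max)
	idleTimer := time.After(idle) // started once, never reset
	for {
		select {
		case p := <-pods:
			batched = append(batched, p)
			idleTimer = nil // a nil channel blocks forever: idle can no longer close the window
		case <-idleTimer:
			return batched // reached only if no second pod ever arrived
		case <-deadline:
			return batched // otherwise the window always runs to `max`
		}
	}
}
```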
Personally, I understood the current documentation in a different way. It says: "_BATCH_IDLE_DURATION: The maximum amount of time with no new pending pods that if exceeded ends the current batching window. If pods arrive faster than this time, the batching window will be extended up to the maxDuration. If they arrive slower, the pods will be batched separately._" I understood it as follows:

1. A pending pod opens a new batching window.
2. Karpenter waits for `BATCH_IDLE_DURATION`.
3. If a new pod arrives before `BATCH_IDLE_DURATION` has elapsed, jump to 2 (starting a new wait for the IDLE duration).
4. If at any point the total window age (`now - time of first pod created`) exceeds `BATCH_MAX_DURATION`, the window is immediately closed, node claims are computed, and the whole process starts again from 1.

In other words: `BATCH_IDLE_DURATION` will always be the maximum "inactivity time" after which Karpenter starts computing node claims. To me, this makes much more sense because it allows shorter latencies between workload demands (pod creation) and their execution (node claims computed, nodes created, pods start running), while the MAX_DURATION still guarantees an upper limit to said latency by closing batch windows even if new pods keep arriving.
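Concretely, under this reading and with the repro timings above (simple arithmetic, assuming pods land every 2 seconds): the tenth pod arrives at roughly t=18s, so the window should close at about t=28s (last pod plus 10s of idle); the observed behavior instead keeps it open until t=300s, the full `BATCH_MAX_DURATION`.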
To me, the documentation seems correct and describes the behavior I would expect, but Karpenter is misbehaving: it actually uses the IDLE_DURATION only the first time, then skips directly to the MAX_DURATION, and thus causes higher latencies between pod creation and node startup.
I had probably misunderstood what was being implied earlier. I was able to reproduce this. Looking into creating a fix for this. Thanks.
/assign @jigisha620
/label v1
@billrayburn: The label(s) `/label v1` cannot be applied. These labels are supported: `api-review`, `tide/merge-method-merge`, `tide/merge-method-rebase`, `tide/merge-method-squash`, `team/katacoda`, `refactor`. Is this label configured under `labels -> additional_labels` or `labels -> restricted_labels` in `plugin.yaml`?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
Description
Observed Behavior:
Karpenter always waits for the `batchMaxDuration` value and ignores the `batchIdleDuration` value when several pods are in pending status. I changed the above values to `batchIdleDuration=10s` and `batchMaxDuration=90s` so the behaviour is clearer.
As you can see in the following image, the pods are still in a pending state after more than 60 seconds and Karpenter has not yet scheduled a new NodeClaim.
Here are the controller logs:
Expected Behavior:
The NodeClaim should be created once the `batchIdleDuration` time has passed and no new pending pods have been scheduled on the cluster.

Versions:
- Chart Version: 0.35.4
- Kubernetes Version (`kubectl version`): v1.29.1-eks-508b6b3
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment