What does your Provisioner look like, as well as your karpenter-global-settings? Karpenter uses a concept called vmMemoryOverheadPercent, since all EC2 instances come with some unknown overhead consumed by the OS/fabric layer that can't be known through the API, so we skim some capacity off the top to better estimate what the actual capacity of the instance will be.
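As a rough worked example (assuming the EC2-advertised 4 GiB of memory for a c5.large and the 0.075 value used below): 4096 MiB * (1 - 0.075) ≈ 3788.8 MiB, which lines up with the 3788Mi capacity Karpenter logs for c5.large later in this issue.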
Hey @jonathan-innis, this is the provisioner:
```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    usage: apps-spot
  requirements:
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: [t, c, m, r]
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values: ["2"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
    - key: "karpenter.sh/capacity-type"
      operator: In
      values: ["spot", "on-demand"]
    - key: kubernetes.io/os
      operator: In
      values: ["linux"]
  consolidation:
    enabled: true
  providerRef:
    name: default
```
And the karpenter-global-settings:
```yaml
data:
  aws.clusterEndpoint: ''
  aws.clusterName: nightly-backend
  aws.defaultInstanceProfile: KarpenterNodeInstanceProfile-nightly-backend
  aws.enableENILimitedPodDensity: 'true'
  aws.enablePodENI: 'false'
  aws.interruptionQueueName: nightly-backend-karpenter
  aws.isolatedVPC: 'false'
  aws.nodeNameConvention: ip-name
  aws.vmMemoryOverheadPercent: '0.075'
  batchIdleDuration: 1s
  batchMaxDuration: 10s
  featureGates.driftEnabled: 'false'
```
Would you suggest increasing aws.vmMemoryOverheadPercent? Shouldn't Karpenter be aware of the overhead, which is somewhat fixed per instance type in the default AMI?
> Would you suggest increasing aws.vmMemoryOverheadPercent?

I was able to reproduce this, and I'd recommend bumping it up to a higher value (0.08) as a workaround.
> Shouldn't Karpenter be aware of the overhead, which is somewhat fixed per instance type in the default AMI?

We are working on making this rough estimate more accurate, since you're correct that it does seem to be somewhat fixed per instance type. There's an issue currently tracking the move away from a rough percentage: aws/karpenter-core#716
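As a sketch of that workaround, the bump would look roughly like this in the karpenter-global-settings ConfigMap posted above (only the changed key is shown; the namespace assumes a default install, and Helm users would set the equivalent chart value instead):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
  namespace: karpenter               # assumes the default install namespace
data:
  # Bumped from '0.075'; raise further (e.g. '0.1') if nodes still come up short.
  aws.vmMemoryOverheadPercent: '0.08'
```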
Thank you very much! 🙏🏻
0.08 was not enough for us. We had to bump it to 0.1.
> 0.08 was not enough for us

Which instance types are you using that required you to bump it up to 0.1?
A little bit of everything, to be honest. I'm not sure which instance type was causing the flapping, but it stopped. Our provisioner spec's requirements are quite broad:
```yaml
requirements:
  - key: "karpenter.k8s.aws/instance-category"
    operator: In
    values: ["c", "m", "r"]
```
This can happen if you enable hugepages on the worker node. In that case Karpenter cannot properly estimate the available RAM on the newly created node, and the scheduler fails to place the pod on the new node.
@project-administrator Linking https://github.com/aws/karpenter-core/issues/751 since it has the details of extended resource support for a bunch of different things, including hugepages.
I'm not as familiar with hugepages and how they affect memory, so can you provide an example of how one affects the other in this case?
@jonathan-innis hugepages is a Linux kernel feature that is recommended for some memory-hungry products like databases, Java apps, etc. Enabling hugepages for such products (where recommended) usually improves performance.
After enabling transparent huge pages (THP) with sysctl there can also be undesired effects, such as suboptimal memory usage: THP may promote smaller pages to huge pages even when it is not beneficial, which can lead to increased memory usage.
I believe this is what happens: ordinary OS processes suddenly start using much more RAM, and Karpenter can no longer estimate the amount of available RAM that a newly created EC2 node has after startup.
For example, in our case Karpenter spins up a Bottlerocket-based node with 8 GiB of RAM, and the scheduler is no longer able to fit a workload with requests: memory: 3Gi on it.
It's obviously not a good idea to use such a small node with transparent huge pages, but I don't know of any way to give Karpenter a hint that nodes have less available RAM in the OS after hugepages are enabled.
A node with THP enabled might have significantly less available RAM than Karpenter expects it to have.
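For reference, a minimal sketch (all names and sizes hypothetical) of the declarative alternative: preallocated hugepages requested explicitly on the pod, which the kube-scheduler does account for. Karpenter's awareness of extended resources like hugepages is what the linked aws/karpenter-core#751 tracks.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example                     # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest  # hypothetical image
      resources:
        requests:
          memory: 1Gi
          hugepages-2Mi: 512Mi
        limits:
          memory: 1Gi
          hugepages-2Mi: 512Mi                # hugepages requests must equal limits
      volumeMounts:
        - name: hugepage
          mountPath: /hugepages
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages                     # backed by the node's preallocated huge pages
```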
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
/unassign @jonathan-innis
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- /remove-lifecycle stale
- /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- /remove-lifecycle stale
- /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- /remove-lifecycle rotten
- /close

Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- /reopen
- /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Description
Observed Behavior: Karpenter provisions a node that doesn't fit the pending pod and the daemonsets. We have a pending pod with 3000Mi of memory requests (see the reproduction YAML below).
In addition, there are these daemonsets: one daemonset (filebeat) with 100Mi memory requests, and other daemonsets that have no requests/limits set.
We see in the Karpenter logs that it's choosing a c5.large with 3788Mi capacity.
Once the node becomes ready, we see that the allocatable capacity isn't enough for the pending pods, which need a sum of 3100Mi for their requests. The c5.large has an allocatable capacity of 3106640Ki == ~3033.83Mi, which is < 3100Mi, so the pending pod doesn't get scheduled (the filebeat daemonset pod does).
Expected Behavior: Karpenter provisions a node with at least 3100Mi of allocatable memory rather than ~3033Mi, so the pending pod can be scheduled on it.
Reproduction Steps (Please include YAML):
- pending pod yaml
- daemonset yaml
Versions:
- Chart Version: v0.27.5
- Kubernetes Version (kubectl version): 1.23

- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment