Open sergkondr opened 1 year ago
Can you explain the use case of why you care about the number of nodes? It's a tricky metric, since nodes are vastly different sizes. You could potentially pay way more with a small number of massive nodes.
I've been thinking for a while, and it looks like you are right. The only reason to think in terms of nodes is the habit of thinking about servers in data centers or on-prem environments.
But anyway, I think it is a nice-to-have feature, maybe someone will implement it in the future.
I have a use case for this: if I buy a specific number of reserved instances, I want a provisioner with only the related instance type and a node count limit. That would be easier to read and write than computing CPU amount * node count.
That's more a nice-to-have than a really fundamental feature.
I'm curious -- is there a reason you don't use an EKS managed node group for your RI? If you're already paying for the instances, is there a reason to not have them online and ready to go?
Sorry for the formulation; we went for Savings Plans instead a few weeks back. But at that time we were considering RIs, and I thought this would be an easy way to write limits.nodes: X directly in the provisioner.
We are a big company with multiple accounts, so if an RI is not used in one account, another one will benefit from the cost reduction; having all of them up is therefore not that important.
One use case for limiting the number of nodes could be licensing: maybe you only paid for a maximum number of nodes running some specific agent, for example for monitoring.
Also if you have large scale/complicated IP addressing like Custom Networking with secondary subnets, you may want to limit node count per host/primary AZ to ensure you always have IP addresses available.
Just my 2 cents: considering Karpenter can provision a wide range of instance types of various sizes and resource ratios, I think specifying a number of nodes could be somewhat counterintuitive. We of course used to specify min and max node numbers with node groups / ASGs, where it made sense. To leverage RIs or capacity reservations, would adding comments to indicate the number of nodes help?
```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
      - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - m6i.2xlarge
limits:
  resources:
    cpu: "80"      # 10 instances * 8 vCPU
    memory: 320Gi  # 10 instances * 32Gi
```
Combining restricted requirements + a node count limit might be easy enough to manage.
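The arithmetic in those comments could be generated rather than hand-maintained. A minimal sketch (a hypothetical helper, not part of Karpenter; the instance figures are the m6i.2xlarge values from the example above):

```python
# Hypothetical helper: derive Karpenter resource limits from a desired
# node count and per-instance resources.
def limits_for(node_count: int, vcpu_per_node: int, mem_gi_per_node: int) -> dict:
    return {
        "cpu": str(node_count * vcpu_per_node),
        "memory": f"{node_count * mem_gi_per_node}Gi",
    }

# 10 x m6i.2xlarge (8 vCPU, 32 GiB each)
print(limits_for(10, 8, 32))  # {'cpu': '80', 'memory': '320Gi'}
```

This keeps the limits and the intended node count in one place, though it still only approximates a true node count limit.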
At scale, using comments to denote instance sizes/classes/resource sets breaks down quickly, goes out of date immediately, and is generally a pain to maintain. Also, many of the reasons for wanting to restrict based on node count have nothing to do with resourcing and everything to do with either the physical node count and/or IP/networking limits that are also disconnected from CPU or memory sizing.
Ah, yes. Apologies, I hadn't noticed your earlier comment about IP addressing. I concede that it's not a problem that's addressed by existing resource requirements.
Labeled for closure due to inactivity in 10 days.
Up
> you may want to limit node count per host/primary AZ to ensure you always have IP addresses available
@sidewinder12s How would you achieve this if each instance type can have a different number of ENIs and a different number of IPs would be allocated for each?
At least in our case, our issues were in a large batch environment where pod density per node was not too bad, so we could generally allocate X IPs per node (vs lots of small pods where we might hit the per ENI IP assignment limits).
We're also using custom networking settings with the aws-vpc-cni to control IP usage, though this increases API Calls against AWS EC2 APIs.
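For context on the per-node IP math discussed above, the documented EKS max-pods formula for the aws-vpc-cni can be sketched as follows (the instance figures below are the published m5.large ENI limits):

```python
# Standard EKS max-pods formula for the aws-vpc-cni:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# One IP per ENI is reserved as the ENI's primary address; the +2 accounts
# for host-networking pods (aws-node, kube-proxy).
def max_pods(enis: int, ips_per_eni: int) -> int:
    return enis * (ips_per_eni - 1) + 2

# m5.large: 3 ENIs, 10 IPv4 addresses per ENI
print(max_pods(3, 10))  # 29
```

With custom networking, the primary ENI is not used for pods, so the effective count drops to (enis - 1) * (ips_per_eni - 1) + 2 — which is part of why IP budgeting per node varies so much across instance types.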
From a licensing standpoint this would be an extremely useful feature. For example: an org purchases 3 licenses; it would be nice if Karpenter could scale the cluster by replacing a smaller node with a larger one when workloads are added, keeping to the 3-license limit. In our case the cost of the licenses heavily outweighs the cost of the nodes.
A max-node limit could benefit total cluster resource utilization. Ideally, I would like to schedule most of my pods on the smallest number of nodes (HA permitting) to have simple control over pod memory limits, and since there is no CPU limit already, more pods could use the unutilized CPU.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Better explanation here: https://github.com/kubernetes-sigs/karpenter/issues/745
We have a similar need (but as a global limit): limiting the max number of nodes per cluster. In our setup we always assign only a single IP to each node, and we have a limited pool of IPs, so the most straightforward approach is a global limit on the number of nodes, similar to the --max-nodes-total flag on Cluster Autoscaler.
This was also explained here: https://github.com/aws/karpenter-provider-aws/issues/4462
I've started on adding a global limit with #1151 but still need to test.
/lifecycle frozen
/assign @jukie
Similar use case here: a limited set of preallocated IPs, and I want to make sure we never provision more instances than we have IPs. These IPs are static in the sense that they are communicated to customers upfront so they can allow our service in their firewalls.
I'd expect Karpenter to honor this limit and, if more pods come in, consolidate to larger instances to keep the node count at the limit.
Tell us about your request
It would be nice to have the ability to limit the number of nodes created by a certain provisioner. For example:
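Something along these lines (hypothetical syntax; limits.nodes is not an existing Karpenter field):

```yaml
# Hypothetical provisioner fragment -- limits.nodes does not exist today
spec:
  limits:
    nodes: 10  # never run more than 10 nodes from this provisioner
```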
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Let's say our application spawns pods for certain tasks, and each pod requires a separate node. Currently this is limited by the max size of the node group. All these nodes are Spot instances, and sometimes there are no instances of the desired family in the region, so we use different instance families: m4, m5, m5a, r5, r5a, etc. These instances can have different amounts of CPU and memory.
It would be nice to have the ability to limit nodes by their count, not by their resources. Granted, we have a max number of pods in our app, so this is a somewhat synthetic example.
Are you currently working around this issue?
We use limits.resources.cpu with an approximate number of CPUs, but it is neither accurate nor very transparent.
Additional Context
No response
Attachments
No response