Open shubhamrai1993 opened 6 months ago
Triage - Going through the code, specifically this part - link, the algorithm seems to be -
I'm looking at the issue and will raise a proposed PR in some time
Can you share your NodePools? Can you also share your logs?
@njtran This is the node pool -
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-nodepool
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 300s
    consolidationPolicy: WhenEmpty
    expireAfter: 168h
  template:
    metadata:
      labels:
        nvidia.com/gpu.deploy.operands: "true"
    spec:
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: gpu-default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-west-1a
        - eu-west-1b
        - eu-west-1c
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values:
        - p2
        - p3
        - g4dn
        - g5
        - p4d
        - p4de
      - key: karpenter.k8s.aws/instance-size
        operator: NotIn
        values:
        - nano
        - micro
        - metal
      taints:
      - effect: NoSchedule
        key: nvidia.com/gpu
  weight: 10
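As an aside, the In/NotIn requirements above act as simple set filters over the candidate instance types. Here is a minimal sketch of that filtering logic (Python; the candidate data is made up for illustration, and this is not Karpenter's actual implementation):

```python
# Illustrative sketch of how In/NotIn requirements (as in the NodePool
# above) filter candidate instance types. The catalog below is
# hypothetical, not real AWS data, and this is not Karpenter's code.
CANDIDATES = [
    {"karpenter.k8s.aws/instance-family": "g5", "karpenter.k8s.aws/instance-size": "2xlarge"},
    {"karpenter.k8s.aws/instance-family": "g5", "karpenter.k8s.aws/instance-size": "metal"},
    {"karpenter.k8s.aws/instance-family": "m5", "karpenter.k8s.aws/instance-size": "large"},
]

REQUIREMENTS = [
    {"key": "karpenter.k8s.aws/instance-family", "operator": "In",
     "values": {"p2", "p3", "g4dn", "g5", "p4d", "p4de"}},
    {"key": "karpenter.k8s.aws/instance-size", "operator": "NotIn",
     "values": {"nano", "micro", "metal"}},
]

def matches(instance, req):
    """True if the instance's label value satisfies one requirement."""
    value = instance.get(req["key"])
    if req["operator"] == "In":
        return value in req["values"]
    if req["operator"] == "NotIn":
        return value not in req["values"]
    return False

def filter_instances(instances, requirements):
    """Keep only instance types satisfying every requirement."""
    return [i for i in instances if all(matches(i, r) for r in requirements)]
```

With the data above, only the g5 2xlarge survives: the g5 metal is excluded by the NotIn size rule, and the m5 by the In family rule.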
These are the relevant logs -
{"level":"INFO","time":"2024-04-05T12:22:46.848Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"17dd42b","pods":"shub-ws/test-g5-66d4744f96-89sb8, shub-ws/test-g5-66d4744f96-bhpnh","duration":"1.548947822s"}
{"level":"INFO","time":"2024-04-05T12:22:51.794Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"17dd42b","nodeclaim":"gpu-nodepool-tsll8","provider-id":"aws:///eu-west-1a/i-0202813a4076f1b31","node":"ip-10-2-20-175.eu-west-1.compute.internal","allocatable":{"cpu":"47810m","ephemeral-storage":"143870061124","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"187109032Ki","nvidia.com/gpu":"4","pods":"737"}}
{"level":"INFO","time":"2024-04-05T12:23:02.393Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"17dd42b","pods":"shub-ws/test-g5-66d4744f96-89sb8, shub-ws/test-g5-66d4744f96-bhpnh","duration":"7.041627143s"}
{"level":"INFO","time":"2024-04-05T12:23:02.393Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"17dd42b","nodeclaims":1,"pods":2}
{"level":"INFO","time":"2024-04-05T12:23:02.686Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"17dd42b","nodepool":"gpu-nodepool","nodeclaim":"gpu-nodepool-6h4qd","requests":{"cpu":"14525m","ephemeral-storage":"39324644Ki","memory":"49456861Ki","nvidia.com/gpu":"2","pods":"10"},"instance-types":"g5.12xlarge, g5.24xlarge, g5.48xlarge"}
{"level":"INFO","time":"2024-04-05T12:23:07.594Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"17dd42b","nodeclaim":"gpu-nodepool-6h4qd","provider-id":"aws:///eu-west-1a/i-0015a5b2578cc50be","instance-type":"g5.12xlarge","zone":"eu-west-1a","capacity-type":"on-demand","allocatable":{"cpu":"47810m","ephemeral-storage":"134Gi","memory":"173400Mi","nvidia.com/gpu":"4","pods":"737","vpc.amazonaws.com/pod-eni":"107"}}
/kind feature
Hey @shubhamrai1993, sorry for the extensive delay on the response here. You're right that our current algorithm does:
This is a core part of the First Fit Decreasing bin-packing algorithm, and what you're suggesting is definitely a possible extension/modification to it. I'd want to explore this in an RFC so we can understand how this cost check changes the assumptions of our current algorithm, which short-circuits when we find a match. We try our best to be right, but bin-packing is an NP-hard problem, and we can't guarantee we're perfectly optimal, as you're seeing now. Each modification can have an impact in other cases.
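To make the short-circuit concrete, here is a minimal first-fit-decreasing sketch (Python, with a hypothetical GPU-only resource model and made-up prices, not Karpenter's real code or real eu-west-1 rates). Packing two 1-GPU pods against instance types ordered largest-first consolidates them onto one 4-GPU node; a cost-aware variant would also consider the two-small-node packing and pick the cheaper total:

```python
# Hedged sketch of First Fit Decreasing with an after-the-fact cost
# comparison. Instance shapes and $/hr prices are assumptions for
# illustration only.
INSTANCE_TYPES = [
    {"name": "g5.2xlarge",  "gpu": 1, "price": 1.21},   # assumed $/hr
    {"name": "g5.12xlarge", "gpu": 4, "price": 5.67},   # assumed $/hr
]

def pack_first_fit(pods, itypes):
    """Sort pods largest-first, place each on the first existing node
    with room, else open a node of the first instance type that fits
    (the short-circuit in question)."""
    pods = sorted(pods, key=lambda p: p["gpu"], reverse=True)
    nodes = []
    for pod in pods:
        for node in nodes:
            if node["used"] + pod["gpu"] <= node["itype"]["gpu"]:
                node["used"] += pod["gpu"]
                break
        else:
            itype = next(t for t in itypes if t["gpu"] >= pod["gpu"])
            nodes.append({"itype": itype, "used": pod["gpu"]})
    return nodes

def cost(nodes):
    return sum(n["itype"]["price"] for n in nodes)

pods = [{"gpu": 1}, {"gpu": 1}]

# Largest-first candidate order consolidates onto one big node;
# smallest-first opens one small node per pod. A cost-aware pass
# compares the totals instead of stopping at the first fit.
big_first = pack_first_fit(pods, sorted(INSTANCE_TYPES, key=lambda t: -t["gpu"]))
small_first = pack_first_fit(pods, sorted(INSTANCE_TYPES, key=lambda t: t["gpu"]))
cheaper = min([big_first, small_first], key=cost)
```

Under these assumed prices, the two-node packing wins, matching the behavior the reporter expected; the point of the RFC discussion is whether and where such a comparison can be added without breaking other packing cases.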
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Description
Observed Behavior: I scheduled a deployment with two pods, each requesting a single A10G GPU (g5 instances) on AWS in a single zone. Karpenter created one g5.12xlarge with 4 A10G GPUs.
Expected Behavior: Karpenter should have created two separate machines with one A10G GPU each, i.e. two g5.2xlarge instances, given the cost difference, which comes to around $2338.56 per month in this case.
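As a rough sanity check on that figure, the reported monthly difference implies an hourly price gap of about $3.20 between the two configurations (Python; the 730 hours/month averaging convention is an assumption, not from the report):

```python
# Back-of-the-envelope check of the reported monthly cost difference.
# 730 h/month (365 * 24 / 12) is an assumed averaging convention.
reported_monthly_diff = 2338.56  # USD/month, from the issue report
hours_per_month = 730
implied_hourly_diff = reported_monthly_diff / hours_per_month
# Roughly the hourly price gap between one g5.12xlarge and
# two g5.2xlarge instances, under the assumed convention.
```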
Reproduction Steps (Please include YAML): deployment.yaml -
Versions:
- Chart Version: v0.27.2
- Kubernetes Version (kubectl version):

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment