kubernetes-sigs / karpenter

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
Apache License 2.0

Consolidation with spot by default is not appropriate #1605

Open leoryu opened 2 months ago

leoryu commented 2 months ago

Description

Observed Behavior:

From:

https://github.com/kubernetes-sigs/karpenter/blob/372b9c82eeb86efd7bbb1bbba0b55f230ab65c98/pkg/controllers/disruption/consolidation.go#L192-L205

Karpenter sets the NodeClaim's CapacityType to spot whenever the requirements allow both on-demand (OD) and spot.

With this logic, consolidation always creates a spot machine, even when the cheapest machine is an OD one rather than a spot one.
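
To make the behavior concrete, here is a minimal sketch of the logic as I understand it. This is not the actual Karpenter code: `pinCapacityType` and the constants are hypothetical names used only for illustration, and the linked `consolidation.go` lines remain the authoritative source.

```go
package main

import "fmt"

// Hypothetical capacity-type labels for this sketch.
const (
	capacityTypeSpot     = "spot"
	capacityTypeOnDemand = "on-demand"
)

// pinCapacityType models the behavior described in this issue: when the
// allowed capacity types include both spot and on-demand, only spot is kept.
// Any other input is returned unchanged.
func pinCapacityType(allowed []string) []string {
	hasSpot, hasOnDemand := false, false
	for _, ct := range allowed {
		switch ct {
		case capacityTypeSpot:
			hasSpot = true
		case capacityTypeOnDemand:
			hasOnDemand = true
		}
	}
	if hasSpot && hasOnDemand {
		return []string{capacityTypeSpot}
	}
	return allowed
}

func main() {
	// Both types allowed: the result is narrowed to spot only.
	fmt.Println(pinCapacityType([]string{capacityTypeOnDemand, capacityTypeSpot}))
	// Only on-demand allowed: nothing changes.
	fmt.Println(pinCapacityType([]string{capacityTypeOnDemand}))
}
```

Once the requirements are narrowed like this, an OD offering can no longer be selected for the replacement, no matter how its price compares.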

In the worst case, no spot machine is available and Karpenter reports an error:

{"level":"ERROR","time":"2024-09-23T06:27:00.080Z","logger":"controller","message":"failed launching nodeclaim","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"test-njk7n"},"namespace":"","name":"test-njk7n","reconcileID":"333a7dec-4354-4076-9015-6db7eb5f69bf","error":"insufficient capacity, all requested instance types were unavailable during launch"}

Because the created NodeClaim's requirements are restricted to spot, consolidation does not succeed even when a cheaper OD machine is available.

Expected Behavior:

What I expect from consolidation is:

Do not modify the CapacityType. Just choose the cheapest machine if my requirements do not restrict the capacity type.
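
A rough sketch of the selection I have in mind is below. The `Offering` type, the `cheapestOffering` function, the instance type names, and the prices are all made up for illustration and are not Karpenter's API or real pricing. The idea is only that the capacity type acts as a filter taken from the requirements, and price decides among the offerings that are actually available.

```go
package main

import (
	"errors"
	"fmt"
)

// Offering is a hypothetical type for this sketch, not Karpenter's API.
type Offering struct {
	InstanceType string
	CapacityType string
	Price        float64 // made-up hourly price
	Available    bool
}

// cheapestOffering returns the lowest-priced available offering whose
// capacity type is permitted by the requirements. It never rewrites the
// allowed capacity types.
func cheapestOffering(offerings []Offering, allowed map[string]bool) (Offering, error) {
	var best Offering
	found := false
	for _, o := range offerings {
		if !o.Available || !allowed[o.CapacityType] {
			continue
		}
		if !found || o.Price < best.Price {
			best, found = o, true
		}
	}
	if !found {
		return Offering{}, errors.New("no available offering matches the requirements")
	}
	return best, nil
}

func main() {
	offerings := []Offering{
		{InstanceType: "type-a", CapacityType: "spot", Price: 0.03, Available: false},
		{InstanceType: "type-b", CapacityType: "spot", Price: 0.06, Available: true},
		{InstanceType: "type-c", CapacityType: "on-demand", Price: 0.05, Available: true},
	}
	allowed := map[string]bool{"spot": true, "on-demand": true}

	best, err := cheapestOffering(offerings, allowed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s (%s) at %.2f\n", best.InstanceType, best.CapacityType, best.Price)
}
```

In this made-up data the cheapest spot offering is unavailable, so the on-demand `type-c` is chosen because it is cheaper than the remaining spot offering.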

k8s-ci-robot commented 2 months ago

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

leoryu commented 2 months ago

@njtran Hi, I found that this code was committed by you 2 years ago. Could you explain why the NodeClaim is always spot in consolidation? Since spot machines might not be available in the real world, I think Karpenter should choose the cheapest machine, even if it is on-demand.