kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Introduce ResourceFlavor fallback mechanism #2560

Open PBundyra opened 4 months ago

PBundyra commented 4 months ago

What would you like to be added: A mechanism that allows Kueue to fall back to a different ResourceFlavor if it cannot start a job on the assigned flavor.

Why is this needed: Currently, Kueue has no mechanism for falling back to a different flavor. If there is free capacity in Kueue but there are stockouts on the cloud provider side, Kueue will assign the same flavor to a given Workload over and over. This results in wasteful assignment of VMs to a Workload that will not start (e.g. a Workload repeatedly gets 5 VMs when it needs 10 of them to start).

Users would like to be able to configure Kueue so that, when there are stockouts, it tries a different flavor.
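To make the request concrete, here is a minimal sketch of one possible fallback policy: after a configurable number of failed starts on a flavor, move on to the next flavor in the list. This is an illustration only, not Kueue code. `Workload`, `pick_flavor`, `record_failed_start`, and `max_attempts` are hypothetical names, and the flavor names `spot` and `on-demand` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Workload:
    """Hypothetical stand-in for a Workload; not the Kueue API type."""
    name: str
    # Flavors in preference order, e.g. as listed in a ClusterQueue.
    flavors: List[str]
    # Hypothetical bookkeeping: failed start attempts per flavor.
    failed_attempts: Dict[str, int] = field(default_factory=dict)


def pick_flavor(workload: Workload, max_attempts: int = 3) -> Optional[str]:
    """Return the first flavor that still has attempts left, or None."""
    for flavor in workload.flavors:
        if workload.failed_attempts.get(flavor, 0) < max_attempts:
            return flavor
    return None


def record_failed_start(workload: Workload, flavor: str) -> None:
    """Count one failed start (e.g. a provider stockout) against a flavor."""
    workload.failed_attempts[flavor] = workload.failed_attempts.get(flavor, 0) + 1


# Usage: three stockouts on the preferred flavor trigger the fallback.
wl = Workload("train", ["spot", "on-demand"])
for _ in range(3):
    record_failed_start(wl, pick_flavor(wl))
print(pick_flavor(wl))  # on-demand
```

A per-flavor attempt counter is only one option; a time-based backoff or a signal from the provisioning layer could serve the same purpose.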

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

PBundyra commented 1 month ago

/remove-lifecycle stale