kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Configurable mechanism for Resource Abstraction #2937

Open dgrove-oss opened 2 weeks ago

dgrove-oss commented 2 weeks ago

What would you like to be added:

The capability for a cluster admin to configure how Kueue derives the Resource requirements of a Workload from the Resource requests/limits in the PodSpecs of the submitted Job.

Why is this needed:

Configurable Resource transformations would enable more flexible definitions of Quotas that can be both simpler and more powerful than those possible via simple mirroring of the PodSpec Resources of Jobs into Workloads. It would support at least the following scenarios:

  1. Reducing multiple complex related accelerator resources into a simpler resource that is more suitable for quota management. The motivating example here is the various MIG resources created by the NVIDIA GPU Operator when it is operating with the mixed strategy.

  2. Mapping multiple resources into an abstract currency that can be used to define quotas in terms of the relative cost of the resources (e.g. cheap vs. expensive GPUs, or spot vs. normal cloud VMs).
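
To make the two scenarios concrete, here is a minimal sketch of the kind of computation such a transformation might perform. It is only an illustration, not a proposed API: the `TRANSFORMS` table, the `transform_requests` function, and the `example.com/accelerator-memory-gb` resource name are all hypothetical, and the conversion factors are assumptions chosen to match the memory size in each MIG profile name.

```python
from collections import Counter

# Hypothetical admin-supplied mapping (not an existing Kueue API):
# input resource name -> {output resource name: units per input unit}.
TRANSFORMS = {
    "nvidia.com/mig-1g.5gb": {"example.com/accelerator-memory-gb": 5},
    "nvidia.com/mig-2g.10gb": {"example.com/accelerator-memory-gb": 10},
    "nvidia.com/mig-3g.20gb": {"example.com/accelerator-memory-gb": 20},
}


def transform_requests(pod_requests, transforms=TRANSFORMS):
    """Derive the quota-relevant resources from raw PodSpec requests.

    Resources with a mapping are replaced by their outputs; all other
    resources are mirrored unchanged.
    """
    totals = Counter()
    for name, quantity in pod_requests.items():
        outputs = transforms.get(name)
        if outputs is None:
            # No transformation configured: pass the resource through as-is.
            totals[name] += quantity
            continue
        for out_name, factor in outputs.items():
            totals[out_name] += quantity * factor
    return dict(totals)


requests = {"cpu": 4, "nvidia.com/mig-1g.5gb": 2, "nvidia.com/mig-2g.10gb": 1}
print(transform_requests(requests))
# {'cpu': 4, 'example.com/accelerator-memory-gb': 20}
```

The same table shape would cover scenario 2 by mapping, say, different GPU models onto a single cost-weighted credit resource. Whether the original resources should be replaced or retained alongside the derived one is a design question for the KEP.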

Both scenarios were discussed in the Batch WG call of 8/29/24 (https://www.youtube.com/watch?v=5nb_Ut-PLac), resulting in a decision to open a KEP to refine a design for this capability. The presentation is attached here: BatchWG-MIGResourceAbstraction.pdf

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

dgrove-oss commented 2 weeks ago

/assign

I'll work on a KEP next week.