Open mimowo opened 1 month ago
/assign @trasc
/cc @mwielgus @tenzen-y @dgrove-oss
/cc @liurupeng @ahg-g for LWS.
For LWS, would including a suspend field be a better forward thinking strategy?
For now, a complete "suspend" for serving workloads isn't a use case we hear about. The preference is to reduce capacity by preempting individual pods, so that stopping a serving workload completely is a last-resort option.
That said, it is hard to say "never" in the long run, but I would keep it out of scope for this enhancement.
Sounds good. I guess in the LWS case preemption would target the entire leader-worker group? Or preempt some workers?
For now, the entire group.
/assign
It looks like this covers both LWS and StatefulSet. /reopen
@tenzen-y: Reopened this issue.
What would you like to be added:
I would like to make sure we have basic support for running serving workloads for the use case of running AI inference. In particular I would like to have support for Deployments, StatefulSets, and LeaderWorkerSets.
In the MVP the integrations are based on single plain Pods (for Deployments) or Pod Groups (for StatefulSets).
What is needed:
introduce a dedicated StatefulSet integration, and validate that it can only be enabled when pod integration is enabled
copy the queue-name label from the StatefulSet down to its PodTemplate
set the PodTemplate labels for the Pod Group:
kueue.x-k8s.io/queue-name - from STS
kueue.x-k8s.io/pod-group-name - STS_ + STS name (plus probably a hash to avoid collisions, as for Workloads)
kueue.x-k8s.io/pod-group-total-count - STS replica count
In the longer run, to support scaling of StatefulSets we may need to do https://github.com/kubernetes-sigs/kueue/issues/77, but that is out of scope for this issue.
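To make the label scheme above concrete, here is a hypothetical sketch in Go of how the Pod Group labels for a StatefulSet could be derived. The label keys come from this issue; the `podGroupLabels` helper, the `STS_` name format, and the truncated-SHA hashing scheme are illustrative assumptions, not the actual Kueue implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// podGroupLabels is a hypothetical helper computing the Pod Group labels
// for a StatefulSet's PodTemplate, per the proposal above. The hashing
// scheme is an assumption made to illustrate collision avoidance.
func podGroupLabels(stsName, queueName string, replicas int32) map[string]string {
	// Short hash of the StatefulSet name to avoid collisions, as for Workloads.
	hash := fmt.Sprintf("%x", sha256.Sum256([]byte(stsName)))[:5]
	return map[string]string{
		// queue-name is copied down from the StatefulSet.
		"kueue.x-k8s.io/queue-name": queueName,
		// pod-group-name: STS_ + STS name + a hash suffix.
		"kueue.x-k8s.io/pod-group-name": fmt.Sprintf("STS_%s-%s", stsName, hash),
		// pod-group-total-count: the StatefulSet replica count.
		"kueue.x-k8s.io/pod-group-total-count": fmt.Sprintf("%d", replicas),
	}
}

func main() {
	for k, v := range podGroupLabels("web", "team-a-queue", 3) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```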
The API is based on StatefulSets, so the integration would also use Pod Groups, similarly to regular StatefulSets. Each LeaderWorkerGroup creates a new Pod Group. In a single Pod Group we will have:
The size of the group will be taken from LeaderWorkerSet.Spec.LeaderWorkerTemplate.Size and increased by 1 (to include the leader).
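As a quick illustration of the sizing rule above, a minimal sketch (the `lwsPodGroupSize` helper name is hypothetical):

```go
package main

import "fmt"

// lwsPodGroupSize computes the Pod Group size for a LeaderWorkerSet as
// described in this issue: LeaderWorkerTemplate.Size increased by 1 to
// include the leader. Hypothetical helper, not the actual integration code.
func lwsPodGroupSize(workerTemplateSize int32) int32 {
	return workerTemplateSize + 1
}

func main() {
	fmt.Println(lwsPodGroupSize(4)) // 4 workers + 1 leader → 5
}
```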
Why is this needed:
To support use cases of running AI training and inference in the same clusters, where the access to GPU is constrained by Kueue.
Completion requirements:
The API changes required are minimal (just potentially new labels / annotations), so I believe a new KEP is not required, but we need proper documentation.
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.