Open mimowo opened 1 month ago
/assign @trasc
/cc @mwielgus @tenzen-y @dgrove-oss
/cc @liurupeng @ahg-g for LWS.
For LWS, would including a suspend field be a better forward thinking strategy?
For now, a complete "suspend" for serving workloads isn't a use case we hear about. The preference is to reduce capacity by preempting individual pods, so that stopping a serving workload completely is a last-resort option.
That said, it is hard to say "never" in the long run, but I would keep it out of scope for this enhancement.
Sounds good. I guess in the LWS case preemption would target the entire leader-worker group? Or preempt some workers?
For now, the entire group.
/assign
It looks like this covers both LWS and StatefulSet. /reopen
@tenzen-y: Reopened this issue.
What would you like to be added:
I would like to make sure we have basic support for running serving workloads for the use case of running AI inference. In particular I would like to have support for Deployments, StatefulSets, and LeaderWorkerSets.
In the MVP the integrations are based on single plain Pods (for Deployments) or Pod Groups (for StatefulSets).
What is needed:
introduce a dedicated StatefulSet integration, and validate that it can only be enabled when pod integration is enabled
copy the queue-name label from the StatefulSet down to its PodTemplate
set the PodTemplate labels for the Pod Group:
kueue.x-k8s.io/queue-name - from STS
kueue.x-k8s.io/pod-group-name - STS_ + STS name (plus probably a hash to avoid collisions, as for Workloads)
kueue.x-k8s.io/pod-group-total-count - STS replica count
In the longer run, to support scaling of StatefulSets we may need to do https://github.com/kubernetes-sigs/kueue/issues/77, but that is out of scope for this issue.
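To make the label scheme above concrete, here is a hypothetical sketch in Go of how the Pod Group labels for a StatefulSet could be derived. The label keys come from this issue; the `podGroupLabels` helper, the `STS_` name format, and the truncated-SHA hashing scheme are illustrative assumptions, not the actual Kueue implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// podGroupLabels is a hypothetical helper computing the Pod Group labels
// for a StatefulSet's PodTemplate, per the proposal above. The hashing
// scheme is an assumption made to illustrate collision avoidance.
func podGroupLabels(stsName, queueName string, replicas int32) map[string]string {
	// Short hash of the StatefulSet name to avoid collisions, as for Workloads.
	hash := fmt.Sprintf("%x", sha256.Sum256([]byte(stsName)))[:5]
	return map[string]string{
		// queue-name is copied down from the StatefulSet.
		"kueue.x-k8s.io/queue-name": queueName,
		// pod-group-name: STS_ + STS name + a hash suffix.
		"kueue.x-k8s.io/pod-group-name": fmt.Sprintf("STS_%s-%s", stsName, hash),
		// pod-group-total-count: the StatefulSet replica count.
		"kueue.x-k8s.io/pod-group-total-count": fmt.Sprintf("%d", replicas),
	}
}

func main() {
	for k, v := range podGroupLabels("web", "team-a-queue", 3) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```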
The API is based on StatefulSets, so the integration would also use Pod Groups, similarly to regular StatefulSets. Each LeaderWorkerGroup creates a new Pod Group. In a single Pod Group we will have:
The size of the group will be taken from LeaderWorkerSet.Spec.LeaderWorkerTemplate.Size and increased by 1 (to include the leader).
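As a quick illustration of the sizing rule above, a minimal sketch (the `lwsPodGroupSize` helper name is hypothetical):

```go
package main

import "fmt"

// lwsPodGroupSize computes the Pod Group size for a LeaderWorkerSet as
// described in this issue: LeaderWorkerTemplate.Size increased by 1 to
// include the leader. Hypothetical helper, not the actual integration code.
func lwsPodGroupSize(workerTemplateSize int32) int32 {
	return workerTemplateSize + 1
}

func main() {
	fmt.Println(lwsPodGroupSize(4)) // 4 workers + 1 leader → 5
}
```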
Why is this needed:
To support use cases of running AI training and inference in the same clusters, where the access to GPU is constrained by Kueue.
Completion requirements:
The API changes required are minimal (just potentially new labels / annotations), so I believe a new KEP is not required, but we need proper documentation.
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.