knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Support for Resource Claims and DRA #15332

Open skonto opened 5 months ago

skonto commented 5 months ago

Describe the feature

The Kubernetes community is actively moving to resource claims, so it would be great to add support for them at some point (filing this for future reference). The feature has been Alpha since 1.26 and will remain Alpha in 1.31, but it is being worked on aggressively (see https://github.com/kubernetes/enhancements/issues/3063#issuecomment-1915852197), so it looks likely to move to Beta and then GA soon.

In Knative, setting a resource claim on a container currently fails validation, as expected:

Error from server (BadRequest): error when creating "service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.containers[0].resources.claims
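
For anyone trying to reproduce this, here is a rough sketch of the kind of check that produces the message above. It is illustrative only: Knative's actual validation lives in its Go admission webhook, and the function names, the `gpu` claim name, and the image below are all hypothetical. The only thing taken from the error is the field path `spec.template.spec.containers[0].resources.claims`.

```python
# Illustrative sketch, NOT Knative's real webhook code (which is written in Go).
# It mimics the observed behaviour: any container that sets resources.claims
# is rejected with a "must not set the field(s)" message.

def find_disallowed_claims(service):
    """Return the field paths of containers that set resources.claims."""
    containers = (
        service.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    paths = []
    for index, container in enumerate(containers):
        if container.get("resources", {}).get("claims"):
            paths.append(
                f"spec.template.spec.containers[{index}].resources.claims"
            )
    return paths


def validate(service):
    """Return an error string in the webhook's style, or None if valid."""
    paths = find_disallowed_claims(service)
    if paths:
        return "validation failed: must not set the field(s): " + ", ".join(paths)
    return None


# A minimal Knative Service manifest as a dict. The claim name "gpu" and the
# image are made-up placeholders.
service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "dra-test"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "image": "example.com/hypothetical/image",
                        "resources": {"claims": [{"name": "gpu"}]},
                    }
                ]
            }
        }
    },
}

print(validate(service))
```

Supporting DRA would presumably mean allowing this field (and the matching pod-level claim list) through validation, possibly behind a feature flag, but that design is still open.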

/area API

References

- Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing
- What Can I Get You? An Introduction to Dynamic Resource Allocation - Freddy Rolland & Adrian Chiris
- Deploy vLLM server on Kubernetes using NVIDIA Kubernetes DRA driver
- KCSEU 2024 - Dynamic Resource Allocation - the path towards GA - Kevin Klues & Patrick Ohly
- Meeting notes from K8s Serving WG
- K8s issues/KEPs:
  - Dynamic Resource Allocation with Control Plane Controller
  - DRA: structured parameters

skonto commented 5 months ago

cc @dprotaso @ReToCode

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.