kubernetes / enhancements

Enhancements tracking repo for Kubernetes
Apache License 2.0

DRA: Extend PodResources to include resources from Dynamic Resource Allocation #3695

Open klueska opened 1 year ago

klueska commented 1 year ago

Enhancement Description

klueska commented 1 year ago

/milestone v1.27
/label lead-opted-in

k8s-ci-robot commented 1 year ago

@klueska: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility.

In response to [this](https://github.com/kubernetes/enhancements/issues/3695#issuecomment-1414178160):

> /milestone v1.27
> /label lead-opted-in

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-ci-robot commented 1 year ago

@klueska: Can not set label lead-opted-in: Must be member in one of these teams: [release-team-enhancements release-team-leads sig-api-machinery-leads sig-apps-leads sig-architecture-leads sig-auth-leads sig-autoscaling-leads sig-cli-leads sig-cloud-provider-leads sig-cluster-lifecycle-leads sig-contributor-experience-leads sig-docs-leads sig-instrumentation-leads sig-k8s-infra-leads sig-multicluster-leads sig-network-leads sig-node-leads sig-release-leads sig-scalability-leads sig-scheduling-leads sig-security-leads sig-storage-leads sig-testing-leads sig-windows-leads]

SergeyKanzhelev commented 1 year ago

/milestone v1.27
/label lead-opted-in

dchen1107 commented 1 year ago

/label lead-opted-in

I have had trouble adding lead-opted-in over the last couple of days. Trying it one more time ...

SergeyKanzhelev commented 1 year ago

/stage alpha

marosset commented 1 year ago

Hello @klueska 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 18:00 PDT Thursday 9th February 2023.

This enhancement is targeting stage alpha for v1.27 (correct me if otherwise).

Here's where this enhancement currently stands:

For this enhancement, it looks like https://github.com/kubernetes/enhancements/pull/3738 will address the remaining requirements.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

marosset commented 1 year ago

This enhancement meets all of the requirements to be tracked in v1.27. Thanks!

marosset commented 1 year ago

Hi @klueska :wave:,

Checking in as we approach 1.27 code freeze at 17:00 PDT on Tuesday 14th March 2023.

Please ensure the following items are completed:

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Please let me know if there are any other PRs in k/k I should be tracking for this KEP.

As always, we are here to help should questions come up. Thanks!

klueska commented 1 year ago

This is a dependent PR for the one you listed -- I have updated the description to include it: https://github.com/kubernetes/kubernetes/pull/115912

Rishit-dagli commented 1 year ago

Hi @klueska :wave:, I’m reaching out from the 1.27 Release Docs team. This enhancement is marked as ‘Needs Docs’ for the 1.27 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.27 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by March 16. For more information, please take a look at Documenting for a release to familiarize yourself with the documentation requirements for the release. Please feel free to reach out with any questions. Thanks!

klueska commented 1 year ago

Docs placeholder added in description

SergeyKanzhelev commented 1 year ago

Based on SIG Node meeting on 05/02/2023 we do NOT plan this for 1.28 release. Please comment otherwise.

klueska commented 1 year ago

@moshe010 I don't think we can progress this to beta until DRA itself progresses to beta.

moshe010 commented 1 year ago

@klueska I wasn't in the SIG Node meeting on 05/02/2023 where this was discussed, and I never requested that this go to beta in 1.28. In the KEP we stated the following [1]: alpha: "v1.27", beta: "v1.30", stable: "v1.32"

[1] - https://github.com/kubernetes/enhancements/pull/3915/files#diff-11e83115a85d63622d7dcdb3732b43f918caa972ff7f034a431f642227d22b2aL30-L32

SergeyKanzhelev commented 1 year ago

> @klueska I wasn't in the SIG Node meeting on 05/02/2023 where this was discussed, and I never requested that this go to beta in 1.28. In the KEP we stated the following [1]: alpha: "v1.27", beta: "v1.30", stable: "v1.32"
>
> [1] - https://github.com/kubernetes/enhancements/pull/3915/files#diff-11e83115a85d63622d7dcdb3732b43f918caa972ff7f034a431f642227d22b2aL30-L32

Updated this issue description

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pacoxu commented 10 months ago

/sig network

In https://github.com/kubernetes/enhancements/pull/3915#issuecomment-1868322774, @aojea mentions that:

> we have this debate with accelerator network devices and several proposals on how to do it; it will not be nice if Kubernetes ends up with two different ways of configuring these devices

/remove-lifecycle stale

As this is still alpha, we should reach an agreement before promoting it to beta.

k8s-triage-robot commented 7 months ago

/lifecycle stale

k8s-triage-robot commented 6 months ago

/lifecycle rotten

SergeyKanzhelev commented 6 months ago

/remove-lifecycle rotten

k8s-triage-robot commented 3 months ago

/lifecycle stale

klueska commented 3 months ago

/remove-lifecycle stale

haircommander commented 2 months ago

/milestone v1.32
/label lead-opted-in

aojea commented 2 months ago

@haircommander this also has to go through SIG Network. It is clearly stated in the KEP that this is required for implementing networking functionality. We also have a working group and a proposal, https://github.com/kubernetes/enhancements/pull/4861, and we want to be sure the two do not conflict and are aligned with the overall strategy.

haircommander commented 2 months ago

@aojea would you like me to remove the milestone?

aojea commented 2 months ago

no, absolutely not, just allow us to participate in the review, so we can align work that happens in parallel

haircommander commented 2 months ago

absolutely! thanks for making explicit the dependency :)

jenshu commented 2 months ago

Hello @klueska 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 02:00 UTC Friday 11th October 2024 / 19:00 PDT Thursday 10th October 2024.

This enhancement is targeting stage beta for v1.32 (correct me if otherwise).

Here's where this enhancement currently stands:

For this KEP, we would just need to update the following:

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well.

If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!

ffromani commented 2 months ago

Hello! I'm among the KEP wranglers that sig-node set up to help sig-node KEPs make progress smoothly during the 1.32 timeframe. Hello @moshe010 / @klueska / @adrianchiris, by reading the comments above it seems to me this KEP requires coordination with sig-network, is my understanding correct? In addition, does this KEP depend on other DRA features planned for the 1.32 cycle or is it independent wrt other DRA work? Thanks!

ffromani commented 2 months ago

> Hello! I'm among the KEP wranglers that sig-node set up to help sig-node KEPs make progress smoothly during the 1.32 timeframe. Hello @moshe010 / @klueska / @adrianchiris, by reading the comments above it seems to me this KEP requires coordination with sig-network, is my understanding correct? In addition, does this KEP depend on other DRA features planned for the 1.32 cycle or is it independent wrt other DRA work? Thanks!

Sorry I forgot previously: PRR freeze on 3rd Oct! If you plan to move forward please post an update for this KEP so you opt in for PRR review! @moshe010 @klueska @adrianchiris

ffromani commented 2 months ago

ping about opting in the PRR review @moshe010 @klueska @adrianchiris because the deadline is looming

klueska commented 2 months ago

@ArangoGutierrez had said he was going to take this one over.

That said, there is no implementation work needed at the moment. All that is needed is to update the KEP to be in line with the latest code that is already merged.

johnbelamaric commented 1 month ago

I am confused, is this going to beta or staying in alpha?

klueska commented 1 month ago

There's not been much discussion this review cycle, but I'd argue that at this point it needs to stay in alpha -- especially given the lack of communication with SIG Networking on its design.

That said, we should at least strive to get the text of the KEP aligned with the current implementation (as well as update its beta criteria) in this cycle. We can then talk about moving it to beta in 1.33, assuming it has had proper discussion in SIG Networking.

aojea commented 1 month ago

My understanding is that this KEP targets an out-of-band networking plugin approach, which makes sense in a DRA classic environment, IIUC.

Since then, DRA has evolved, and we are now making networking requirements part of the DRA efforts (https://github.com/LionelJouin/kubernetes/commits/KEP-4817/) and making it easy to integrate networking plugins directly into DRA (https://github.com/aojea/dra-network-driver-template).

My question is: with the new approach, where DRA classic is removed and networking plugins can hook directly into DRA, is this still needed?

klueska commented 1 month ago

From my perspective, this KEP is important independent of its (possible) usage for networking. The PodResourcesAPI is designed to surface the full set of resources that are allocated to a pod, and thus including DRA allocated resources is a natural extension of this. For example, NVIDIA's DCGM prometheus exporter relies on the information provided via this API to link GPU metrics back to the pods that are consuming those GPUs.
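
For readers unfamiliar with how a monitoring agent might use this, here is a minimal sketch of linking devices back to pods. The dataclasses are hypothetical stand-ins, loosely modeled on the nesting of a PodResources `List` response (pod, container, DRA claim, CDI devices); the field names, the `index_devices_by_pod` helper, and the sample device name are assumptions for illustration, not the kubelet's actual gRPC types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical, simplified shapes; the real API is a gRPC service on the kubelet.
@dataclass
class DynamicResource:
    claim_name: str
    claim_namespace: str
    cdi_devices: List[str] = field(default_factory=list)


@dataclass
class ContainerResources:
    name: str
    dynamic_resources: List[DynamicResource] = field(default_factory=list)


@dataclass
class PodResources:
    name: str
    namespace: str
    containers: List[ContainerResources] = field(default_factory=list)


def index_devices_by_pod(
    pods: List[PodResources],
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Map each CDI device name to the (namespace, pod, container) tuples using it."""
    index: Dict[str, List[Tuple[str, str, str]]] = {}
    for pod in pods:
        for container in pod.containers:
            for resource in container.dynamic_resources:
                for device in resource.cdi_devices:
                    index.setdefault(device, []).append(
                        (pod.namespace, pod.name, container.name)
                    )
    return index


# Illustrative data only: one pod with one DRA-allocated device.
pods = [
    PodResources(
        name="trainer-0",
        namespace="ml",
        containers=[
            ContainerResources(
                name="main",
                dynamic_resources=[
                    DynamicResource("gpu-claim", "ml", ["vendor.example/gpu=gpu-0"])
                ],
            )
        ],
    )
]
index = index_devices_by_pod(pods)
```

An exporter could then look up a device name in `index` to attach pod, namespace, and container labels to that device's metrics.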

ArangoGutierrez commented 1 month ago

/cc

aojea commented 1 month ago

I don't disagree on exposing the resources through the PodResourcesAPI and let the consumers work with that.

I was commenting on the second goal, which is related to networking. We are making the integration with networking native, so encouraging that as a goal sounds contradictory to the current efforts. My complaint is not about the KEP, it is about this paragraph:

> To allow the DRA feature to work with CNIs that require complex network devices such as RDMA. DRA resource drivers will allocate the resources, and the meta-plugin will read the allocated CDI Devices using the PodResources API. The meta-plugin will then inject the device-id of these CDI Devices as CNI arguments and invoke other CNIs (just as it does for devices allocated by the device plugin today).

If you remove that goal and use a more generic text as "allow node components to use the PodResourcesAPI to use the DRA information to develop new features and integrations", then this is a SIG Node only KEP 😄
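
For context on the quoted goal, the meta-plugin step it describes (read the allocated CDI devices, then pass a device ID to the delegated CNI) could look roughly like the sketch below. This is an illustration under assumptions, not how any existing meta-plugin is implemented: the `DEVICE_ID` key and both helper functions are hypothetical, and the device name is made up. The only things taken from specifications are the `vendor/class=device` form of a fully qualified CDI name and the semicolon-separated `KEY=VALUE` form of `CNI_ARGS`.

```python
from typing import Dict, List


def device_id_from_cdi(cdi_name: str) -> str:
    """Return the device part of a fully qualified CDI name (vendor/class=device)."""
    _, sep, device = cdi_name.partition("=")
    if not sep or not device:
        raise ValueError(f"not a fully qualified CDI device name: {cdi_name!r}")
    return device


def build_cni_args(base_args: Dict[str, str], cdi_devices: List[str]) -> str:
    """Render CNI_ARGS, adding a hypothetical DEVICE_ID key for the first device."""
    args = dict(base_args)  # copy so the caller's mapping is not mutated
    if cdi_devices:
        args["DEVICE_ID"] = device_id_from_cdi(cdi_devices[0])
    return ";".join(f"{key}={value}" for key, value in args.items())


# Illustrative values only.
cni_args = build_cni_args(
    {"K8S_POD_NAMESPACE": "ml", "K8S_POD_NAME": "trainer-0"},
    ["vendor.example/rdma=rdma0"],
)
```

With the sample input, `cni_args` is `K8S_POD_NAMESPACE=ml;K8S_POD_NAME=trainer-0;DEVICE_ID=rdma0`.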

johnbelamaric commented 1 month ago

@haircommander can you un-opt-in this one? I will remove from my PRR board. Thanks

ArangoGutierrez commented 1 month ago

> @ArangoGutierrez had said he was going to take this one over.
>
> That said, there is no implementation work needed at the moment. All that is needed is to update the KEP to be in line with the latest code that is already merged.

I'm working on an update PR for the KEP, hope to have the PR link by tomorrow

haircommander commented 1 month ago

/remove-milestone v1.32
/remove-label lead-opted-in

ArangoGutierrez commented 1 month ago

https://github.com/kubernetes/enhancements/pull/4913

pacoxu commented 1 month ago

According to https://github.com/kubernetes/enhancements/pull/4913, beta is planned for "v1.33" and stable for "v1.36".

So this is not milestone v1.32?

klueska commented 1 month ago

Correct -- https://github.com/kubernetes/enhancements/pull/4913 was just an update to the KEP to reflect the current state of the code base while remaining in alpha.

tjons commented 1 month ago

/milestone clear

pohly commented 1 week ago

/wg device-management