KEP-4943 - Githubissues

bernot-dev commented 2 weeks ago

- One-line PR description: Adding new KEP

Issue link: #4943

Other comments:

k8s-ci-robot commented 2 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bernot-dev Once this PR has been reviewed and has the lgtm label, please assign maciekpytel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[keps/sig-autoscaling/OWNERS](https://github.com/kubernetes/enhancements/blob/master/keps/sig-autoscaling/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment

k8s-ci-robot commented 2 weeks ago

Hi @bernot-dev. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

adrianmoisey commented 2 weeks ago

I know this is very much still in draft, but I thought I'd ask now. Does this work depend on in-place pod resizing?

bernot-dev commented 2 weeks ago

@adrianmoisey This does not depend on in-place pod resizing any more than existing VPA. However, it will benefit from it when that feature lands.

raywainman commented 2 weeks ago

This is super cool! I think this could definitely tackle a gap that has existed in vertically scaling daemon sets.

Some questions from me:

What is some of your high level thinking here around how to configure this?
Would we keep a histogram of usage per replica in this case?
What do we do about new pods being added to the daemonset?

(Feel free to also tell me to wait until this is captured in the KEP :), I realize this is still a draft proposal!)

bernot-dev commented 1 week ago

@raywainman Thanks for the questions! These are closely aligned with the first questions I am looking to tackle as I start filling in the details. I am going to be looking for ways to leverage existing code to whatever extent makes sense. I will have to make myself a bit more familiar with existing code before I can find those opportunities.

I'm hoping to sidestep the question of, "What do we do about new pods being added to the daemonset?"

Retaining existing behavior seems to be the best way forward for now. I can imagine reasonable arguments for a variety of behaviors in different situations. Perhaps it could become configurable in the future, but I would rather keep that out of scope for now.

adrianmoisey commented 1 week ago

@adrianmoisey This does not depend on in-place pod resizing any more than existing VPA. However, it will benefit from it when that feature lands.

Good to know. I know I may be jumping the gun here, so apologies for that, I know you are still working on the KEP.

The reason I asked about in-place is because I just assumed the solution you would use, so let me take a step back.

At the moment VPA admission-controller receives a Pod before it's scheduled to a node (to my understanding, please correct me if I am wrong), meaning that the VPA admission-controller won't know which node a specific Pod will land on. Do you have any ideas on how to get around this limitation for this KEP?

EDIT: I'm entirely wrong! Turns out that DaemonSets use node affinity to tell the scheduler where to schedule a Pod. I didn't know that! See https://github.com/kubernetes/kubernetes/blob/v1.31.2/pkg/controller/daemon/daemon_controller.go#L1014-L1019 and https://github.com/kubernetes/kubernetes/blob/v1.31.2/pkg/controller/daemon/util/daemonset_util.go#L173-L221

So I guess that's the answer, the node can be fetched from there.

SECOND EDIT: What about other workloads, besides DaemonSets?

kubernetes / enhancements

KEP-4943 #4947