kubernetes-sigs / azuredisk-csi-driver

Azure Disk CSI Driver
Apache License 2.0

feat: add VerticalPodAutoscaler to csi controller pod #2536

Open umagnus opened 2 months ago

umagnus commented 2 months ago

What type of PR is this?

/kind feature

What this PR does / why we need it:

feat: add VerticalPodAutoscaler to csi controller pod

Which issue(s) this PR fixes:

Fixes #

Requirements:

Special notes for your reviewer:

Release note:

none

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: umagnus
Once this PR has been reviewed and has the lgtm label, please assign andyzhangx for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

k8s-ci-robot commented 2 months ago

Hi @umagnus. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

umagnus commented 1 month ago

AKS needs to add a label so that the VPA admission controller webhook can affect the CSI controller pod in the kube-system namespace; see "Can admission controller webhooks impact kube-system and internal AKS namespaces". Waiting for https://github.com/kubernetes/autoscaler/pull/7402 to be completed.

voelzmo commented 1 month ago

Hey, I came here via the reference in the VPA PR. I see that you're about to use VPA for all the containers in the CSI controller Pod. My understanding is that the resource usage in those containers is usually pretty low – but I don't know the details of the Azure CSI controller.

I'm not sure if you have already tried this for a longer period of time in some beta env, or have long-term experience with other components with many containers in a single pod, but keep in mind that this change will lead to many additional interruptions for the Pod, as VPA will evict it when the recommendation for a container changes by more than 10% (which happens quickly for these small absolute values). This also happens if, for example, your initial requests are too high and the containers are scaled down step by step. I'm not sure if those disruptions could become an issue – but if resource usage isn't expected to change much, it may just be better to not have VPA enabled for this and have fewer disruptions.
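
To make the point about small absolute values concrete, here is a minimal Python sketch. It models only the simplified rule stated above (a relative change of more than 10% between the current request and the new recommendation); it is not the actual VPA updater logic, and the function names and the MiB figures are made up for illustration.

```python
def relative_change(current_request: float, recommended: float) -> float:
    """Relative difference between a recommendation and the current request."""
    if current_request <= 0:
        raise ValueError("current_request must be positive")
    return abs(recommended - current_request) / current_request


def would_trigger_eviction(current_request: float,
                           recommended: float,
                           threshold: float = 0.10) -> bool:
    """Simplified model: evict when the relative change exceeds the threshold.

    The 10% default mirrors the figure quoted in the comment above; the real
    updater applies further conditions that this sketch ignores.
    """
    return relative_change(current_request, recommended) > threshold


if __name__ == "__main__":
    # Hypothetical memory values in MiB. The same 3 MiB drift crosses the
    # threshold for a small container but not for a large one.
    for current, recommended in [(20, 23), (500, 503)]:
        change = relative_change(current, recommended)
        evict = would_trigger_eviction(current, recommended)
        print(f"{current} -> {recommended} MiB: {change:.1%} change, evict={evict}")
```

Under this model, a 3 MiB drift on a 20 MiB request is a 15% change and crosses the threshold, while the same drift on a 500 MiB request is 0.6% and does not.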

andyzhangx commented 1 month ago

@voelzmo thanks for the tips. We would like to use VPA mainly for the CSI driver controller sidecar containers. Their memory usage grows dramatically as the number of PVs in the cluster grows, and the current memory limit of 500Mi is far from enough when there are a few thousand PVs in the cluster. In that case the CSI driver would be OOM-killed, which is the reason we want to use VPA.
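
As a rough illustration of scoping VPA to the sidecars only, the Python sketch below builds a `VerticalPodAutoscaler` manifest as a plain dictionary and prints it as JSON. It is not taken from this PR: the sidecar container names, the VPA object name, the target Deployment name, and the `2Gi` cap are all assumptions that would need to be checked against the actual controller Deployment. The `"*"` policy with mode `Off` disables recommendations for every container that is not listed explicitly.

```python
import json

# Assumed sidecar container names; verify against the real controller Deployment.
SIDECARS = ["csi-provisioner", "csi-attacher", "csi-snapshotter", "csi-resizer"]


def build_vpa_manifest(target_deployment: str,
                       namespace: str,
                       sidecars: list[str],
                       max_memory: str) -> dict:
    """Build a VPA manifest that manages memory for the listed sidecars only."""
    policies = [
        {
            "containerName": name,
            "controlledResources": ["memory"],
            "maxAllowed": {"memory": max_memory},  # illustrative cap, not a tested value
        }
        for name in sidecars
    ]
    # Default policy: leave all other containers in the pod untouched.
    policies.append({"containerName": "*", "mode": "Off"})
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{target_deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "updatePolicy": {"updateMode": "Auto"},
            "resourcePolicy": {"containerPolicies": policies},
        },
    }


if __name__ == "__main__":
    manifest = build_vpa_manifest(
        "csi-azuredisk-controller",  # assumed Deployment name
        "kube-system",
        SIDECARS,
        "2Gi",
    )
    print(json.dumps(manifest, indent=2))
```

Limiting `controlledResources` to memory keeps CPU requests fixed, so only the resource that grows with the PV count can cause an eviction.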