kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Implement Leader Elections for VPA Updater and Recommender #6846

Closed: lliu8080 closed this issue 2 months ago

lliu8080 commented 4 months ago

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Vertical Pod Autoscaler (VPA) has three components: Recommender, Updater, and Admission Controller. Currently only the Admission Controller can run with multiple instances; the Recommender and Updater cannot.

Running the Recommender and Updater with a single replica can be problematic, because that replica can end up in ImagePullBackOff, CrashLoopBackOff, OOMKilled, or another bad state after a new release deployment or node patching. The Recommender and Updater should support multiple replicas and HA.

We want to enable VPA across a large multi-tenant fleet of thousands of EKS clusters. Multiple replicas will help ensure availability and make our lives easier.

Describe the solution you'd like.: This feature request is to implement leader election for the VPA Recommender and Updater so that both components can run with multiple replicas.

Describe any alternative solutions you've considered.:

Additional context.:

adrianmoisey commented 4 months ago

/area vertical-pod-autoscaler

nikimanoledaki commented 4 months ago

This would be a great feature addition. We also run VPA in large Kubernetes clusters, so having an HA VPA would be great.

apy-liu commented 4 months ago

Agreed, this is important for running VPA at large scale. I would be interested to see this get prioritized.

rolland-zhang commented 4 months ago

Critical components like the Updater and Recommender, which help manage cluster state, always have a higher chance of failure when they run as a single replica. This is one of the VPA features our team is looking forward to the most.

adrianmoisey commented 3 months ago

Another thought here: a user's normal expectation is to be able to run multiple replicas of an application (I know this was my assumption when starting with the VPA). For a while we ran too many replicas until we figured out that it was not the correct thing to do. Adding leader election can help users run the correct number of "active" VPAs (i.e., 1).

client-go seems to support Leader Election: https://github.com/kubernetes/client-go/blob/c415c7650f46e6852e429307b5ab4f9435797b74/examples/leader-election/main.go

ialidzhikov commented 3 months ago

/assign