hellofresh / eks-rolling-update

EKS Rolling Update is a utility for updating the launch configuration of worker nodes in an EKS cluster.
Apache License 2.0

feat: WIP: allow user to over-scale a buffer of instances in an ASG #100

Closed. js-timbirkett closed 2 years ago

js-timbirkett commented 3 years ago

Hello 👋 - A few weeks ago I opened #96 but promptly closed it as I could have solved the problem with a different tool. After looking back at this, I think it'd be simpler if solved in eks-rolling-update directly.

This PR adds a new environment variable, ASG_BUFFER_INSTANCES, which passes an arbitrary number to eks_rolling_update.py and causes each ASG to be over-scaled by that number of instances.
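For readers skimming the thread, here is a rough sketch of the idea, not the code from this PR. The helper names `read_buffer_instances` and `buffered_capacity` are hypothetical, and raising the ASG max size when the buffered target would exceed it is an assumption about how the edge case might be handled.

```python
import os


def read_buffer_instances(env=None):
    """Read ASG_BUFFER_INSTANCES as a non-negative int (hypothetical helper).

    Defaults to 0, meaning no over-scaling, when the variable is unset.
    """
    env = os.environ if env is None else env
    value = int(env.get("ASG_BUFFER_INSTANCES", "0"))
    if value < 0:
        raise ValueError("ASG_BUFFER_INSTANCES must be >= 0")
    return value


def buffered_capacity(desired, max_size, buffer_instances):
    """Return (new_desired, new_max) for an over-scaled ASG.

    Assumption: if the buffered target exceeds the current max size,
    the max is raised to match rather than clamping the target.
    """
    new_desired = desired + buffer_instances
    new_max = max(max_size, new_desired)
    return new_desired, new_max


if __name__ == "__main__":
    buffer_instances = read_buffer_instances({"ASG_BUFFER_INSTANCES": "2"})
    # An ASG with desired=3 and max=4, over-scaled by a buffer of 2.
    print(buffered_capacity(3, 4, buffer_instances))
```

The resulting values would then have to be applied to each ASG, for example through the AWS Auto Scaling API. How that interacts with the tool's existing scaling steps is left open here.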

But why?

The past few rolling upgrades I've done have caused problems such as workloads with PVs/PVCs getting stuck in Pending because other pods had started, HPA scale-outs leaving pods stuck in Pending, deployments during the rollout causing issues...

Since I've been pre-scaling each ASG by a few instances, it hasn't been an issue, and cluster-autoscaler takes care of scaling in unused compute after the rollout.

As always, open to any feedback or ideas 😸

Thanks for an awesome tool!

pysysops commented 3 years ago

I wonder if this would help with #84 🤔

js-timbirkett commented 3 years ago

After looking at this again, I need to understand the behaviours of scaling in various cases first. I don't think it's as simple as the changes that have been made in this PR... I will look more closely at this 🔜

adkafka commented 3 years ago

Any update on this thread? We could use this extra parameter as well.

js-timbirkett commented 2 years ago

Closing due to lack of time.