Open siarhei-karanets-epam opened 4 years ago
Thanks for highlighting this request! The current idea is to leverage PodDisruptionBudgets to avoid downtime in your apps. In general this is a better approach, as it respects the fine-grained requirements of each app while balancing the speed of rollout over large clusters.
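For illustration, a PodDisruptionBudget for one app might be generated roughly as follows. This is only a sketch: the name `web-pdb`, the label `app: web`, and the `minAvailable` value of 2 are hypothetical placeholders, not values taken from this issue.

```python
import json


def build_pdb(name, app_label, min_available):
    """Return a policy/v1 PodDisruptionBudget manifest as a plain dict."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            # Evictions are refused while fewer than this many pods would remain.
            "minAvailable": min_available,
            # Hypothetical label; it must match the pods of the app to protect.
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


pdb = build_pdb("web-pdb", "web", 2)
print(json.dumps(pdb, indent=2))
```

Since `kubectl` accepts JSON manifests, the printed output could be piped to `kubectl apply -f -`. Note that `policy/v1` assumes Kubernetes 1.21 or newer; older clusters would need `policy/v1beta1`.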
That said, a configuration parameter to control how many nodes are drained and shut down at a time, in a rolling fashion, would be good to have.
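To make the idea concrete, here is a minimal sketch of what such a parameter could control. It is not kubergrunt's implementation: `drain_batches`, `rolling_drain`, `max_concurrent`, and the `drain_node` callback are all hypothetical names used only for illustration.

```python
def drain_batches(nodes, max_concurrent):
    """Split the node list into consecutive batches of at most max_concurrent."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return [
        nodes[i:i + max_concurrent]
        for i in range(0, len(nodes), max_concurrent)
    ]


def rolling_drain(nodes, max_concurrent, drain_node):
    """Drain nodes batch by batch instead of all at once.

    drain_node is a caller-supplied function (hypothetical) that drains
    a single node and returns once the drain has finished.
    """
    for batch in drain_batches(nodes, max_concurrent):
        for node in batch:
            drain_node(node)
        # A real rollout would also wait here until evicted pods are
        # rescheduled and ready before starting the next batch.


drained = []
rolling_drain(["node-a", "node-b", "node-c"], 2, drained.append)
print(drain_batches(["node-a", "node-b", "node-c"], 2))
```

With `max_concurrent` set to 1 this degenerates to a strictly one-node-at-a-time rollout, which is the slowest but least disruptive setting.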
The current approach with "kubergrunt eks deploy" causes downtime because all nodes in a group are drained simultaneously, so all replicas of the affected apps (as well as ingress controllers, stateful sets, etc.) start migrating at the same time. The update procedure needs to be improved to avoid downtime.