During our node rotation, the ASG's AZRebalance process rebalanced the nodes. At that point the desired and running node counts were both 6.
The ASG launched an extra node to balance the nodes across AZs, regardless of the desired count.
As soon as the new node started, the ASG saw 7 nodes running against a desired count of 6, so it terminated an old node (selected by the OldestLaunchConfiguration termination policy) to match the desired count.
After the last worker node was rotated, eks-rolling-update changed the ASG from 6 to 4, assuming the activity was complete, which caused 1 node to be terminated abruptly.
For the next cluster upgrade we updated the script locally to include AZRebalance in the list of suspended processes.
It would be good if that fix were included here as well.
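
For reference, a rough sketch of the kind of change we mean. This is not the actual eks-rolling-update code; the helper names and the ASG name in the usage comment are made up for illustration. It only assumes a boto3 autoscaling client, whose suspend_processes and resume_processes calls take AutoScalingGroupName and ScalingProcesses.

```python
# Hypothetical helpers, not taken from eks-rolling-update itself.
# "client" is expected to be a boto3 autoscaling client, i.e.
# boto3.client("autoscaling").

def suspend_asg_processes(client, asg_name, processes=("AZRebalance",)):
    """Suspend the given scaling processes on one ASG before rotating nodes."""
    names = sorted(set(processes))  # de-duplicate, stable order
    client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=names,
    )
    return names


def resume_asg_processes(client, asg_name, processes=("AZRebalance",)):
    """Resume the same processes once the rotation has finished."""
    names = sorted(set(processes))
    client.resume_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=names,
    )
    return names


# Example usage (the ASG name is a placeholder):
#
#   import boto3
#   asg = boto3.client("autoscaling")
#   suspend_asg_processes(asg, "my-eks-workers")
#   ...rotate nodes...
#   resume_asg_processes(asg, "my-eks-workers")
```

With AZRebalance suspended, the ASG should not launch the extra balancing node mid-rotation, so the running count stays in line with what the script expects.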