Closed dmitry-mightydevops closed 2 years ago
I think I have experienced something similar :eyes:
@dmitry-mightydevops this is most likely due to the rolling update config for EKS managed node groups. Can you try setting max_unavailable or max_unavailable_percentage to 0? https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_node_group#update_config-configuration-block
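For reference, a minimal sketch of where that setting would go, assuming the `terraform-aws-modules/eks/aws` module and its per-node-group `update_config` key. The node group key `default` is hypothetical, the sizes are only illustrative, and this is a fragment rather than a complete configuration. Values below 1 are not accepted, so the sketch uses 1:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws" # assumed module source
  version = "18.20.5"

  # Other cluster arguments omitted.

  eks_managed_node_groups = {
    # "default" is a hypothetical node group key.
    default = {
      min_size     = 0
      max_size     = 3
      desired_size = 1

      # Set only one of the two keys.
      update_config = {
        max_unavailable = 1
        # max_unavailable_percentage = 33
      }
    }
  }
}
```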
The valid range for max_unavailable is 1 - 100. A cluster with a desired instance count of 1, min 0 and max 3 will climb to a desired count of 6 and a max of 8 instances! It then eventually (~30 min) shrinks back to a desired count of 1 and a max of 3, all for the sake of updating 1 node in this case.
Any idea how to limit the number of excess instances?
Closing since this is not a module issue but a configuration setting. Please see https://docs.aws.amazon.com/eks/latest/userguide/managed-node-update-behavior.html
So I tested max_unavailable and it has no effect: I'm still getting a lot of nodes added and then removed. Changing a node group (update ops) becomes very time consuming (~20-30 min), and it's much faster to just add a new entry in eks_managed_node_groups, then cordon/drain the old nodes and remove the original node group.
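A rough sketch of that workaround, assuming the same module. The keys `workers_v1` and `workers_v2` and the sizes are hypothetical; only the volume sizes (50 and 60) come from the change described in this issue. The idea is to apply with both entries present, cordon and drain the nodes of the old group, then delete the old entry and apply again:

```hcl
eks_managed_node_groups = {
  # Original group. Keep it until its nodes are cordoned and drained,
  # then remove this entry and run terraform apply again.
  workers_v1 = {
    min_size     = 0
    max_size     = 3
    desired_size = 1

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size = 50
        }
      }
    }
  }

  # Replacement group carrying the changed setting. A new key means a new
  # node group is created instead of the existing one being updated in place.
  workers_v2 = {
    min_size     = 0
    max_size     = 3
    desired_size = 1

    block_device_mappings = {
      xvda = {
        device_name = "/dev/xvda"
        ebs = {
          volume_size = 60
        }
      }
    }
  }
}
```

This trades the in-place rolling update for two applies plus a manual drain, which in the experience reported here was still faster than the 20-30 minute update.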
Description
Modifying values in eks_managed_node_groups results in a constantly growing max capacity and desired capacity in the ASG.
The node group is created with:
Initial creation - all is good
The change
Modifying block_device_mappings.xvda.ebs.volume_size from 50 to 60 and running terraform apply results in iterative updates to the ASG for the node group.
The plan
ASG iteratively being updated with the following values:
max increased to 6
max increased to 7
max increased to 8
max increased to 9
instances:
Then it starts cooling down
Is this normal and expected behavior? If I change the "key name" in eks_managed_node_groups, then the old node group is deleted and a new one is created, and overall the procedure is much faster.
I also got the same issue when my tags are updated. So if I change just tags, let's say
then I get a lot of EC2 instances spawned, and it is a very time-consuming operation if max_size > 1.
Overall the "apply" took 20 minutes.
Versions
Module version [Required]: 18.20.5
Terraform version:
this creates:
EC2
ASG
Launch Templates
Expected behavior
A single new EC2 instance should be added if the current number of instances is 1.
Actual behavior
Many EC2 instances are added incrementally by updating the ASG, which is very time consuming, especially if max_size is > 1.