hellofresh / eks-rolling-update

EKS Rolling Update is a utility for updating the launch configuration of worker nodes in an EKS cluster.
Apache License 2.0

Fix DRY_RUN behavior #90

Closed nairb closed 3 years ago

nairb commented 3 years ago

When the DRY_RUN environment variable is used instead of the -p flag, ASG modification and tag modification are skipped, but the script still enters a loop of waiting for the ASG to scale and checking cluster health, which eventually fails. This change makes DRY_RUN behave the same as the -p flag.
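
As a rough illustration of the intended behavior (not the code in this PR), the sketch below resolves a single dry-run setting from either source and returns before any waiting or validation. The function names, the accepted truthy strings, and the long flag name `--plan` are assumptions made for the example.

```python
import argparse


def str_to_bool(value):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def resolve_dry_run(argv, environ):
    """Return True if either the -p flag or DRY_RUN requests a dry run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster_name", required=True)
    # -p is the flag named in this PR; the long name --plan is an assumption.
    parser.add_argument("-p", "--plan", action="store_true")
    args = parser.parse_args(argv)
    return args.plan or str_to_bool(environ.get("DRY_RUN", ""))


def run_update(dry_run, outdated_count):
    """Hypothetical driver: report the plan and stop before any waiting."""
    steps = [f"found {outdated_count} outdated instances"]
    if dry_run:
        # Both dry-run sources take the same early exit.
        steps.append("dry run: skipping scale-up, wait, and health validation")
        return steps
    steps.append("scaling ASG and validating cluster health")
    return steps
```

Resolving the setting once and branching on it in one place keeps the two entry points from drifting apart, which appears to be the problem described above.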

To reproduce, run DRY_RUN=true eks_rolling_update.py --cluster_name CLUSTER_NAME:

2021-02-04 09:25:58,123 INFO     Describing autoscaling groups...
2021-02-04 09:25:58,753 INFO     Pausing k8s autoscaler...
2021-02-04 09:25:58,987 INFO     K8s autoscaler modified to replicas: 0
2021-02-04 09:25:59,332 INFO     *** Checking autoscaling group test-all-pods2021011314330308490000001c ***
2021-02-04 09:25:59,332 INFO     Describing launch template for test-all-pods20210113143259732300000019...
2021-02-04 09:25:59,651 INFO     Instance id i-0a23421e229ace4 launch template version of '14' does not match asg launch template version of '15'
2021-02-04 09:25:59,652 INFO     Describing launch template for test-all-pods20210113143259732300000019...
2021-02-04 09:25:59,756 INFO     Instance id i-0ad4560ff5ce launch template version of '14' does not match asg launch template version of '15'
2021-02-04 09:25:59,756 INFO     Found 2 outdated instances
2021-02-04 09:25:59,757 INFO     *** Checking autoscaling group test-other-pods2021011314330333550000001d ***
2021-02-04 09:25:59,757 INFO     Describing launch template for test-other-pods20210113143258745100000017...
2021-02-04 09:25:59,872 INFO     Instance id i-0108f2345ce143c launch template version of '7' does not match asg launch template version of '8'
2021-02-04 09:25:59,872 INFO     Found 1 outdated instances
2021-02-04 09:26:00,474 INFO     Getting k8s nodes...
2021-02-04 09:26:00,729 INFO     Current k8s node count is 3
2021-02-04 09:26:00,730 INFO     Setting the scale of ASG test-all-pods2021011314330308490000001c based on 2 outdated instances.
2021-02-04 09:26:00,730 INFO     Modifying asg test-all-pods2021011314330308490000001c autoscaling to resume ...
2021-02-04 09:26:00,730 INFO     Skipping asg modification due to dry run flag set
2021-02-04 09:26:00,730 INFO     No previous capacity value tags set on ASG; setting tags.
2021-02-04 09:26:00,730 INFO     Saving tag to asg key: eks-rolling-update:original_capacity, value : 2...
2021-02-04 09:26:00,730 INFO     Skipping asg tag modification due to dry run flag set
2021-02-04 09:26:00,730 INFO     Saving tag to asg key: eks-rolling-update:desired_capacity, value : 4...
2021-02-04 09:26:00,730 INFO     Skipping asg tag modification due to dry run flag set
2021-02-04 09:26:00,730 INFO     Saving tag to asg key: eks-rolling-update:original_max_capacity, value : 10...
2021-02-04 09:26:00,730 INFO     Skipping asg tag modification due to dry run flag set
2021-02-04 09:26:00,730 INFO     Setting asg desired capacity from 2 to 4 and max size to 10...
2021-02-04 09:26:00,730 INFO     Skipping asg scaling due to dry run flag set
2021-02-04 09:26:00,730 INFO     Waiting for 90 seconds for ASG to scale before validating cluster health...
2021-02-04 09:27:30,737 INFO     Checking asg test-all-pods2021011314330308490000001c instance count...
2021-02-04 09:27:31,078 INFO     Asg test-all-pods2021011314330308490000001c does not have enough running instances to proceed
2021-02-04 09:27:31,078 INFO     Actual instances: 2 Desired instances: 4
2021-02-04 09:27:31,078 INFO     Validation failed for asg test-all-pods2021011314330308490000001c. Not enough instances online.
2021-02-04 09:27:31,078 INFO     Waiting for 90 seconds for ASG to scale before validating cluster health...
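
The log above suggests why the loop cannot succeed: the desired capacity is computed as 4 (2 current plus 2 outdated), but the scaling call is skipped, so the actual count stays at 2. A minimal sketch of that mechanism follows; `validate_asg`, `wait_for_scale`, and the retry limit are hypothetical names and values, not taken from the project.

```python
def validate_asg(actual, desired):
    """The ASG is considered ready once enough instances are running."""
    return actual >= desired


def wait_for_scale(get_actual, desired, max_retries,
                   sleep=lambda seconds: None, wait_seconds=90):
    """Poll until the ASG reaches the desired count or retries run out.

    Returns the attempt number on which validation passed.
    The default sleep is a no-op so the sketch runs instantly.
    """
    for attempt in range(1, max_retries + 1):
        sleep(wait_seconds)
        if validate_asg(get_actual(), desired):
            return attempt
    raise TimeoutError(
        f"ASG never reached {desired} instances after {max_retries} checks"
    )
```

With a dry run, `get_actual` keeps returning 2 while `desired` is 4, so every check fails and the function raises once the retries are exhausted.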

nairb commented 3 years ago

@crhuber could you take a look at this?

crhuber commented 3 years ago

@nairb thanks for the contribution!