hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.87k stars 9.21k forks source link

[Enhancement]: Revise https://github.com/hashicorp/terraform-provider-aws/pull/34929 to allow omitting maxHealthyPercentage #39872

Open hurriyetgiray-ping opened 1 month ago

hurriyetgiray-ping commented 1 month ago

Description

MaxHealthyPercentage is an optional preference as per AWS documentation, so we should be able to omit it in terraform.

https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_RefreshPreferences.html

MaxHealthyPercentage Specifies the maximum percentage of the group that can be in service and healthy, or pending, to support your workload when replacing instances. The value is expressed as a percentage of the desired capacity of the Auto Scaling group. Value range is 100 to 200.

If you specify MaxHealthyPercentage, you must also specify MinHealthyPercentage, and the difference between them cannot be greater than 100. A larger range increases the number of instances that can be replaced at the same time.

If you do not specify this property, the default is 100 percent, or the percentage set in the instance maintenance policy for the Auto Scaling group, if defined.

Type: Integer

Valid Range: Minimum value of 100. Maximum value of 200.

Required: No

Affected Resource(s) and/or Data Source(s)

aws_autoscaling_group

null is ignored in max_healthy_percentage and is set to 100 all the time.

 instance_refresh {
    preferences {
      max_healthy_percentage = null 
   }

https://github.com/hashicorp/terraform-provider-aws/pull/34929 set maxHealthyPercentage to 100 by default, but that breaks our instant refresh automation.

We used to omit this preference and terminate our instances one at a time, and detach its volume. The new instance would then be launched, the volume would attach to it. It worked as we wanted, but from hashicorp/aws v5.32.0 onwards (where this PR was added), it is not terminating an instance first.

Now we get a new instance launched first which cannot pass the health checks without a volume attachment. As a result we get infinitely looping instance refreshes.

We need to be able to set max_healthy_percentage to null and have the API not generate max HealthyPercentage in StartInstanceRefresh API. It works via AWS CLI and AWS console, so Terraform should support it too.

This is an example API generated by AWS Console when we choose cost based terminate and launch option and enable Violate min healthy percentage. We can see in CloudTrail that StartInstanceRefresh did not include maxHealthyPercentage.

We cannot recreate this API via Terraform anymore, unless we pin Terraform/AWS provider tohashicorp/aws v5.31.0 or earlier versions.

 "preferences": {
        "minHealthyPercentage": 100,
        "skipMatching": true,
        "autoRollback": false,
        "scaleInProtectedInstances": "Ignore",
        "standbyInstances": "Ignore"
    } 

Potential Terraform Configuration

terraform aws_autoscaling_group

There are two possible options.

Option 1: do not generate maxHealthyPercentage preference unless specified explicitly

No max_healthy_percentage is specified in terraform instance_refresh for ASG

 instance_refresh {
    preferences {

   }

Do not set max_healthy_percentage to anything and do not generate maxHealthyPercentage=100 in AWS API.

This should then behave as before (ie. before https://github.com/hashicorp/terraform-provider-aws/pull/34929) and omit maxHealthyPercentage in the StartInstanceRefresh API.

Option 2: null should we allowed for max_healthy_percentage

 instance_refresh {
    preferences {
      max_healthy_percentage = null 
   }

Omit maxHealthyPercentage in the StartInstanceRefresh API when it is set to null in terraform.

This should then give us a chance to override the default behaviour changed with (ie. before https://github.com/hashicorp/terraform-provider-aws/pull/34929). This option is probably more preferable for users who requested and use https://github.com/hashicorp/terraform-provider-aws/pull/34929.

References

This is what broke our automation PR https://github.com/hashicorp/terraform-provider-aws/pull/34929 https://github.com/hashicorp/terraform-provider-aws/blob/main/CHANGELOG.md#5320-january-11-2024

Would you like to implement a fix?

None

github-actions[bot] commented 1 month ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

dddsjn commented 1 month ago

This impacts me as well. The solution should fix our issue.