hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Error when OnDemandBaseCapacity is greater than 0 when auto scaling group is being created #9841

Closed fingerquote closed 1 year ago

fingerquote commented 5 years ago

Terraform Version

Affected Resource(s)

aws_autoscaling_group

Terraform Configuration Files

resource "aws_launch_template" "webserver" {
  name_prefix            = "${var.environment_name}-web-"
  image_id               = data.aws_ami.web.id
  instance_type          = var.web_instance_priority_1

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "webserver-scaling-group" {
  name = "asg-${aws_launch_template.webserver.id}-${aws_launch_template.webserver.latest_version}"

  vpc_zone_identifier = sort(data.aws_subnet_ids.private_subnets.ids)

  desired_capacity = 2
  min_size         = 2
  max_size         = 4

  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.webserver.id
        version            = "$Latest"
      }

      override {
        instance_type = var.web_instance_priority_1
      }

      override {
        instance_type = var.web_instance_priority_2
      }

      override {
        instance_type = var.web_instance_priority_3
      }

      override {
        instance_type = var.web_instance_priority_4
      }
    }

    instances_distribution {
      # Creating the group fails with a base capacity of 2; setting it to 0 avoids the error.
      on_demand_base_capacity                  = 2
      on_demand_percentage_above_base_capacity = 0
    }
  }

  target_group_arns         = [aws_alb_target_group.web.arn]
  health_check_type         = "ELB"
  wait_for_elb_capacity     = 2
  wait_for_capacity_timeout = "15m"

  lifecycle {
    create_before_destroy = true
  }
}

Expected Behavior

When the launch template gets updated due to an AMI change, the auto scaling group should get destroyed and recreated successfully.

Actual Behavior

We receive the following error:

Error: Error creating AutoScaling Group: ValidationError: Max bound, 0, must be greater than or equal to OnDemandBaseCapacity, 2.
    status code: 400, request id: e560b0d0-aa2d-11e9-af61-6f9caecbaf19

  on main.tf line 482, in resource "aws_autoscaling_group" "webserver-scaling-group":
 482: resource "aws_autoscaling_group" "webserver-scaling-group" {

Steps to Reproduce

  1. terraform apply

Important Factoids

When we used launch configurations, the ASG would get destroyed and recreated perfectly. We switched to launch templates in order to begin using the mixed instances policy, and that is when we started encountering this error. Interestingly, if you set on_demand_base_capacity to zero, it works fine.

References

gregmoy commented 4 years ago

I'm getting this when I have an initial_lifecycle_hook in my ASG. It works fine without it.

arq-anthonyw commented 4 years ago

While not directly related to this, I just switched to a mixed instances policy and now get this exact error on scheduled scaling actions when I attempt to set min, max and desired to 0. If I keep my ASG max value greater than or equal to the OnDemandBaseCapacity value, there are no issues. So it's probably not a provider-specific problem but an AWS API constraint. Perhaps the provider could add some error checking for this particular scenario?

Executing scheduled action ScaleDown. Status Reason: Max bound, 0, must be greater than or equal to OnDemandBaseCapacity, 1.
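The error checking suggested above could look roughly like the following Go sketch. It assumes the rule implied by the error message (the group's max size, the "Max bound", must be at least the mixed instances policy's OnDemandBaseCapacity). The names scheduledAction and checkScheduledActions are hypothetical and are not part of the provider or the AWS SDK.

```go
package main

import "fmt"

// scheduledAction is a hypothetical, simplified stand-in for a scheduled
// scaling action; only the field relevant to this check is modelled.
type scheduledAction struct {
	Name    string
	MaxSize int
}

// checkScheduledActions returns one error per action whose MaxSize would fall
// below the group's OnDemandBaseCapacity, the condition behind AWS's
// "Max bound, X, must be greater than or equal to OnDemandBaseCapacity, Y."
func checkScheduledActions(onDemandBase int, actions []scheduledAction) []error {
	var errs []error
	for _, a := range actions {
		if a.MaxSize < onDemandBase {
			errs = append(errs, fmt.Errorf(
				"scheduled action %q: max_size (%d) must be >= on_demand_base_capacity (%d)",
				a.Name, a.MaxSize, onDemandBase))
		}
	}
	return errs
}

func main() {
	actions := []scheduledAction{
		{Name: "ScaleUp", MaxSize: 4},
		{Name: "ScaleDown", MaxSize: 0}, // min, max and desired all set to 0
	}
	// prints: scheduled action "ScaleDown": max_size (0) must be >= on_demand_base_capacity (1)
	for _, err := range checkScheduledActions(1, actions) {
		fmt.Println(err)
	}
}
```

Keeping max_size at or above the base capacity (or using a base capacity of 0) makes the check pass, which matches the behaviour described in this comment.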

lowjoel commented 3 years ago

I think there's a similar post on the forum with a minimal repro: https://discuss.hashicorp.com/t/cannot-create-asg-with-mixed-instances-and-lifecycle-hooks/13400

fingerquote commented 3 years ago

I can confirm that the issue goes away if I simply remove the lifecycle hook from the ASG. This still exists in Terraform v0.13.5 with AWS Provider v3.15.0. Can anyone take a look at this? It is quite inconvenient for our pipeline automation.

icicimov commented 3 years ago

I see this with Terraform v0.14.7 and AWS provider v3.29.1 as well. Does anyone understand what that error even means? Which parameter does "Max bound" refer to?

icicimov commented 3 years ago

I tried to reproduce the error manually using the AWS CLI. I created the following config.json file:

{
    "AutoScalingGroupName": "my-asg",
    "MixedInstancesPolicy": {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "<my-launch-template>",
                "Version": "$Latest"
            },
            "Overrides": [
                {
                    "InstanceType": "t3.micro"
                },
                {
                    "InstanceType": "t2.micro"
                }
            ]
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "capacity-optimized"
        }
    },
    "LifecycleHookSpecificationList": [
        {
            "LifecycleHookName": "my-hook-terminate",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
            "NotificationTargetARN": "<my-sns-topic>",
            "RoleARN": "<my-as-notification-role>",
            "HeartbeatTimeout": 30,
            "DefaultResult": "CONTINUE"
        },
        {
            "LifecycleHookName": "my-hook-launch",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "NotificationTargetARN": "<my-sns-topic>",
            "RoleARN": "<my-as-notification-role>",
            "HeartbeatTimeout": 30,
            "DefaultResult": "CONTINUE"
        }
    ],
    "MinSize": 1,
    "MaxSize": 5,
    "DesiredCapacity": 3,
    "VPCZoneIdentifier": "<my-vpc-subnets-list>"
}

This uses an existing launch template. I then created the ASG:

$ aws autoscaling create-auto-scaling-group --cli-input-json file://~/config.json --region eu-west-2

The command succeeded. The CLI did not complain at all with the "Error creating Auto Scaling Group: ValidationError: Max bound, 0, must be greater than or equal to OnDemandBaseCapacity, 1." error I saw in Terraform.

I could see the ASG and the instances it launched as per the config:

$ aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg --region eu-west-2
{
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "my-asg",
            "AutoScalingGroupARN": "arn:aws:autoscaling:eu-west-2:012345678901:autoScalingGroup:05b9c9a4-96c5-4050-a3a0-87fcc957eabc:autoScalingGroupName/my-asg",
            "MixedInstancesPolicy": {
                "LaunchTemplate": {
                    "LaunchTemplateSpecification": {
                        "LaunchTemplateId": "lt-0aaxxxxxxxxxxxxxxx",
                        "LaunchTemplateName": "<my-launch-template>",
                        "Version": "$Latest"
                    },
                    "Overrides": [
                        {
                            "InstanceType": "t3.micro"
                        },
                        {
                            "InstanceType": "t2.micro"
                        }
                    ]
                },
                "InstancesDistribution": {
                    "OnDemandAllocationStrategy": "prioritized",
                    "OnDemandBaseCapacity": 1,
                    "OnDemandPercentageAboveBaseCapacity": 0,
                    "SpotAllocationStrategy": "capacity-optimized"
                }
            },
            "MinSize": 1,
            "MaxSize": 5,
            "DesiredCapacity": 3,
            "DefaultCooldown": 300,
            "AvailabilityZones": [
                "eu-west-2c",
                "eu-west-2a",
                "eu-west-2b"
            ],
            "LoadBalancerNames": [],
            "TargetGroupARNs": [],
            "HealthCheckType": "EC2",
            "HealthCheckGracePeriod": 0,
            "Instances": [
                {
                    "InstanceId": "i-048xxxxxxxxxxxxxxx",
                    "InstanceType": "t2.micro",
                    "AvailabilityZone": "eu-west-2b",
                    "LifecycleState": "Pending:Wait",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0aaxxxxxxxxxxxxxxx",
                        "LaunchTemplateName": "<my-launch-template>",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-049xxxxxxxxxxxxxxx",
                    "InstanceType": "t2.micro",
                    "AvailabilityZone": "eu-west-2c",
                    "LifecycleState": "Pending:Wait",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0aaxxxxxxxxxxxxxxx",
                        "LaunchTemplateName": "<my-launch-template>",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                },
                {
                    "InstanceId": "i-0bfxxxxxxxxxxxxxxx",
                    "InstanceType": "t3.micro",
                    "AvailabilityZone": "eu-west-2a",
                    "LifecycleState": "Pending:Wait",
                    "HealthStatus": "Healthy",
                    "LaunchTemplate": {
                        "LaunchTemplateId": "lt-0aaxxxxxxxxxxxxxxx",
                        "LaunchTemplateName": "<my-launch-template>",
                        "Version": "1"
                    },
                    "ProtectedFromScaleIn": false
                }
            ],
            "CreatedTime": "2021-05-11T06:41:00.139Z",
            "SuspendedProcesses": [],
            "VPCZoneIdentifier": "<my-vpc-subnets-list>",
            "EnabledMetrics": [],
            "Tags": [],
            "TerminationPolicies": [
                "Default"
            ],
            "NewInstancesProtectedFromScaleIn": false,
            "ServiceLinkedRoleARN": "arn:aws:iam::012345678901:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling"
        }
    ]
}

Here are the lifecycle hooks attached to the ASG as well, as requested:

$ aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name my-asg --region eu-west-2
{
    "LifecycleHooks": [
        {
            "LifecycleHookName": "my-hook-launch",
            "AutoScalingGroupName": "my-asg",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "NotificationTargetARN": "<my-sns-topic>",
            "RoleARN": "<my-as-notification-role>",
            "HeartbeatTimeout": 30,
            "GlobalTimeout": 3000,
            "DefaultResult": "CONTINUE"
        },
        {
            "LifecycleHookName": "my-hook-terminate",
            "AutoScalingGroupName": "my-asg",
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
            "NotificationTargetARN": "<my-sns-topic>",
            "RoleARN": "<my-as-notification-role>",
            "HeartbeatTimeout": 30,
            "GlobalTimeout": 3000,
            "DefaultResult": "CONTINUE"
        }
    ]
}

Based on the above testing, I would say something is wrong with the API call that Terraform makes to AWS in this scenario.

gudlyf commented 3 years ago

Confirmed this is still happening with:

Terraform v0.15.1 and AWS provider v3.44.0

justinretzolk commented 2 years ago

Hey y'all 👋 Thank you for taking the time to file this issue and for the ongoing discussion! Given that there have been a number of AWS provider releases since the last update, can anyone confirm whether you're still experiencing this behavior?

igoratencompass commented 2 years ago

@justinretzolk I can confirm this is still broken with Terraform v1.0.11.

davidjh7 commented 2 years ago

I ran into this recently. My guess is that it is caused by the twoPhases creation step used when an initial lifecycle hook is in use: the create phase overrides MaxSize to zero until the update phase is applied, and this causes validation to fail if a mixed instances policy sets OnDemandBaseCapacity greater than zero.
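The mechanism described in this comment can be modelled with a small Go sketch. This is a simplified, hypothetical model based only on the comment, not the provider's actual code: asgSizes, planCreate and validate are illustrative names, and the validation rule is inferred from the AWS error message. It shows why the same sizes pass when the group is created in a single call (as in the AWS CLI test above) but fail in the first phase when an initial lifecycle hook forces a two-phase create.

```go
package main

import "fmt"

// asgSizes is a hypothetical, simplified model of the size-related fields
// sent when creating or updating an Auto Scaling group.
type asgSizes struct {
	MinSize, MaxSize, DesiredCapacity int
	OnDemandBaseCapacity              int // from the mixed instances policy
}

// validate mirrors the constraint implied by the AWS error message:
// MaxSize ("Max bound") must be >= OnDemandBaseCapacity.
func validate(s asgSizes) error {
	if s.MaxSize < s.OnDemandBaseCapacity {
		return fmt.Errorf("ValidationError: Max bound, %d, must be greater than or equal to OnDemandBaseCapacity, %d.",
			s.MaxSize, s.OnDemandBaseCapacity)
	}
	return nil
}

// planCreate models the two-phase creation: with an initial lifecycle hook,
// the group is first created with zero sizes and the real sizes are applied
// afterwards by an update. It returns the create request and, for the
// two-phase case, the follow-up update request.
func planCreate(desired asgSizes, hasInitialLifecycleHook bool) (create asgSizes, update *asgSizes) {
	if !hasInitialLifecycleHook {
		return desired, nil // single call, like the AWS CLI test
	}
	create = desired
	create.MinSize, create.MaxSize, create.DesiredCapacity = 0, 0, 0
	// OnDemandBaseCapacity is left as configured, which trips validation.
	u := desired
	return create, &u
}

func main() {
	// Sizes from the original report: min 2, max 4, desired 2, base capacity 2.
	cfg := asgSizes{MinSize: 2, MaxSize: 4, DesiredCapacity: 2, OnDemandBaseCapacity: 2}
	for _, hook := range []bool{false, true} {
		create, _ := planCreate(cfg, hook)
		fmt.Printf("initial_lifecycle_hook=%v: create call -> %v\n", hook, validate(create))
	}
	// Output:
	// initial_lifecycle_hook=false: create call -> <nil>
	// initial_lifecycle_hook=true: create call -> ValidationError: Max bound, 0, must be greater than or equal to OnDemandBaseCapacity, 2.
}
```

In this model, setting OnDemandBaseCapacity to 0 lets the first phase pass, which is consistent with the workaround noted in the original report.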

igoratencompass commented 2 years ago

Someone with better knowledge of Go than me should submit a patch; otherwise this issue will probably be a decade old before we get any support here.

johnsonaj commented 1 year ago

Hi @igoratencompass, I've tried a few different configurations with launch_template and initial_lifecycle_hook to reproduce this, but I am unable to. Would you be able to give an example configuration that is currently having this issue? Also, which version of the provider are you currently experiencing this with?

igoratencompass commented 1 year ago

@johnsonaj I just did a test with terraform v1.3.4 and aws provider v4.25.0 and I can confirm I did not get the ASG error this time around.

johnsonaj commented 1 year ago

@igoratencompass 👋🏾 thanks for the update. I will go ahead and close out this issue since we cannot reproduce it.

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.