hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: `aws_cloudwatch_metric_alarm` doesn't apply on first try with `aws_appautoscaling_policy` arn in `alarm_actions` #31329

Open jamie-chapman opened 1 year ago

jamie-chapman commented 1 year ago

Terraform Core Version

1.3.3

AWS Provider Version

4.66.1

Affected Resource(s)

aws_appautoscaling_policy

aws_cloudwatch_metric_alarm

Expected Behavior

Expected the aws_appautoscaling_policy to be applied first, and then an aws_cloudwatch_metric_alarm to be applied and attached to that autoscaling policy, so that it monitors some MQ metrics and scales up/down based on them.

Actual Behavior

When applying an autoscaling policy for an ECS service, the service itself deploys fine and the autoscaling policy is added as well, but the aws_cloudwatch_metric_alarm that should also be applied and attached to the autoscaling policy is not. Note that each service has 2 policies and 2 CloudWatch metric alarms, one pair to scale in and one to scale out.

There is no error at the end of the apply. However, the aws_cloudwatch_metric_alarm never gets attached to the autoscaling policy, even with a depends_on pointing at the aws_appautoscaling_policy resource and the ARN of the autoscaling policy set in the alarm_actions input like so:

alarm_actions = [aws_appautoscaling_policy.scale_in.arn]

Then, on a second apply, the aws_cloudwatch_metric_alarm shows up in the plan, the aws_appautoscaling_policy wants to be replaced with a new ARN, and everything finally gets applied.

[Screenshot 2023-05-10 at 17 18 09]

This is an issue because at first glance it looks like autoscaling has been added correctly when it hasn't. Perhaps aws_appautoscaling_policy does not support aws_cloudwatch_metric_alarm, since the example in the docs for aws_cloudwatch_metric_alarm uses an aws_autoscaling_policy, not an aws_appautoscaling_policy.

Relevant Error/Panic Output Snippet

There was no error message, just aws_cloudwatch_metric_alarm resources that silently didn't get attached to the autoscaling policy.

Terraform Configuration Files

We are using terragrunt.

Steps to Reproduce

Create an ECS service with 2 aws_appautoscaling_policy resources attached to scale the DesiredCount of tasks in the service up or down, each with an aws_cloudwatch_metric_alarm whose alarm_actions contains the policy's ARN. Apply these resources, then apply again. If you don't see the issue at first, try updating the task definition of the service and applying again; it seems that the ARNs are not updated properly.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy#alarm_arns

Would you like to implement a fix?

None

justinretzolk commented 1 year ago

Hey @jamie-chapman 👋 Can you supply a sample Terraform configuration and debug logs (redacted as needed) so that we have the necessary information in order to look into this?

jamie-chapman commented 1 year ago

Hey @justinretzolk, I've just looked at some other issues and I see what is meant by configuration now. Sorry, here you go:

resource "aws_appautoscaling_policy" "scale_in" {
  policy_type        = "StepScaling"
  name               = "mq-scaling-in-example-service"
  resource_id        = "service/example-env/example-service"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0
      scaling_adjustment          = -1
    }
    cooldown = 300
  }
}

resource "aws_cloudwatch_metric_alarm" "mq_scale_in" {
  for_each = toset(["queue1", "queue2"])

  alarm_name          = "StepScaling-MQQueueScaleIn-example-service-${each.key}"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "3"
  threshold           = "20"
  alarm_description   = "AmazonMQ autoscale IN alarm example-service for queue ${each.key}"
  alarm_actions       = [aws_appautoscaling_policy.scale_in.arn]

  metric_query {
    id          = "messageready"
    label       = "Messages Ready in '${each.key}' queue"
    return_data = true
    metric {
      dimensions = {
        Broker      = "example-broker"
        Queue       = each.key
        VirtualHost = "/"
      }
      namespace   = "AWS/AmazonMQ"
      metric_name = "MessageReadyCount"
      period      = 60
      stat        = "Average"
    }
  }

  depends_on = [aws_appautoscaling_policy.scale_in]
}
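
The report mentions two policies and two alarms per service, but only the scale-in pair is shown above. A scale-out counterpart might look roughly like the sketch below. Everything in it is an assumption made by mirroring the scale-in pair (the resource names, the comparison operator, and the threshold are hypothetical), not the reporter's actual configuration.

```hcl
# Hypothetical scale-out counterpart. Names and thresholds are assumptions
# made by mirroring the scale-in resources, not the reporter's real config.
resource "aws_appautoscaling_policy" "scale_out" {
  policy_type        = "StepScaling"
  name               = "mq-scaling-out-example-service"
  resource_id        = "service/example-env/example-service"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    metric_aggregation_type = "Average"
    cooldown                = 300

    # Add one task whenever the metric is at or above the alarm threshold.
    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}

resource "aws_cloudwatch_metric_alarm" "mq_scale_out" {
  for_each = toset(["queue1", "queue2"])

  alarm_name          = "StepScaling-MQQueueScaleOut-example-service-${each.key}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 3
  threshold           = 20
  alarm_description   = "AmazonMQ autoscale OUT alarm example-service for queue ${each.key}"

  # The implicit reference to the policy ARN already orders the two resources.
  alarm_actions = [aws_appautoscaling_policy.scale_out.arn]

  metric_query {
    id          = "messageready"
    label       = "Messages Ready in '${each.key}' queue"
    return_data = true
    metric {
      dimensions = {
        Broker      = "example-broker"
        Queue       = each.key
        VirtualHost = "/"
      }
      namespace   = "AWS/AmazonMQ"
      metric_name = "MessageReadyCount"
      period      = 60
      stat        = "Average"
    }
  }
}
```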

As for debug logs, I'll see if I can get those for you ASAP.
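
In the absence of debug logs, one way to check whether the alarms really point at the current policy ARN is to expose both sides as outputs and compare them after each apply. This is only a sketch: the output names are made up for illustration, and alarm_arns is the aws_appautoscaling_policy attribute that the link under References points to.

```hcl
# Hypothetical output names, used only to compare state between applies.

# ARN of the policy as Terraform currently knows it.
output "scale_in_policy_arn" {
  value = aws_appautoscaling_policy.scale_in.arn
}

# Alarm ARNs that the policy reports as associated with it.
output "scale_in_policy_alarm_arns" {
  value = aws_appautoscaling_policy.scale_in.alarm_arns
}

# alarm_actions of each alarm instance, keyed by queue name.
output "mq_scale_in_alarm_actions" {
  value = {
    for queue, alarm in aws_cloudwatch_metric_alarm.mq_scale_in :
    queue => alarm.alarm_actions
  }
}
```

If the policy ARN shown after the first apply differs from the one in each alarm's alarm_actions, or changes again on the second apply, that would match the behaviour described in this issue.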

jamie-chapman commented 1 year ago

Hi @justinretzolk, unfortunately I can't get my hands on any logs besides the screenshots that we posted above. But reading through the current issues, this one shows almost exactly the same behaviour that I'm seeing:

https://github.com/hashicorp/terraform-provider-aws/issues/31261

jamie-chapman commented 1 year ago

I suspect that something similar is happening with our configuration. We did have the lifecycle block below in place, but I don't think it would have fixed the issue at all. We also aren't using aws_appautoscaling_target; we are using aws_appautoscaling_policy in conjunction with aws_cloudwatch_metric_alarm, where perhaps the same issue is happening and hasn't been fixed by the update in the issue above.

lifecycle { # TODO: Remove this once THE BUG IS FIXED
  ignore_changes = [step_scaling_policy_configuration]
}

jamie-chapman commented 11 months ago

Hi @justinretzolk, is there any progress with this bug? I wonder if it has been seen by anyone else. We have removed these resources from our Terraform state for now, as the deployment is still not applying the resources as expected.

xpl-m-bocian commented 1 month ago

Hello everyone, the issue still seems to exist with the current AWS Provider version (5.61.0). :c @jamie-chapman, any news perhaps? Cheers!