hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

ECS Autoscaling policies are not recreated when the ECS service is recreated #9473

Closed ophintor closed 2 months ago

ophintor commented 5 years ago

Terraform Version

$ terraform version
Terraform v0.12.4
+ provider.archive v1.2.2
+ provider.aws v2.19.0
+ provider.random v2.1.2
+ provider.template v2.1.2
+ provider.tls v2.0.1

Terraform Configuration Files


resource "aws_ecs_service" "sf" {
  name                               = var.service_finder_ecr_name
  cluster                            = aws_ecs_cluster.service-finder.id
  task_definition                    = aws_ecs_task_definition.sf.arn
  desired_count                      = var.desired_count
  launch_type                        = "FARGATE"
  deployment_maximum_percent         = 100
  deployment_minimum_healthy_percent = 50

  network_configuration {
    subnets         = var.private_subnets
    security_groups = var.task_secgroups
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.ecs_alb_target_group.arn
    container_name   = upper(var.component)
    container_port   = 8443
  }

  depends_on = [aws_lb_listener.ecs_alb_listener_https]
}

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = var.desired_count * 10
  min_capacity       = var.desired_count
  resource_id        = local.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  depends_on = [aws_ecs_service.sf]
}

resource "aws_appautoscaling_policy" "ecs_policy_up_cpu" {
  name               = "cpu-scale-up"
  policy_type        = "StepScaling"
  resource_id        = local.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }

  depends_on = [aws_appautoscaling_target.ecs_target]
}

resource "aws_appautoscaling_policy" "ecs_policy_down_cpu" {
  name               = "cpu-scale-down"
  policy_type        = "StepScaling"
  resource_id        = local.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 300
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_upper_bound = 0
      scaling_adjustment          = -1
    }
  }

  depends_on = [aws_appautoscaling_target.ecs_target]
}

Expected Behavior

When there is a change that forces the recreation of the ECS service, the autoscaling policies that are attached to it should be recreated too.

Actual Behavior

I have an ECS service with an autoscaling target and a couple of policies. If I make a change that forces recreation of the service/target, the autoscaling policies remain in the state file but are not recreated after the ECS service is recreated. When I look in AWS, the service is there but the autoscaling policies and the target are simply gone.

When I run terraform a second time, they do get recreated.

Steps to Reproduce

  1. terraform apply (to create all the resources for the first time)
  2. Make a change that forces recreation of the ECS service (e.g. an LB port change)
  3. Run terraform apply again: the ECS service is recreated but the policies are not. In AWS the policies no longer exist, yet they are still present in the state file (!!!)
  4. Run terraform apply one more time: this time the policies get created (AWS appears to have deleted them when the ECS service was recreated).

raskad commented 4 years ago

I just ran into this problem. You can fix it by referencing the id of the aws_ecs_service in the resource_id fields of aws_appautoscaling_policy and aws_appautoscaling_target.

The following example applies the fix to the aws_appautoscaling_target from your example; the same resource_id expression works for the policies:

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = var.desired_count * 10
  min_capacity       = var.desired_count
  resource_id        = "service/${aws_ecs_cluster.service-finder.name}/${split("/", aws_ecs_service.sf.id)[2]}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
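For completeness, a sketch of the same resource_id expression applied to one of the policies from the original report (resource names taken from that example; untested):

```hcl
resource "aws_appautoscaling_policy" "ecs_policy_up_cpu" {
  name               = "cpu-scale-up"
  policy_type        = "StepScaling"
  resource_id        = "service/${aws_ecs_cluster.service-finder.name}/${split("/", aws_ecs_service.sf.id)[2]}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  # step_scaling_policy_configuration block unchanged from the original example
}
```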
AdamJCavanaugh commented 4 years ago

@raskad I tried your change on our customized module and am getting an error. I tried a few different methods of splitting and assigning the split as a variable without much luck. Can you please give guidance? My basic code looks very close to OP's, but I can redact and provide more as needed.

  on modules/ecs/main.tf line 77, in resource "aws_appautoscaling_target" "target":
  77:   resource_id        = "service/${var.cluster_name}/${split("/", aws_ecs_service.service.id)[2]}"
    |----------------
    | aws_ecs_service.service.id is "arn:aws:ecs:us-east-1:<id>:service/<service_name>"

The given key does not identify an element in this collection value.
 73 resource "aws_appautoscaling_target" "target" {
 74   max_capacity = var.max_capacity
 75   min_capacity = var.min_capacity
 76   #resource_id        = "service/${var.cluster_name}/${var.service_name}"
 77   resource_id        = "service/${var.cluster_name}/${split("/", aws_ecs_service.service.id)[2]}"
 78   scalable_dimension = "ecs:service:DesiredCount"
 79   service_namespace  = "ecs"
 80 }
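The error above happens because the service ARN in that account is in the old format ("arn:aws:ecs:…:service/<service_name>"), which has no third "/" segment, so index [2] does not exist. A sketch that takes the last segment instead should work for both ARN formats (assuming the var.cluster_name and aws_ecs_service.service names from the snippet above):

```hcl
resource "aws_appautoscaling_target" "target" {
  max_capacity = var.max_capacity
  min_capacity = var.min_capacity
  # element() with index length()-1 picks the last "/" segment, so this
  # works whether or not the ARN includes the cluster name
  resource_id        = "service/${var.cluster_name}/${element(split("/", aws_ecs_service.service.id), length(split("/", aws_ecs_service.service.id)) - 1)}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
```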
eikebartels commented 4 years ago

I also have the problem that the Scalable Target does not appear in the console. I'm using Terraform v0.11.15-oci.

kumadee commented 3 years ago

Hi team, any update on when this issue will be fixed?

For now I am using the below workaround.

resource "aws_appautoscaling_target" "this" {
  # Workaround due to https://github.com/hashicorp/terraform-provider-aws/issues/9473
  resource_id        = reverse(split(":", aws_ecs_service.this.id))[0]
  max_capacity       = var.task_max_count
  min_capacity       = var.task_desired_count
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
5n00p4eg commented 2 years ago

Same issue here; none of the workarounds mentioned above worked for me.

5n00p4eg commented 2 years ago

The way I debugged this was to run:

aws application-autoscaling describe-scaling-policies --service-namespace ecs | grep "ResourceId"

After that, I understood all the logic behind workarounds.

variable "cluster-id" {
  type = string
}

data "aws_ecs_cluster" "main" {
  cluster_name = var.cluster-id
}

locals {
  cluster_name = split("/", reverse(split(":", data.aws_ecs_cluster.main.cluster_name))[0])[1]
}

resource_id        = "service/${local.cluster_name}/${aws_ecs_service.api.name}"

And then you have a few ways to pass the cluster name:

  cluster-id = aws_ecs_cluster.cluster.id // ARN, needs to be parsed as above

OR

  cluster-id = aws_ecs_cluster.cluster.name // simple name, can be passed as is
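Put together, a caller of such a module might look like this (sketch; the module name and source path are hypothetical, and this passes the ARN form, which the locals block above parses):

```hcl
module "api_scaling" {
  source = "./modules/api-scaling" # hypothetical path

  # passing the cluster ARN; the locals block in the comment above
  # extracts the plain cluster name from it
  cluster-id = aws_ecs_cluster.cluster.id
}
```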
sabbari commented 9 months ago

I am still experiencing the same issue. The workaround of using the ID works, but it would be nice if this could be fixed at the provider level.

nathanhruby commented 3 months ago

The replace_triggered_by lifecycle option also solves this problem.
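A sketch of that approach, using the resource names from the original report (replace_triggered_by requires Terraform >= 1.2):

```hcl
resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = var.desired_count * 10
  min_capacity       = var.desired_count
  resource_id        = local.resource_id
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"

  lifecycle {
    # force this target (and, with the same block on them, the policies)
    # to be recreated whenever the ECS service is replaced
    replace_triggered_by = [aws_ecs_service.sf.id]
  }
}
```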

justinretzolk commented 2 months ago

As mentioned above, with the introduction of replace_triggered_by, this should be resolved. With that in mind, I'm going to close this issue.

github-actions[bot] commented 2 months ago

[!WARNING] This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

github-actions[bot] commented 1 month ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.