ECS Service always wants to be recreated due to capacity provider.

spatel96 commented 2 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v0.13.6
+ provider.aws v3.73.0

Affected Resource(s)

aws_ecs_service

Terraform Configuration Files

Terraform Plan:

  # module.my_service.aws_ecs_service.ecs_service must be replaced
+/- resource "aws_ecs_service" "ecs_service" {
        cluster                            = "arn:aws:ecs:us-west-1:***:cluster/ecs-related-tapir"
        deployment_maximum_percent         = 200
        deployment_minimum_healthy_percent = 100
        desired_count                      = 2
        enable_ecs_managed_tags            = false
        enable_execute_command             = false
        health_check_grace_period_seconds  = 120
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = "arn:aws:ecs:us-west-1:***:service/my-cluster/my-service-5e" -> (known after apply)
      ~ launch_type                        = "EC2" -> (known after apply)
        name                               = "my-service-service-5e"
      + platform_version                   = (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ tags_all                           = {} -> (known after apply)
      ~ task_definition                    = "arn:aws:ecs:us-west-1:***:task-definition/my-service-:23" -> "arn:aws:ecs:us-west-1:***:task-definition/my-service:1"
        wait_for_steady_state              = false

      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "ecs-capacity-provider-related-tapir"
          + weight            = 100
        }

        deployment_controller {
            type = "CODE_DEPLOY"
        }

        load_balancer {
            container_name   = "my-service"
            container_port   = 7171
            target_group_arn = "arn:aws:elasticloadbalancing:us-west-1:***:targetgroup/abcdef/abcdef"
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Terraform Apply error:

Error: error creating ECS service (my-service): InvalidParameterException: Creation of service was not idempotent.

Expected Behavior

No infrastructure changes should be made

Actual Behavior

The ECS Service resource will be recreated, but the apply will fail with the error shown above.

Steps to Reproduce

Provision an ECS service with a capacity provider
terraform apply

gvwirth commented 2 years ago

FYI we are still seeing this bug in the provider version 4.9.

anGie44 commented 2 years ago

Possibly related to existing issue: https://github.com/hashicorp/terraform-provider-aws/issues/2283 (destroy/create behavior)

*Correction -- as the update was not expected behavior, I'm guessing the capacity_provider_strategy is inherited from the aws_ecs_cluster where it is defined. Do you mind confirming, @spatel96?

a-nych commented 2 years ago

This issue is very destructive.

When an ECS cluster has a default_capacity_provider_strategy setting defined, Terraform will mark all services that don't have

  lifecycle {
    ignore_changes = [
      capacity_provider_strategy
    ]
  }

to be recreated.
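For readers looking for the full shape of this workaround, here is a minimal sketch of a service that relies on the cluster's default strategy and tells Terraform to ignore the inherited value. The variable names and the service name are placeholders of mine, not taken from the reporter's configuration.

```hcl
# Hypothetical inputs; replace with references to your own resources.
variable "cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

resource "aws_ecs_service" "ecs_service" {
  name            = "my-service"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 2

  # No capacity_provider_strategy block and no launch_type: the service
  # picks up the cluster's default capacity provider strategy at creation.

  lifecycle {
    # Stop Terraform from treating the inherited strategy as drift,
    # which is what produces the "forces replacement" plan.
    ignore_changes = [
      capacity_provider_strategy,
    ]
  }
}
```

The trade-off is that Terraform will also ignore any intentional change to capacity_provider_strategy on this service, so such changes would have to be made outside Terraform.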

nitrocode commented 2 years ago

The only differences I can see when comparing capacity_provider_strategy and deployment_controller are MaxItems and DiffSuppressFunc. I wonder if that is what's causing this recreation... I would have thought that removing the ForceNew would also have stopped capacity_provider_strategy from forcing recreation...

https://github.com/hashicorp/terraform-provider-aws/blob/611b4737168f4f0051bb63ef221f0e76f156f392/internal/service/ecs/service.go#L96-L107

https://github.com/hashicorp/terraform-provider-aws/blob/611b4737168f4f0051bb63ef221f0e76f156f392/internal/service/ecs/service.go#L44-L47

anGie44 commented 2 years ago

Hi @nitrocode, thanks for looking through the code! My initial thinking was that @spatel96 is using both the aws_ecs_capacity_provider and aws_ecs_service resources, so while capacity_provider_strategy is not explicitly configured in the aws_ecs_service Terraform configuration, the value is inherited from the separate aws_ecs_capacity_provider resource after an initial terraform apply, and the next apply or plan will show that diff (though this is still just my conjecture, as the original configuration is not yet known). That diff is then handled by this portion of the code, https://github.com/hashicorp/terraform-provider-aws/blob/a2843eb5d274b2fe3598cf863d228e715dacc343/internal/service/ecs/service.go#L354-L372, which is forcing the new resource. The logic needs to either account for cases where the provider strategy is inherited from an outside configuration, or simply mark capacity_provider_strategy as Computed so that the diff is ignored.

relsqui commented 2 years ago

I was seeing this same issue and can confirm that adding a capacity_provider_strategy block in my aws_ecs_service, duplicating my default_capacity_provider_strategy, resolved it.
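A minimal sketch of that approach, assuming the cluster default is declared with the aws_ecs_cluster_capacity_providers resource (on older provider versions it may instead be a default_capacity_provider_strategy block inside aws_ecs_cluster). The variable names are placeholders; the base and weight values mirror the plan output in the original report.

```hcl
# Hypothetical inputs; replace with references to your own resources.
variable "cluster_name" {
  type = string
}

variable "cluster_arn" {
  type = string
}

variable "capacity_provider_name" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

# Default strategy defined at the cluster level.
resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name       = var.cluster_name
  capacity_providers = [var.capacity_provider_name]

  default_capacity_provider_strategy {
    capacity_provider = var.capacity_provider_name
    base              = 0
    weight            = 100
  }
}

resource "aws_ecs_service" "ecs_service" {
  name            = "my-service"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 2

  # Explicit copy of the cluster default, so the configuration matches
  # what the ECS API reports and the plan shows no diff.
  capacity_provider_strategy {
    capacity_provider = var.capacity_provider_name
    base              = 0
    weight            = 100
  }
}
```

The two blocks have to be kept in sync by hand; if the cluster default changes and the service block does not, the service simply keeps its own explicit strategy.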

ericdahl commented 2 years ago

This has been a big annoyance for us. We have many production ECS Services that are using LaunchType: EC2 and we'd like to convert them to using a newly defined default Capacity Provider strategy on the cluster.

If we simply set the capacity provider, it will force the re-create of the ECS Service leading to temporary disruption/downtime. This isn't necessary as AWS supports the graceful transition of LaunchType: EC2 to Capacity Provider (but not the other way around). It does a "force new deployment" of the ECS Tasks, but it uses the standard ECS rollout mechanism (e.g., minHealthy) so there's no disruption.

Our current workaround is to use ignore_changes as above, plus converting ECS Services to Capacity Provider via separate CLI-based automation.

(Also, tangentially related is #26533 - for transitioning existing ECS Services to use the Cluster's default capacity provider strategy)

remil1000 commented 1 year ago

If I may add, an empty capacity_provider_strategy list could also be useful. It seems support for this was added to the AWS CLI and API (https://github.com/aws/containers-roadmap/issues/838#issuecomment-1159092125), so that

$ aws ecs update-service --cluster cluster-name --service service-name --capacity-provider-strategy '[]' --force-new-deployment

removes the strategy from an ECS service (when inherited from the default defined at the ECS cluster level), which is useful if you're planning to remove the default capacity provider strategy from the ECS cluster.

It seems that currently, if no capacity_provider_strategy is defined in the aws_ecs_service resource, the AWS API call will not have any value set and the default strategy will be used.

vishwa-trulioo commented 1 year ago

It's sad to see that it's been over a year and this is still not fixed. :-( AWS has to do a better job than this if they want people to keep using ECS and keep it alive.

bbratchiv commented 1 year ago

any updates on this? I see the PR is pending

rmccarthy-ellevation commented 1 year ago

Any update on this?

1oglop1 commented 1 year ago

@breathingdust Hi, is this something you can look into? The AWS side has been fixed, and now Terraform incorrectly causes replacement.

claudiosf commented 11 months ago

Issue still exists.

Luis-3M commented 11 months ago

Issue still exists.

Yep we're facing the same problem too

harbinder-kleene commented 10 months ago

When will the fix be released? It is affecting my team too.

ZilvinasKucinskas commented 8 months ago

+1

This is a major issue. We are running many FARGATE instances and would like to increase the capacity further by adding FARGATE SPOT instances. However, it is not possible to do without downtime (it destroys the whole ECS service and recreates it).
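For context, the kind of change being described might look like the sketch below: an existing Fargate service gains a mixed FARGATE / FARGATE_SPOT strategy. All names, counts, and weights are illustrative assumptions, and per the reports in this thread, adding these blocks to a service that previously had none is what currently triggers the replacement.

```hcl
# Hypothetical inputs; replace with references to your own resources.
variable "cluster_arn" {
  type = string
}

variable "task_definition_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_ecs_service" "web" {
  name            = "web"
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = 4

  # launch_type is left unset because it conflicts with
  # capacity_provider_strategy.

  # Keep at least one task on regular Fargate...
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  # ...and place most of the remaining tasks on Fargate Spot.
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }

  # Fargate tasks use awsvpc networking, so subnets are required.
  network_configuration {
    subnets = var.subnet_ids
  }
}
```

This assumes both FARGATE and FARGATE_SPOT are already associated with the cluster as capacity providers.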

dejanzele commented 3 weeks ago

Hi all,

I am interested in submitting a fix for this issue, as it is also impacting our internal usage.

Is the community in agreement on the latest requirements for how the update should work? A couple of different ideas are mentioned in the comments.
