AWS ECS service is replaced when changing adding a capacity provider

josiegd-verrency commented 2 years ago

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v0.12.31

provider.aws v3.70.0
provider.template v2.2.0

Affected Resource(s)

aws_ecs_service

Expected Behavior

The ECS service should have been updated/modified in place.

Actual Behavior

The ECS service was scheduled for replacement, which would cause an outage.

Plan output:

  # module.service.aws_ecs_service.service must be replaced
-/+ resource "aws_ecs_service" "service" {
        cluster                            = <cluster_arn>
        deployment_maximum_percent         = 200
        deployment_minimum_healthy_percent = 100
      ~ desired_count                      = 1 -> 10
        enable_ecs_managed_tags            = false
        enable_execute_command             = false
        health_check_grace_period_seconds  = 67
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = <id> -> (known after apply)
      ~ launch_type                        = "FARGATE" -> (known after apply)
        name                               = "my-service"
      ~ platform_version                   = "LATEST" -> (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ tags_all                           = {} -> (known after apply)
        task_definition                    = <task_definition>
        wait_for_steady_state              = false

      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "FARGATE"
          + weight            = 1
        }
      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "FARGATE_SPOT"
          + weight            = 1
        }
...

Steps to Reproduce

Create a vanilla aws_ecs_service without a capacity provider specified, launch_type = "FARGATE"
terraform apply

Change aws_ecs_service to specify a capacity_provider_strategy as follows:

  capacity_provider_strategy {
      capacity_provider = "FARGATE"
      weight            = 1
    }
  capacity_provider_strategy {
      capacity_provider = "FARGATE_SPOT"
      weight            = 1
    }

terraform apply

Important Factoids

References

AWS docs: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_UpdateService.html

11351

mkielar commented 2 years ago

Observing the same thing. with:

Terraform v1.1.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.74.0

I can successfully switch from launchType to capacityProviders using AWS CLI:

aws ecs update-service --cluster dev --service hello-world \
    --capacity-provider-strategy=capacityProvider=FARGATE,base=0,weight=100 \
    --force-new-deployment

but if I try doing this with terraform, I get service recreation warning.

We were just about to introduce FARGATE_SPOT in our services, and now we need to push these plans back a bit, because we cannot possible downtime like this on our production environments.

mkielar commented 2 years ago

Looks like the bug is in this line: https://github.com/hashicorp/terraform-provider-aws/blob/41fe8a96949b8096e01ab6edbbb0026092e5bc6d/internal/service/ecs/service.go#L401

The (ol == 0 && nl > 0) part of the if statement condition is unnecessary - AWS is capable of applying change from launchType to capacityProviders. The only additional requirement is to stop passing explicit launchType = "FARGATE" from terraform.

Also, the whole thing requires force_new_deployment flag to be set to true, otherwise terraform will recreate the service on any change to capacity providers: https://github.com/hashicorp/terraform-provider-aws/blob/41fe8a96949b8096e01ab6edbbb0026092e5bc6d/internal/service/ecs/service.go#L389-L394 I'm not entirely sure what kind of "backward compatibility" this commend mentions, but if that's something that terraform offered, and now would not, I think that the previous behaviour was a bug, and persisting it like this is not the greatest idea. In general, you don't want your ECS Service to get recreated when changes are applied, because that causes downtime.

djakielski commented 1 year ago

I see this behavior also in one of the latest versions. Is it possible to fix it in the provider?

$: terraform version
Terraform v1.5.4
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.13.1
+ provider registry.terraform.io/hashicorp/http v3.4.0
+ provider registry.terraform.io/hashicorp/random v3.5.1
+ provider registry.terraform.io/hashicorp/tls v4.0.4

ash5gfrankie commented 10 months ago

I'm wondering if it's worth adding a new option, something like the below. Pretty much the same as the code block for force_new_deployment but limited to just allowing capacity provider updates without destroying the service if there are no other destructive changes

if v := d.Get("non_destructive_capacity_provider_update").(bool); !v { return capacityProviderStrategyForceNew(d) }

dejanzele commented 2 weeks ago

Hi all,

I am interested in submiting a fix for this issue as it is impacting our internal usage also.

Is the community in agreement what are the latest requirements on how the update should work, as in the comments a couple of ideas are mentioned?

hashicorp / terraform-provider-aws