Open spatel96 opened 2 years ago
FYI we are still seeing this bug in the provider version 4.9.
Possibly related to existing issue: https://github.com/hashicorp/terraform-provider-aws/issues/2283 (destroy/create behavior)
*Correction -- as the update was not expected behavior, I'm guessing the capacity_provider_strategy is inherited from the aws_ecs_cluster where it is defined. Do you mind confirming, @spatel96?
This issue is very destructive.
When an ECS cluster has a default_capacity_provider_strategy setting defined, Terraform will mark all services that don't have

```hcl
lifecycle {
  ignore_changes = [
    capacity_provider_strategy
  ]
}
```

to be recreated.
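For reference, here is that workaround in context. This is just a minimal sketch: the resource names, the task definition reference, and the desired count are placeholders, and other required settings are omitted.

```hcl
# Hypothetical example: suppress the inherited strategy diff so the service is not recreated.
resource "aws_ecs_service" "example" {
  name            = "example"                        # placeholder name
  cluster         = aws_ecs_cluster.example.id       # cluster assumed to define a default_capacity_provider_strategy
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  # network_configuration and other required settings omitted for brevity.

  lifecycle {
    # Ignore the capacity_provider_strategy that Terraform sees as a diff
    # because it was inherited from the cluster's default strategy.
    ignore_changes = [
      capacity_provider_strategy
    ]
  }
}
```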
The only differences I can see when comparing capacity_provider_strategy and deployment_controller are MaxItems and DiffSuppressFunc. I wonder if that is what's causing this recreation... I would have thought that removing the ForceNew would have also stopped capacity_provider_strategy from forcing recreation...
Hi @nitrocode, thanks for looking through the code! My initial thinking was that @spatel96 is using both the aws_ecs_capacity_provider and aws_ecs_service resources, so while capacity_provider_strategy is not explicitly configured in the aws_ecs_service Terraform configuration, the value is inherited from the separate aws_ecs_capacity_provider resource after an initial terraform apply, and the next plan or apply will then show that diff (though this is still just my conjecture, as the original configuration is not yet known). That diff is then handled by this portion of the code: https://github.com/hashicorp/terraform-provider-aws/blob/a2843eb5d274b2fe3598cf863d228e715dacc343/internal/service/ecs/service.go#L354-L372, which is what forces the new resource. The logic needs to account for cases where the provider strategy is inherited from an outside configuration, or simply mark capacity_provider_strategy as Computed so that the diff is ignored.
I was seeing this same issue and can confirm that adding a capacity_provider_strategy block in my aws_ecs_service, duplicating my default_capacity_provider_strategy, resolved it.
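For anyone else hitting this, a minimal sketch of what that looks like. The resource names, weights, and the FARGATE provider are placeholders (not my actual configuration), and other required service settings are omitted.

```hcl
resource "aws_ecs_cluster_capacity_providers" "example" {
  cluster_name       = aws_ecs_cluster.example.name
  capacity_providers = ["FARGATE"]

  # Cluster-level default strategy.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 100
  }
}

resource "aws_ecs_service" "example" {
  name            = "example"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  # network_configuration and other required settings omitted for brevity.

  # Duplicate the cluster's default strategy so the provider no longer
  # treats the inherited value as a diff that forces recreation.
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 100
  }
}
```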
This has been a big annoyance for us. We have many production ECS Services that are using LaunchType: EC2, and we'd like to convert them to using a newly defined default Capacity Provider strategy on the cluster.
If we simply set the capacity provider, it will force re-creation of the ECS Service, leading to temporary disruption/downtime. This isn't necessary, as AWS supports the graceful transition of LaunchType: EC2 to Capacity Provider (but not the other way around). It does a "force new deployment" of the ECS Tasks, but it uses the standard ECS rollout mechanism (e.g., minHealthy) so there's no disruption.
Our current workaround is to use ignore_changes as above, plus converting ECS Services to Capacity Providers via separate CLI-type automation.
(Also, tangentially related is #26533, for transitioning existing ECS Services to use the Cluster's default capacity provider strategy.)
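For illustration, the CLI step in that automation is roughly the following; the cluster, service, and capacity provider names are placeholders. AWS then performs the graceful force-new-deployment rollout rather than destroying the service:
$ aws ecs update-service --cluster cluster-name --service service-name --capacity-provider-strategy capacityProvider=my-capacity-provider,weight=1 --force-new-deployment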
If I may add, an empty capacity_provider_strategy list could also be useful.
It seems this support was added to the AWS CLI and API (https://github.com/aws/containers-roadmap/issues/838#issuecomment-1159092125), so that
$ aws ecs update-service --cluster cluster-name --service service-name --capacity-provider-strategy '[]' --force-new-deployment
removes the strategy from an ECS service (when inherited from the default defined at the ECS cluster level), which is useful if you're planning to remove the default capacity provider strategy from the ECS cluster.
It seems that currently, if no capacity_provider_strategy is defined in the aws_ecs_service resource, the AWS API call will not have any value set and the default strategy will be used.
It's sad to see that it's been over a year and this still isn't fixed. :-( AWS has to do a better job than this if they want people to keep using ECS and keep it alive.
Any updates on this? I see the PR is pending.
Any update on this?
@breathingdust Hi, is this something you can look into? The AWS side has been fixed, and now Terraform incorrectly causes replacement.
Issue still exists.
Issue still exists.
Yep we're facing the same problem too
When will the fix be released? It is affecting my team too.
+1
This is a major issue. We are running many FARGATE instances and would like to increase the capacity further by adding FARGATE SPOT instances. However, it is not possible to do this without downtime (it destroys the whole ECS service and recreates it).
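To illustrate, the kind of strategy we'd like to move to is roughly the following (names, counts, and weights are placeholders, and other required settings are omitted); with the current provider behavior, adding or changing this block on an existing service forces it to be destroyed and recreated.

```hcl
# Illustrative target: keep a FARGATE baseline and send remaining capacity to FARGATE_SPOT.
resource "aws_ecs_service" "example" {
  name            = "example"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 10

  # network_configuration and other required settings omitted for brevity.

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 2
    weight            = 1
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }
}
```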
Hi all,
I am interested in submitting a fix for this issue, as it is impacting our internal usage as well.
Is the community in agreement on the requirements for how the update should work, since a couple of different ideas are mentioned in the comments?
Expected Behavior
No infrastructure changes should be made
Actual Behavior
The ECS Service resource will be recreated, but the apply will fail with the error logs specified above.
Steps to Reproduce
terraform apply