hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

aws_ecs_service InvalidParameterException: Creation of service was not idempotent #2283

Open hashibot opened 6 years ago

hashibot commented 6 years ago

This issue was originally opened by @simoneroselli as hashicorp/terraform#16635. It was migrated here as a result of the provider split. The original body of the issue is below.


Terraform Version

v0.10.8

Hi,

Terraform fails when modifying the "placement strategy" of an ECS service resource. Since that value can only be set at service creation time, the expected behaviour would be "destroy 1, add 1", as terraform plan correctly reports. However, terraform apply fails.

Fail Output

Error: Error applying plan:

1 error(s) occurred:

Expected Behavior

destroy service, add service.

Actual Behavior

terraform apply fails and no modification is made.

Steps to Reproduce

  1. Define an ECS service with a placement strategy and apply
  2. Change the placement strategy values to something else
  3. terraform plan Plan: 1 to add, 0 to change, 1 to destroy.
  4. terraform apply InvalidParameterException: Creation of service was not idempotent
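For reference, a minimal configuration that follows these steps might look like the sketch below; the resource names and the referenced cluster/task definition are hypothetical, not taken from this issue.

```hcl
# Hypothetical minimal reproduction: changing "type" or "field" below
# forces a destroy-and-create of the service, and the re-create then
# fails with "Creation of service was not idempotent".
resource "aws_ecs_service" "example" {
  name            = "example-service"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  # In provider versions of that era this block was called
  # placement_strategy; it was later renamed ordered_placement_strategy.
  ordered_placement_strategy {
    type  = "spread"
    field = "instanceId"
  }
}
```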
sarjuymd commented 6 years ago

Getting this on TF v0.9.11 without placement strategy configured.

sroze commented 6 years ago

I can confirm, I have the same issue. The workaround is to remove the service manually.

ethicalmohit commented 6 years ago

Same issue.

pb0101 commented 6 years ago

Same issue

davidminor commented 6 years ago

Another workaround I found is to rename the service at the same time that the placement strategy is modified.

peteroruba commented 6 years ago

Same issue here

ckaatz-here commented 6 years ago

Same again

amboowang commented 6 years ago

Same issue also, any conclusion?

mrf commented 6 years ago

Note that in the most recent provider versions, this has been changed to ordered_placement_strategy, it would be good to confirm if this bug still persists following that change.

oanasabau commented 6 years ago

Hello, I just changed a service to use ordered_placement_strategy instead of placement_strategy, and terraform apply fails.


-/+ module.xxxx.aws_ecs_service.api-gateway (new resource required)
      id:                                        "arn:aws:ecs:us-west-2:XXXX:service/api-gateway" => <computed> (forces new resource)
      cluster:                                   "cluster" => "cluster"
      deployment_maximum_percent:                "200" => "200"
      deployment_minimum_healthy_percent:        "100" => "100"
      desired_count:                             "2" => "2"
      health_check_grace_period_seconds:         "180" => "180"
      iam_role:                                  "arn:aws:iam::xxxx:role/ecs_service_role" => "arn:aws:iam::xxxx:role/ecs_service_role"
      launch_type:                               "EC2" => "EC2"
      load_balancer.#:                           "1" => "1"
      load_balancer.3428707558.container_name:   "api-gateway" => "api-gateway"
      load_balancer.3428707558.container_port:   "8080" => "8080"
      load_balancer.3428707558.elb_name:         "" => ""
      load_balancer.3428707558.target_group_arn: "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/API/4440036037fbdee4" => "arn:aws:elasticloadbalancing:us-west-2:xxxx:targetgroup/API/4440036037fbdee4"
      name:                                      "api-gateway" => "api-gateway"
      ordered_placement_strategy.#:              "" => "1" (forces new resource)
      ordered_placement_strategy.0.field:        "" => "instanceId" (forces new resource)
      ordered_placement_strategy.0.type:         "" => "spread" (forces new resource)
      placement_strategy.#:                      "1" => "0" (forces new resource)
      placement_strategy.2750134989.field:       "instanceId" => "" (forces new resource)
      placement_strategy.2750134989.type:        "spread" => "" (forces new resource)
      task_definition:                           "arn:aws:ecs:us-west-2:xxxx:task-definition/api-gateway:58" => "${aws_ecs_task_definition.api-gateway_definition.arn}"

Result for terraform apply:

Error: Error applying plan:

1 error(s) occurred:

* module.xxxx.aws_ecs_service.api-gateway: 1 error(s) occurred:

* aws_ecs_service.api-gateway: InvalidParameterException: Creation of service was not idempotent.
    status code: 400, request id: 3524862f-6e38-11e8-87e1-b1ef6c6b4c93 "api-gateway"

Provider version: Downloading plugin for provider "aws" (1.22.0)

mipasco commented 6 years ago

same.

mr-bo-jangles commented 6 years ago

Getting this on the docker image hashicorp/terraform:light as well. @mrf It started happening for me when I switched from placement_strategy to ordered_placement_strategy.

mr-bo-jangles commented 6 years ago
module.website.aws_ecs_service.webserver: Creating...
  cluster:                                   "" => "arn:aws:ecs:eu-west-2::cluster/ecs-cluster-prod"
  deployment_maximum_percent:                "" => "200"
  deployment_minimum_healthy_percent:        "" => "34"
  desired_count:                             "" => "2"
  iam_role:                                  "" => "arn:aws:iam:::role/ecs_iam_role_prod"
  launch_type:                               "" => "EC2"
  load_balancer.#:                           "" => "1"
  load_balancer.4258226585.container_name:   "" => "app-server"
  load_balancer.4258226585.container_port:   "" => "80"
  load_balancer.4258226585.elb_name:         "" => ""
  load_balancer.4258226585.target_group_arn: "" => "arn:aws:elasticloadbalancing:eu-west-2::targetgroup/prod-ecs-cluster-prod-web/3cb792881eee3a61"
  name:                                      "" => "webserver"
  ordered_placement_strategy.#:              "" => "1"
  ordered_placement_strategy.0.field:        "" => "memory"
  ordered_placement_strategy.0.type:         "" => "binpack"
  task_definition:                           "" => "arn:aws:ecs:eu-west-2::task-definition/webserver:375"

From what I can see, it's deciding that the existing service doesn't exist and trying to create it, which Amazon isn't allowing. It should definitely be modifying the existing service.

oanasabau commented 6 years ago

As it turns out, in our case the problem was caused by the create_before_destroy flag in the resource's lifecycle block. After removing it, terraform apply succeeded.

vchan2002 commented 6 years ago

I think the same issue applies if you have a load_balancer block (and likely a target_group_arn) attached to the ECS service, as those settings can only be applied when creating the service.

In my use case (just the load_balancer block, no ordered_placement_strategy block), the service gets provisioned properly, but its state never gets recorded, not even partially. So, in subsequent TF runs, it said that it wants to add a brand new ECS service, but it would error out with the same "Creation of service was not idempotent" message.

bm1549 commented 6 years ago

same issue

bploetz commented 6 years ago

Ran into the same issue, removing

lifecycle {
  create_before_destroy = true
}

From the ecs resource as @oanasabau noted worked for us too.

zioalex commented 5 years ago

The best way I found is to change the name (+1 @davidminor) while keeping lifecycle create_before_destroy = true.

In this way a new service is created without interrupting the service, and the old one is destroyed only once the new one is active.
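A sketch of that rename pattern (all names here are illustrative, not from this thread): bumping the service name whenever a change forces replacement lets create_before_destroy stand up the new service under a fresh name before the old one is removed, so creation cannot collide with the still-existing old service.

```hcl
resource "aws_ecs_service" "example" {
  # Manually bump the suffix whenever a change forces replacement;
  # the replacement gets a fresh name, avoiding the "not idempotent"
  # collision with the old service that is still being drained.
  name            = "example-service-v2"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  lifecycle {
    create_before_destroy = true
  }
}
```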

bkate commented 5 years ago

I had the same issue, but it was solved by rerunning terraform apply on the same resource. I think Terraform cannot destroy and create the service at the same time, so it needs a two-step apply; alternatively, removing the lifecycle block would also solve the issue.

danielrive commented 4 years ago

@bkate I had the same issue: terraform deleted the service but then raised this error: "InvalidParameterException: Creation of service was not idempotent."

I reran the apply and that worked fine.

jperez3 commented 4 years ago

I'm running into the same problem, but I'm not getting a description of the error like others are, even after enabling debug via export TF_LOG=DEBUG. Here is what my error looks like:

Error: Error applying plan:
1 error occurred:
        * aws_ecs_service.app: 1 error occurred:
        * aws_ecs_service.app: InvalidParameterException:  "testecsservice"

Edit: found the solution to my problem, the name in my task definition did not match the container_name variable in the aws_ecs_service resource's load balancer definition

dekimsey commented 4 years ago

I ran into this today and I'm wondering if it could be side-stepped with support for name_prefix. I want the new service to be created (and running okay) before removing the old one as it's still attached to the old load-balancer target group. If I allow Terraform to destroy the service resource before creating the new one, I'll have an outage. At least, that's my understanding of the situation.

        load_balancer {
            container_name   = "haproxy"
            container_port   = 80
            target_group_arn = "arn:aws:elasticloadbalancing:us-east-2:560758033722:targetgroup/pafnt20200108220931040600000001/87cb1ba8f5d4f6bc"
        }
      + load_balancer { # forces replacement
          + container_name   = "haproxy"
          + container_port   = 81
          + target_group_arn = "arn:aws:elasticloadbalancing:us-east-2:560758033722:targetgroup/stg-ui-api/ff397f52073bb72c"
        }
      + load_balancer { # forces replacement
          + container_name   = "haproxy"
          + container_port   = 82
          + target_group_arn = "arn:aws:elasticloadbalancing:us-east-2:560758033722:targetgroup/stg-ui-partner/f0ef791d2187d3a9"
        }
terraform -v
Terraform v0.12.20
+ provider.akamai v0.1.4
+ provider.aws v2.44.0
+ provider.azuread v0.6.0
+ provider.null v2.1.2
+ provider.random v2.2.1
+ provider.template v2.1.2
+ provider.tls v2.1.1
relgames commented 4 years ago

Ran into it today. Indeed, if create_before_destroy is used, we need name_prefix instead of name.

Has anyone found a workaround for generating a new name? I tried to use random_id but am not sure what to use for the "keepers" section.
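One possible sketch of the random_id approach, assuming the replacement-forcing attributes themselves are used as keepers so the suffix only rotates when the service would be replaced anyway; the field values and resource names are illustrative.

```hcl
resource "random_id" "service_suffix" {
  byte_length = 4

  # Regenerate the id only when these values change, i.e. exactly
  # when the service would be replaced anyway.
  keepers = {
    placement_type  = "spread"
    placement_field = "instanceId"
  }
}

resource "aws_ecs_service" "example" {
  # New suffix on replacement avoids the name collision with the
  # old service during create_before_destroy.
  name            = "example-service-${random_id.service_suffix.hex}"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  ordered_placement_strategy {
    type  = random_id.service_suffix.keepers.placement_type
    field = random_id.service_suffix.keepers.placement_field
  }

  lifecycle {
    create_before_destroy = true
  }
}
```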

marcohutzsch1234 commented 4 years ago

Due to this issue and the missing name_prefix, I tried:

name = "myservice_${replace(timestamp(), ":", "-")}"

together with create_before_destroy in the service. This partially helps, but it does not achieve zero-downtime deployment: because timestamp() changes on every run, the service is now recreated on every terraform run.

melvinkcx commented 4 years ago

Having the same issue here. Any good solution from the crowd so far? I think support for zero-downtime deployment is essential here.

barttemmerman-tomtom commented 4 years ago

I had this issue when I renamed the directory containing a terraform module (of a Fargate service that was already deployed) and tried to redeploy with the new directory name. After renaming it back to the previous name, destroying the service, renaming the directory to the new name again, and deploying, I no longer had the issue.

tobypinder commented 4 years ago

Adding to the chorus suggesting that (with terraform 0.13.4, aws provider 3.11.0) this issue persists when replacing a service in place.

module.foo.aws_ecs_service.service[0]: Destruction complete after 6m18s

Error: InvalidParameterException: Creation of service was not idempotent. "foo"

As such, I believe there is some kind of race condition or internal state issue that prevents the subsequent creation of a service after the API returns from the deletion. Is there a means of injecting a manual "sleep" here? We would benefit from resolving this so our CI/CD pipeline can manage this transition rather than requiring "babysitting" through two deploys, with increased downtime.
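There is no built-in sleep on aws_ecs_service, but the hashicorp/time provider's time_sleep resource is the usual way to inject delays; a hedged sketch is below. Note this is untested for this particular race, and Terraform offers no clean hook between the destroy and re-create of the same resource address, so it may not help in every case.

```hcl
# Sketch (untested for this issue): hold dependents back for a while
# after the service is created, and add a delay during teardown.
# The 30s durations are guesses, not verified values.
resource "time_sleep" "after_service" {
  depends_on = [aws_ecs_service.example]

  create_duration  = "30s"
  destroy_duration = "30s"
}
```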

Zogoo commented 3 years ago

The same issue is also happening with Terraform 0.12.26 and AWS provider 3.18. Does anybody know which AWS provider version is stable with Terraform 0.12?

Update: when I check the plan output, it shows # forces replacement for capacity_provider_strategy; maybe this is causing the issue.

# module.ecs_cluster.aws_ecs_service.service must be replaced
+/- resource "aws_ecs_service" "service_name" {
      - health_check_grace_period_seconds  = 0 -> null
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = "****" -> (known after apply)
      + launch_type                        = (known after apply)
        name                               = "***"
      + platform_version                   = (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ task_definition                    = "***:6" -> (known after apply)
        wait_for_steady_state              = false
      - capacity_provider_strategy { # forces replacement
          - base              = 0 -> null
          - capacity_provider = "name_of_capacity_provider" -> null
          - weight            = 1 -> null
        }
      - deployment_controller {
          - type = "ECS" -> null
        }
     ....
}

Update 2:

Workaround: if someone has the same issue as in my case, you can fix it as follows:

  lifecycle {
    ignore_changes = [
      capacity_provider_strategy
    ]
  }

Like mentioned in here: https://github.com/hashicorp/terraform-provider-aws/issues/11351

I suspect that when default_capacity_provider is set in aws_ecs_cluster, it's automatically replicated into all aws_ecs_service resources by Terraform, but AWS validates this differently.

Update 3:

Workaround: if you want to use "create_before_destroy", you need to change the ECS service's name every time you need to replace it.

resource "aws_ecs_service" "service_name" {
  name = "service_deploy_num_1"
}

Still, this will cause about 20 seconds of downtime for your service if you are running ECS behind a load balancer.

seifolah-ghaderi commented 3 years ago

Ran into the same issue, removing

lifecycle {
  create_before_destroy = true
}

From the ecs resource as @oanasabau noted worked for us too.

worked for me

Nyamador commented 3 years ago

Deleting the service and running terraform apply fixed it for me.

nick-baggott commented 2 years ago

I just hit this issue. My problem was that the service already existed. I don't think anything could have created the service out-of-band. The service name includes the workspace name.

My best guess is that terraform somehow dropped the service from its state without deleting the underlying resource.

I was able to recover by importing the service and doing a fresh apply.

c0mput3rj0n3s commented 2 years ago

We ran into this today. We weren't setting any lifecycle rules on the aws_ecs_service resource, but repeated applies would fail due to what looks like a race condition between the destruction of the aws_ecs_service and its re-creation.

We attempted all of the solutions listed above.

I randomly got it working by just... trying again. What's interesting is that the same plan/apply ran without issue in our lower environments: identical configuration, identical changeset, but those worked flawlessly.

I suspect it was pure luck that it worked fine twice before, and again when it worked after just trying again.

Terraform version: 1.1.4 AWS Provider version: 4.8.0

JoseAlban commented 2 years ago

We ran into this issue today as well:

    * failed creating ECS service (x-svc): InvalidParameterException: Creation of service was not idempotent.

In our case, we have another layer (Pulumi), but we're quite sure this is due to the Terraform provider. No "create_before_destroy" was set explicitly by us, and we do have an ALB + target group pointing at the service:

        loadBalancers: [
          {
            targetGroupArn: targetGroup.arn,
            containerName: containerName,
            containerPort: containerPort,
          },
        ],

EDIT: the solution here was to remove the service manually on the AWS side as well. I suspect the root cause is that terraform was defensive and bailed when the service already existed. Maybe a better error message, e.g. "Service already exists - please remove or import it properly", would communicate the problem/solution straight away and could be the fix for this issue.