hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Cycle error for replacement of aws_api_gateway_deployment with lifecycle create_before_destroy set to true and API Gateway resources in depends_on section #11344

Closed martyna-autumn closed 7 months ago

martyna-autumn commented 4 years ago

Community Note

Terraform Version

Terraform v0.12.18
+ provider.aws v2.42.0

Affected Resource(s)

Terraform Configuration Files

I'm not copying the configuration of all API Gateway resources as it's pretty standard, but I'm happy to share the configuration of the whole API Gateway if requested.

resource "aws_api_gateway_deployment" "deployment" {
  depends_on = [
    aws_api_gateway_rest_api.api,
    aws_api_gateway_resource.api_email_health,
    aws_api_gateway_method.api_email_health_get,
    aws_api_gateway_integration.api_email_health_get_integration,
    aws_api_gateway_method.api_email_health_options,
    aws_api_gateway_integration.api_email_health_options_integration,
    aws_api_gateway_integration_response.api_email_health_options_integration_response,
    aws_api_gateway_method_response.api_email_health_options_response,
    aws_api_gateway_resource.api_email_templates,
    aws_api_gateway_method.api_email_templates_get,
    aws_api_gateway_integration.api_email_templates_get_integration,
    aws_api_gateway_method.api_email_templates_options,
    aws_api_gateway_integration.api_email_templates_options_integration,
    aws_api_gateway_integration_response.api_email_templates_options_integration_response,
    aws_api_gateway_method_response.api_email_templates_options_response,
    aws_api_gateway_resource.api_email_emails,
    aws_api_gateway_method.api_email_emails_post,
    aws_api_gateway_integration.api_email_emails_post_integration,
    aws_api_gateway_method.api_email_emails_options,
    aws_api_gateway_integration.api_email_emails_options_integration,
    aws_api_gateway_integration_response.api_email_emails_options_integration_response,
    aws_api_gateway_method_response.api_email_emails_options_response,
    aws_api_gateway_resource.api_email
  ]

  rest_api_id = aws_api_gateway_rest_api.api.id

  stage_description = "Deployed at ${timestamp()}"

  stage_name = var.aws_spotlight_environment

  lifecycle {
    create_before_destroy = true
  }
}

Expected Behavior

Because aws_api_gateway_deployment lists all API Gateway resources, methods, integrations and responses in depends_on, it shouldn't be created before all of those resources are provisioned. The outcome should be (and was until recently): old API Gateway resources are destroyed, new ones are created, the new deployment is created, and the old deployment is destroyed. We force replacement of aws_api_gateway_deployment so that the current API Gateway state is always deployed to the main stage.

This was the behaviour in Terraform 0.11.x.

Actual Behavior

Cycle Error

Error: Cycle: aws_api_gateway_integration.api_email_health_get_integration (destroy), aws_api_gateway_integration.api_email_health_options_integration (destroy), aws_api_gateway_integration_response.api_email_health_options_integration_response (destroy),
aws_api_gateway_method_response.api_email_health_options_response (destroy), aws_api_gateway_method.api_email_health_options (destroy), aws_api_gateway_resource.api_email_health (destroy), aws_api_gateway_deployment.deployment, aws_api_gateway_deployment.deployment (destroy deposed 359e79c1),
aws_api_gateway_method.api_email_health_get (destroy)

Removing create_before_destroy = true from the lifecycle block of aws_api_gateway_deployment avoids the cycle, but the apply then fails with a different error:

Error: error deleting API Gateway Deployment (bdq86u): BadRequestException: Active stages pointing to this deployment must be moved or deleted

If I remove the depends_on section instead, the deployment sometimes happens before all API methods are properly configured. Example:

Error: Error creating API Gateway Deployment: BadRequestException: No integration defined for method

I tried adding a separate aws_api_gateway_stage resource for the stage, but the problem persists.

Steps to Reproduce

  1. Create API Gateway with aws_api_gateway_deployment which depends on API Gateway resources and is recreated with every terraform apply
  2. Run terraform apply
  3. Change one or more API Gateway resources in a way that forces them to be destroyed and recreated (e.g. change an API Gateway resource path)
  4. Run terraform apply

razinlightyear commented 1 year ago

I finally resolved this error (destroy deposed) for 2 aws_api_gateway_resource resources. I commented out resources and ran the plan locally (after downloading tfstate from TF cloud) until the error disappeared. Then I uncommented the remaining resources in the following commit/plan/apply. I did implement the suggestions by @bflad https://github.com/hashicorp/terraform-provider-aws/issues/11344#issuecomment-765138213 but I was still getting the same error. Hopefully, by implementing the suggestions, we will see fewer of these types of issues in the future.

hannes-ucsc commented 1 year ago

I believe the complete fix for this is to upgrade to Terraform 1.3 or later and

1) not have lifecycle.create_before_destroy on aws_api_gateway_deployment
2) not have stage_name or stage_description on aws_api_gateway_deployment (to disable the implicit stage creation)
3) specify a stage explicitly by adding an aws_api_gateway_stage resource
4) set lifecycle.replace_triggered_by to [aws_api_gateway_deployment.YOUR_DEPLOYMENT.id] in the aws_api_gateway_stage resource
5) set lifecycle.replace_triggered_by to [aws_api_gateway_stage.YOUR_STAGE.id] in the aws_api_gateway_base_path_mapping and aws_api_gateway_method_settings resources, and in any other resource that's "downstream" from the stage, such as aws_wafv2_web_acl_association if you have it
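The five steps above can be sketched roughly as follows. This is a fragment under stated assumptions, not a verified fix: it assumes aws_api_gateway_rest_api.api and its methods and integrations are defined elsewhere, and the resource names "main" and "all" and the variable stage_name are hypothetical. Note that replace_triggered_by takes resource references, not quoted strings.

```hcl
# Hypothetical variable holding the stage name.
variable "stage_name" {
  type = string
}

# Steps 1 and 2: no create_before_destroy, no stage_name or stage_description,
# so the deployment does not create an implicit stage.
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.api.id
}

# Steps 3 and 4: an explicit stage that is replaced whenever the deployment is.
resource "aws_api_gateway_stage" "main" {
  rest_api_id   = aws_api_gateway_rest_api.api.id
  deployment_id = aws_api_gateway_deployment.main.id
  stage_name    = var.stage_name

  lifecycle {
    replace_triggered_by = [aws_api_gateway_deployment.main.id]
  }
}

# Step 5: a resource downstream from the stage is replaced along with it.
resource "aws_api_gateway_method_settings" "all" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  stage_name  = aws_api_gateway_stage.main.stage_name
  method_path = "*/*"

  settings {
    metrics_enabled = true
  }

  lifecycle {
    replace_triggered_by = [aws_api_gateway_stage.main.id]
  }
}
```

The same lifecycle block would go on aws_api_gateway_base_path_mapping or aws_wafv2_web_acl_association if the configuration has them.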

Note for Chalice users: As a fix for https://github.com/aws/chalice/issues/1237, Chalice added lifecycle.create_before_destroy to aws_api_gateway_deployment in its generated TF config. I think that was a mistake. You either have to post-process the Chalice-generated TF config or modify Chalice to revert that fix. Also, the Chalice-generated TF config sets stage_description to a hash so as to trigger redeployment on source code changes. To satisfy step 2 above, you need to ensure that stage_description is absent and move its value to triggers.redeployment instead, which is the official mechanism for triggering redeployment in TF 1.3.
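As an illustration of moving the hash out of stage_description, a post-processed deployment resource might look something like the fragment below. The resource name "rest_api" and the variable source_hash are assumptions made for this sketch and are not taken from Chalice's actual output.

```hcl
# Hypothetical input: the hash that was previously written to stage_description.
variable "source_hash" {
  type = string
}

resource "aws_api_gateway_deployment" "rest_api" {
  rest_api_id = aws_api_gateway_rest_api.rest_api.id

  # stage_name and stage_description are intentionally absent (step 2 above).
  # A change in the hash changes the triggers map, which forces a new deployment.
  triggers = {
    redeployment = var.source_hash
  }
}
```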

ricoli commented 1 year ago

If like me you've come to this issue because you got a cycle error while having implemented the recommended way of doing things in the docs (summarised here), then here follows how I solved things.

I was getting this cycle error when running a terraform plan to remove a resource from the body of the API gateway REST API resource (OpenAPI definition). I spotted the cause of the cycle fairly easily - a lambda behind a separate API Gateway (let's call it B) referenced the invoke_url of the current stage of the main API Gateway (let's call it A) in its environment variables. The deployment resource of both API Gateways had a lifecycle policy of create_before_destroy, which is a must have to ensure uptime. This caused the cycle, and as such I broke apart the cycle by manually assembling the invoke_url based on the ID of the REST API resource of API Gateway A and the variable that was used as the stage name in API Gateway A.

Great stuff, but unfortunately I simply had a new cycle to contend with, though a shorter one and one where only resources for the API Gateway A were present, mentioning some deposed resources. Basically what I had here is that the remote state still had this coupling between the two API Gateway deployments (because of the stage invoke_url reference), whereas locally I didn't have it. To solve this, what I did was to change the lifecycle policy of the aws_api_gateway_deployment resource of API Gateway A in the same PR as the change to remove the resource from it:

  lifecycle {
    create_before_destroy = true
    ignore_changes = [
      triggers
    ]
  }

What the above does is to simply not trigger a deployment while still removing the resource from the API Gateway in remote state. PR merged, terraform apply executed and in the next PR I simply removed the ignore_changes block to go back to normal :tada:
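For the first half of the fix described above (assembling the invoke_url by hand so that API Gateway B no longer references the stage of API Gateway A), a sketch could look like this. The resource name aws_api_gateway_rest_api.a and the variables aws_region and stage_name are hypothetical; the URL layout is the standard execute-api endpoint format for a regional REST API.

```hcl
# Hypothetical inputs.
variable "aws_region" {
  type = string
}

variable "stage_name" {
  type = string
}

locals {
  # Built from the REST API ID and the stage name variable, so nothing here
  # depends on aws_api_gateway_stage or aws_api_gateway_deployment of gateway A.
  api_a_invoke_url = "https://${aws_api_gateway_rest_api.a.id}.execute-api.${var.aws_region}.amazonaws.com/${var.stage_name}"
}
```

The lambda behind API Gateway B would then read local.api_a_invoke_url in its environment variables instead of the stage's invoke_url attribute.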

jcano-chwy commented 1 year ago

Somehow ended up with this cycle error through modification of the IAM policy document attached to our REST API policy resource.

00:02:04.010 │ Error: error waiting for API Gateway Stage (*******) to be updated: unexpected state 'NOT_AVAILABLE', wanted target 'AVAILABLE, DELETE_IN_PROGRESS'. last error: %!s(<nil>)
00:02:04.010 │
00:02:04.010 │   with aws_api_gateway_stage.rest_api,
00:02:04.010 │   on api_main.tf line 352, in resource "aws_api_gateway_stage" "rest_api":
00:02:04.010 │  352: resource "aws_api_gateway_stage" "rest_api" {

Tried destroying, but ran into deposed API Gateway stages error. Was able to successfully destroy it by commenting out this block from my api_gateway_deployment resource:

  lifecycle {
    create_before_destroy = true
  }

And then running a full build from scratch again.

hjfitz commented 1 year ago

Still dealing with this years later. I feel like my code is pretty bog standard, too.

terraform code

```hcl
resource "aws_api_gateway_rest_api" "api" {
  name = "rtb"
}

# ─────────────────────────────────────────────────
# Secret reader
# ─────────────────────────────────────────────────

resource "aws_api_gateway_resource" "resource" {
  path_part   = ""
  parent_id   = aws_api_gateway_rest_api.api.root_resource_id
  rest_api_id = aws_api_gateway_rest_api.api.id
  depends_on  = [aws_api_gateway_rest_api.api]
}

resource "aws_api_gateway_method" "read_method" {
  rest_api_id   = aws_api_gateway_rest_api.api.id
  resource_id   = aws_api_gateway_resource.resource.id
  http_method   = "GET"
  authorization = "NONE"

  request_parameters = {
    "method.request.querystring.id" = true
  }
}

resource "aws_api_gateway_integration" "read_integration" {
  rest_api_id             = aws_api_gateway_rest_api.api.id
  resource_id             = aws_api_gateway_method.read_method.resource_id
  http_method             = aws_api_gateway_method.read_method.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.read_lambda.invoke_arn

  request_parameters = {
    "integration.request.querystring.id" = "method.request.querystring.id"
  }
}

resource "aws_api_gateway_deployment" "read" {
  rest_api_id = aws_api_gateway_rest_api.api.id

  triggers = {
    # NOTE: The configuration below will satisfy ordering considerations,
    #       but not pick up all future REST API changes. More advanced patterns
    #       are possible, such as using the filesha1() function against the
    #       Terraform configuration file(s) or removing the .id references to
    #       calculate a hash against whole resources. Be aware that using whole
    #       resources will show a difference after the initial implementation.
    #       It will stabilize to only change when resources change afterwards.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.resource.id,
      aws_api_gateway_method.read_method.id,
      aws_api_gateway_integration.read_integration.id,
    ]))
  }

  lifecycle {
    create_before_destroy = false
  }
}

resource "aws_api_gateway_stage" "secret_reader" {
  stage_name    = "secret_reader"
  rest_api_id   = aws_api_gateway_rest_api.api.id
  deployment_id = aws_api_gateway_deployment.read.id
}

# Lambda
resource "aws_lambda_permission" "apigw_read_lambda" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.read_lambda.function_name
  principal     = "apigateway.amazonaws.com"

  # More: http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-control-access-using-iam-policies-to-invoke-api.html
  source_arn = aws_lambda_function.read_lambda.arn
}
```

Where the only difference is, I'm using Localstack. I don't think that's hugely important in this though.

code/rtb/iac                                                                                                ⍉
▶ tf --version
Terraform v1.5.6
on darwin_arm64
+ provider registry.terraform.io/hashicorp/archive v2.4.0
+ provider registry.terraform.io/hashicorp/aws v4.67.0
+ provider registry.terraform.io/hashicorp/null v3.2.1

Your version of Terraform is out of date! The latest version
is 1.5.7. You can update by downloading from https://www.terraform.io/downloads.html

Edit: I've got a hacky workaround for my issue -

resource "aws_api_gateway_resource" "resource" {
  path_part   = ""
  parent_id   = aws_api_gateway_rest_api.api.root_resource_id
  rest_api_id = aws_api_gateway_rest_api.api.id
  depends_on  = [aws_api_gateway_rest_api.api]
  lifecycle {
    ignore_changes = [ parent_id ]
  }
}

YakDriver commented 7 months ago

As maintainers of the Terraform AWS Provider, we’ve reached a decision to close this longstanding issue. We want to assure you that this decision was made after careful consideration, and we’re committed to transparency in our actions.

Over time, this issue has seen numerous attempts at resolution (#17230, #17099, #17209) and workarounds (such as this, this, this, and, of course, this), but its complexity and longevity present significant challenges. We lack clarity on how many users are still affected and the precise nature of the remaining issues. Given these uncertainties and our limited resources, it’s difficult for us to effectively address the problem in its current state.

However, we value community feedback immensely. If you’re still encountering issues, we encourage you to open a new, focused issue outlining the specific problems you’re facing. We understand the frustration of having to restart the discussion, but the convoluted history of this particular issue necessitates a fresh approach.

While we’ve received reports from community members in the past year, it’s unclear how these relate to the broader context of this issue’s history. Moving forward, a new, well-defined problem statement will greatly increase the likelihood of prompt attention from maintainers or fellow community members.

Ultimately, our goal is to ensure that the Terraform AWS Provider remains a dependable tool for realizing your infrastructure goals. Regrettably, this prolonged issue no longer contributes to that objective. By closing it, we aim to clear the path for more effective problem-solving and a smoother experience for all users. We appreciate your understanding and continued support as we work towards a better future for our provider.

github-actions[bot] commented 6 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.