hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Cycle error for replacement of aws_api_gateway_deployment with lifecycle create_before_destroy set to true and API Gateway resources in depends_on section #11344

Closed martyna-autumn closed 4 months ago

martyna-autumn commented 4 years ago

Terraform Version

Terraform v0.12.18
+ provider.aws v2.42.0

Affected Resource(s)

Terraform Configuration Files

I'm not copying all API Gateway resources' configuration as it's pretty standard but happy to share configuration of whole API Gateway if requested

resource "aws_api_gateway_deployment" "deployment" {
  depends_on = [
    aws_api_gateway_rest_api.api,
    aws_api_gateway_resource.api_email_health,
    aws_api_gateway_method.api_email_health_get,
    aws_api_gateway_integration.api_email_health_get_integration,
    aws_api_gateway_method.api_email_health_options,
    aws_api_gateway_integration.api_email_health_options_integration,
    aws_api_gateway_integration_response.api_email_health_options_integration_response,
    aws_api_gateway_method_response.api_email_health_options_response,
    aws_api_gateway_resource.api_email_templates,
    aws_api_gateway_method.api_email_templates_get,
    aws_api_gateway_integration.api_email_templates_get_integration,
    aws_api_gateway_method.api_email_templates_options,
    aws_api_gateway_integration.api_email_templates_options_integration,
    aws_api_gateway_integration_response.api_email_templates_options_integration_response,
    aws_api_gateway_method_response.api_email_templates_options_response,
    aws_api_gateway_resource.api_email_emails,
    aws_api_gateway_method.api_email_emails_post,
    aws_api_gateway_integration.api_email_emails_post_integration,
    aws_api_gateway_method.api_email_emails_options,
    aws_api_gateway_integration.api_email_emails_options_integration,
    aws_api_gateway_integration_response.api_email_emails_options_integration_response,
    aws_api_gateway_method_response.api_email_emails_options_response,
    aws_api_gateway_resource.api_email
  ]

  rest_api_id = aws_api_gateway_rest_api.api.id

  stage_description = "Deployed at ${timestamp()}"

  stage_name = var.aws_spotlight_environment

  lifecycle {
    create_before_destroy = true
  }
}

Expected Behavior

Because aws_api_gateway_deployment lists all API Gateway resources, methods, integrations and responses in depends_on, it shouldn't be created before all of them are provisioned. The outcome should therefore be (and was, until recently): old API Gateway resources are destroyed, new ones are created, the new deployment is created, and the old deployment is destroyed. We force replacement of aws_api_gateway_deployment so that the current API Gateway state is always deployed to the main stage.

This was the behaviour in Terraform 0.11.x.

Actual Behavior

Cycle Error

Error: Cycle: aws_api_gateway_integration.api_email_health_get_integration (destroy), aws_api_gateway_integration.api_email_health_options_integration (destroy), aws_api_gateway_integration_response.api_email_health_options_integration_response (destroy),
aws_api_gateway_method_response.api_email_health_options_response (destroy), aws_api_gateway_method.api_email_health_options (destroy), aws_api_gateway_resource.api_email_health (destroy), aws_api_gateway_deployment.deployment, aws_api_gateway_deployment.deployment (destroy deposed 359e79c1),
aws_api_gateway_method.api_email_health_get (destroy)

Removing create_before_destroy = true from the lifecycle block of aws_api_gateway_deployment avoids the cycle, but the apply then fails with a different error:

Error: error deleting API Gateway Deployment (bdq86u): BadRequestException: Active stages pointing to this deployment must be moved or deleted

If I remove the depends_on section instead, the deployment sometimes happens before all API methods are properly configured. Example:

Error: Error creating API Gateway Deployment: BadRequestException: No integration defined for method

I tried adding a separate aws_api_gateway_stage resource for the stage, but the problem persists.

Steps to Reproduce

  1. Create API Gateway with aws_api_gateway_deployment which depends on API Gateway resources and is recreated with every terraform apply
  2. Run terraform apply
  3. Change one or more API Gateway resources in a way that forces them to be destroyed and recreated (e.g. change an API Gateway resource path)
  4. Run terraform apply

sparvia commented 4 years ago

We also ran into this problem, and solved it by removing create_before_destroy from the deployment, and manually running terraform taint on the stage resource to force it to be recreated, which got rid of the other error you mention.

Glen-Moonpig commented 4 years ago

We also ran into this problem, and solved it by removing create_before_destroy from the deployment, and manually running terraform taint on the stage resource to force it to be recreated, which got rid of the other error you mention.

If you taint the resource, does that mean that the deployment will be destroyed before a new one created and so the API be unavailable for the period of time in between destroy and create?

martyna-autumn commented 4 years ago

We also ran into this problem, and solved it by removing create_before_destroy from the deployment, and manually running terraform taint on the stage resource to force it to be recreated, which got rid of the other error you mention.

Isn't that manual wrangling to work around the problem? We use CD software to deploy our TF code, so we would prefer to avoid such workarounds. Also, our stage is active because it's attached to a Custom Domain Name, so we can't have it destroyed or left pointing at a non-existent deployment.

Currently we use a null resource with a sleep command, and the deployment resource explicitly depends on that null resource as a workaround. The deployment resource itself doesn't depend on any API Gateway resources, but the delay gives all required resources (methods, integrations and so on) time to be provisioned before the deployment is created. The example below uses PowerShell for the command because that's what we mostly use in our company.

resource "null_resource" "wait_for_all_resources" {
  triggers = {
    timestamp = timestamp()
  }
  provisioner "local-exec" {
    command     = "Start-Sleep -Seconds 60"
    interpreter = ["PowerShell", "-Command"]
  }
}
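
For context, a minimal sketch of how the deployment might be wired to that null resource. The resource and variable names are the ones used earlier in this issue, but the exact arrangement is an assumption for illustration, not the full configuration in use.

```hcl
resource "aws_api_gateway_deployment" "deployment" {
  # Depends only on the sleep resource, not on the individual
  # API Gateway resources/methods/integrations (assumed arrangement).
  depends_on = [null_resource.wait_for_all_resources]

  rest_api_id = aws_api_gateway_rest_api.api.id
  stage_name  = var.aws_spotlight_environment

  # timestamp() changes on every run, which forces a new deployment.
  stage_description = "Deployed at ${timestamp()}"

  lifecycle {
    create_before_destroy = true
  }
}
```

The trade-off is that ordering relies on a fixed delay rather than on the dependency graph, so the sleep has to be long enough for the slowest resource.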

edbighead commented 4 years ago

Having same issue with

Terraform v0.12.19
+ provider.aws v2.45.0

lordz-md commented 4 years ago

Having same issue with

Terraform v0.12.19
+ provider.aws v2.45.0

Same with Terraform v0.12.19

lordz-md commented 4 years ago

Does anyone know if there is any work on this issue?

katherinel commented 4 years ago

Same issue here, with terraform 0.11

kromol commented 4 years ago

I am experiencing the same issue with Terraform 0.12 and the new triggers argument. Removing create_before_destroy solves the problem, even though it's not an ideal solution.
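
For readers unfamiliar with the triggers argument, here is a minimal sketch of the kind of configuration meant here. The resource names (example) are hypothetical, and the choice of which resources to hash is an assumption.

```hcl
resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # Replace the deployment only when the serialized method or
    # integration changes, instead of on every apply.
    redeployment = sha1(jsonencode([
      aws_api_gateway_method.example,
      aws_api_gateway_integration.example,
    ]))
  }

  # No lifecycle block: create_before_destroy is left at its default
  # (false), which is what avoids the cycle error in this case.
}
```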

nakamasato commented 4 years ago

I also encountered the same issue. I tried two possible compromise solutions.

  1. Wait for a while until all the dependent resources are created

    I tried the following solution and I could change method and resource at least. The drawback is that this will trigger deployment every time you apply even if you don't have any change in the dependent resources.

    resource "aws_api_gateway_deployment" "deployment" {
    - depends_on = [
    -   module.method.lambda-integration
    - ]
    
      rest_api_id = aws_api_gateway_rest_api.api.id
    
      triggers = {
    -   redeployment = sha1(join(",", list(
    -     jsonencode(module.method.lambda-integration), # I was using lambda integration as a trigger of deployment.
    -   )))
    +   redeployment = timestamp()
      }
    
      provisioner "local-exec" {
        command = "sleep 30"
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }

  2. Pass variable for trigger

    In this way, we can control when to recreate the deployment, but you need to separate the resource update from the deployment trigger. If you put them in one apply, creating and destroying the deployment will start before the dependent resources have finished updating.

    resource "aws_api_gateway_deployment" "deployment" {
      rest_api_id = aws_api_gateway_rest_api.api.id
    
      triggers = {
        redeployment = var.release-date
      }
    
      lifecycle {
        create_before_destroy = true
      }
    }

hdryx commented 4 years ago

So the bug is still there? There is no fix? We have to use workarounds?

I'm using 0.12.26 and having the same issue.

vladcar commented 4 years ago

I also had this issue; the following solution worked well for me. I'm using a random_uuid resource to produce a value that is passed to the triggers block in the aws_api_gateway_deployment resource. The random_uuid is regenerated when its keepers values change, and those can be set to anything, e.g. jsonencode(aws_api_gateway_method.method) and jsonencode(aws_api_gateway_integration.integration). It is important to make sure that aws_api_gateway_deployment is created after everything else; I achieved this by extracting it into a module and using a mandatory variable.

variable "required_resources" {
  type        = list(string)
  description = "Change in these values trigger redeployment"
}

resource "aws_api_gateway_deployment" "deployment" {
  rest_api_id = var.rest_api_id
  stage_name  = var.stage

  # hack to force redeployment every time this hash changes
  triggers = {
    redeployment = sha1(join(",", var.required_resources))
  }

  # false by default, just for clarity
  lifecycle {
    create_before_destroy = false
  }
}

The above resource is placed in its own module.

resource "random_uuid" "deployment_trigger" {
  depends_on = [aws_api_gateway_integration.integration, aws_api_gateway_method.method]
  keepers = {
    # Generate a new id every time something happens to these resources
    method      = jsonencode(aws_api_gateway_method.method)
    integration = jsonencode(aws_api_gateway_integration.integration)
    path        = var.resource_path
  }
}

# some other gateway stuff...

module "deployment" {
  source      = "../modules/api-gateway-deployment"
  rest_api_id = aws_api_gateway_rest_api.api.id
  stage       = var.stage

  required_resources = [
    random_uuid.deployment_trigger.id,
    random_uuid.deployment_trigger_for_another_method.id,
    # add random uuid for each method/integration
  ]
}

I placed stuff required for adding new method into its own module as well so I don't have to write "random_uuid" "deployment_trigger" multiple times. This seems to be working fine for consecutive deployments and changes to api gateway integration/method.

I published modules I use, they are very basic and might not work for all projects but code can be adapted for your needs. https://github.com/vladcar/terraform-aws-serverless-common-api-gateway-method https://github.com/vladcar/terraform-aws-serverless-common-api-gateway-deployment

walidmansia commented 4 years ago

Hello everybody, I did find a solution. Terraform handles resources in singleton mode, meaning a resource with a specific name can exist only once in a tfstate. In the case of an API Gateway deployment, a deployment can't be modified; that's a particularity of AWS, and it is quite normal, since it is like a tag. My solution is to remove the resource from the tfstate after each apply with terraform state rm aws_api_gateway_deployment.gw_deploy_dev, and now I can see the history of Terraform deployments on my API.
I hope it helps you. Coronavirus is a mess, but thanks to the time I had I could reverse engineer the API Gateway. In the end, though, I think Terraform should add a new type of resource based on the Prototype design pattern.

andrewp-sf commented 3 years ago

This is not a valid solution. One - you're doing manual work around configuration. Two - when you remove this from state, it means deployment will be created on next apply automatically (even if you don't need/want to). What I see here is a way of tainting/abandoning deployment on destroy. Can't we have some parameter that simply removes deployment from state instead of running API call to delete deployment?

shederman commented 3 years ago

Guys, this bug means that Terraform CANNOT work with API Gateway in Production. Is there ANY view to when this CRITICAL defect in the AWS Provider will be fixed? Alternatively we will have to move away from Terraform.

walidmansia commented 3 years ago

I don't agree. You can split your infra into two modules, one dedicated to the deployment. Each time you ask for a deployment it's a new one, like the Prototype design pattern.

shederman commented 3 years ago

Completely wiping out the value of declarative Infrastructure as Code. If we should manually do a whole bunch of extra work to the top level every single time some minor change in a bottom level happens, what's the point of Terraform?

shederman commented 3 years ago

Since it seems this code has zero value, I will post the code that is not working. We have made numerous changes to try and get this working, and not one has worked. This particular variation builds the API Gateway just fine, but any slight change (e.g. to what parameters we validate) results in "Error: error deleting API Gateway Deployment (ufn1gl): BadRequestException: Active stages pointing to this deployment must be moved or deleted"

The only way to make this work in tooling (fully automated) is to entirely destroy the entire API gateway and recreate it, resulting in a completely new URL. I would not be happy with that solution in a Development environment; in a Production one it's a joke.

The defects related to our issue are:

locals {
  private_config_map = { type = "PRIVATE", vpc_endpoint_ids = var.vpc_endpoint_ids }
  regional_config_map = { type = "REGIONAL", vpc_endpoint_ids = null }
}

/* ---------------------------
 * API GATEWAY
 * --------------------------- */
resource "aws_api_gateway_rest_api" "main" {
  name            = var.name

  dynamic "endpoint_configuration" {
    for_each = var.private == true ? list(local.private_config_map) : list(local.regional_config_map)

    content {
      types             = [endpoint_configuration.value["type"]]
      vpc_endpoint_ids  = endpoint_configuration.value["vpc_endpoint_ids"]
    }
  }

  api_key_source  = "HEADER"
  body            = var.body
  tags            = var.tags

  lifecycle {
    ignore_changes = [
      policy
    ]
  }
}

/* ---------------------------
 * SETTINGS
 * --------------------------- */
resource "aws_api_gateway_method_settings" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_deployment.deploy.stage_name
  method_path = "*/*"

  settings {
    metrics_enabled    = true
    logging_level      = "INFO"
    data_trace_enabled = true
  }
}

/* ---------------------------
 * MAIN STAGE
 * --------------------------- */
resource "aws_api_gateway_stage" "main" {
  stage_name            = "main"
  description           = "Main Stage for deploying functionality"
  rest_api_id           = aws_api_gateway_rest_api.main.id
  deployment_id         = aws_api_gateway_deployment.deploy.id
  xray_tracing_enabled  = var.xray_tracing_enabled

  variables             = var.variables

  access_log_settings {
    destination_arn = var.cloudwatch_log_arn
    format          = "\"{\"requestId\":\"$context.requestId\",\"ip\":\"$context.identity.sourceIp\",\"caller\":\"$context.identity.caller\",\"user\":\"$context.identity.user\",\"requestTime\":$context.requestTimeEpoch,\"httpMethod\":\"$context.httpMethod\",\"resourcePath\":\"$context.resourcePath\",\"status\":$context.status,\"protocol\":\"$context.protocol\",\"path\":\"$context.path\",\"stage\":\"$context.stage\",\"xrayTraceId\":\"$context.xrayTraceId\",\"userAgent\":\"$context.identity.userAgent\",\"responseLength\":$context.responseLength}\""
  }

  lifecycle {
    ignore_changes = [
      deployment_id
    ]
  }

  tags = var.tags

  depends_on = [aws_api_gateway_deployment.deploy]
}

/* ---------------------------
 * DEPLOYMENT
 * --------------------------- */
resource "aws_api_gateway_deployment" "deploy" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = "deploy"
  stage_description = "Deployed at ${timestamp()}"

  triggers = {
    redeployment = sha1(join(",", list(
      jsonencode(var.body)
    )))
  }

  lifecycle {
    create_before_destroy = true
  }
}

shederman commented 3 years ago

@mitchellh Are you aware of this issue? Essentially, Terraform does not support AWS API Gateway. How exactly do we get developer focus on this, seems like an issue like this just gets closed and reopened and closed and reopened in cycle forever.

Glen-Moonpig commented 3 years ago

@mitchellh Are you aware of this issue? Essentially, Terraform does not support AWS API Gateway. How exactly do we get developer focus on this, seems like an issue like this just gets closed and reopened and closed and reopened in cycle forever.

Terraform does support API Gateway. I think the solution is not to be found in Terraform, but in the AWS Terraform Provider. Rather than having all of the elements of the API Gateway as separate Terraform resources, they should be blocks on the aws_api_gateway_rest_api resource. Then, when any part of the gateway resource changes, the provider would know to create a new deployment. I would do the work, but I do not code in Go; I could design the solution, though.

resource "aws_api_gateway_rest_api" "my_gateway" {
  name        = "my-gateway"

  endpoint_configuration {
    types = ["REGIONAL"]
  }

  resource {
    path_part = "products"
    method {
        http_method = "GET"
        authorization = "NONE"
        integration = {
            integration_http_method = "POST"
            type                  = "AWS_PROXY"
            uri                     = aws_lambda_alias.example.invoke_arn
        }
    }
    resource {
        path_part = "toys"
        method {
            http_method = "GET"
            authorization = "NONE"
            integration = {
                integration_http_method = "POST"
                type                  = "AWS_PROXY"
                uri                     = aws_lambda_alias.example.invoke_arn
            }
        }
    }
  }
}

shederman commented 3 years ago

@Glen-Moonpig your solution sounds interesting. The one piece I would dispute is that Terraform supports API Gateway. Terraform is supposed to be a tool to manage infrastructure as code - this is a production focused tool. If Terraform cannot create and manage components like API Gateway without causing production outages not required in normal operation of the component, then I would argue quite vehemently that it is not in fact supported.

Especially since this has been unresolved in one shape or form for over a year.

Glen-Moonpig commented 3 years ago

@Glen-Moonpig your solution sounds interesting. The one piece I would dispute is that Terraform supports API Gateway. Terraform is supposed to be a tool to manage infrastructure as code - this is a production focused tool. If Terraform cannot create and manage components like API Gateway without causing production outages not required in normal operation of the component, then I would argue quite vehemently that it is not in fact supported.

Especially since this has been unresolved in one shape or form for over a year.

I am using Terraform to deploy and maintain API Gateways in numerous projects. I have not had any production outages. There are very simple ways to handle this particular scenario. You can just break your changes down into multiple applies and they will go through fine. Terraform 0.13.3/0.14 might resolve the cycle issue, as there are various changes around cycles and plans.

shederman commented 3 years ago

We're using Terraform Cloud for this, which does not appear to support multiple applies as you describe; and even if I try this manually, the multiple applies always result in the same underlying error. I believe it may be because we are using OpenAPI Import instead of manually specifying each resource independently; but that's a key feature of API Gateway.

riley-clarkson commented 3 years ago

@shederman My team is having the same issue (posting from my personal github however); most of these workarounds are not ideal and some won't work if you have both a deployment and a stage resource.

We currently workaround by

This should not be necessary though. I hope that this is indeed resolved in .13

shederman commented 3 years ago

@riley-clarkson Do you get any service interruptions like that? We have mission-critical services running on API Gateway and the idea of destroying stages on every deploy is not a popular one I can tell you!

riley-clarkson commented 3 years ago

@shederman We do have service interruptions, which is okay for us, but still not ideal (and will not be possible for some projects/teams). We tried most of the workarounds in this thread before resorting to tainting the stage every deployment. Would like to see this fixed

shederman commented 3 years ago

Yeah, that clearly shows that this is not Production ready for mission critical systems

shederman commented 3 years ago

Does anyone know how long it will be until the Hashicorp bot autocloses the issue (as happened to the previous few)?

breathingdust commented 3 years ago

Hi all! 👋 Just a quick note to let you know this is on our radar and we will be taking a look in the near future to arrive at a resolution.

jufemaiz commented 3 years ago

QQ: did this make any progress with the release of v0.14?

teemal commented 3 years ago

Any traction on this?

shederman commented 3 years ago

@bflad Is there any progress on this issue? Given it is a major issue blocking all usage of AWS API Gateway via Open API in real world Production environments?

shederman commented 3 years ago

@breathingdust you assigned @bflad to this over 2 months ago. Since then there has been zero visible activity, no updates, no documentation updates warning people away from using Terraform for API Gateway in a Production environment, nothing.

As I indicated in a private message to Hashicorp directly, I am happy to do a Medium article on how Terraform AWS Support for API Gateway is not ready for Production and should be avoided if possible. I think that is now the only responsible course of action since from reading documentation nothing would indicate to the casual reader that the only way to update an OpenAPI AWS API Gateway is to completely destroy and recreate your entire infrastructure on each minor change.

Hashicorp/Terraform AWS Team should do the responsible thing and update the public documentation to indicate this SEVERE fault and warn people away from using their solution in real world environments. The fact that you have STILL not done this is a huge stain.

Clearly this issue is not important to you, but it is VERY VERY important to the teams (like ours) stupid enough to get suckered into using this broken implementation. I think you need to be proactive to immediately ensure that more teams don't get harmed by this lack of support.

grimm26 commented 3 years ago

@shederman are you seriously threatening an open source project? Get a hold of yourself. Yes, the deployment resource is problematic. My company uses api gateway with terraform in production very successfully. If I need to remove an integration, I do a manual step of removing the deployment from the state file first. That's it. Yes, I'd like that to not be the case, but I'm not threatening the developers that are the ones most likely to fix this. If you hate terraforming api gateways, stop doing it.

shederman commented 3 years ago

Nobody is threatening anybody. My concern is that people (like us) are using this assuming it will work in Production and it just won't. The team do not seem to be telling anybody about this.

I think it's pretty bad form to know about such a serious issue and not indicate it in their documentation. I think it SHOULD be indicated in their documentation, and I asked them to do that MONTHS ago, and they still haven't.

So what is the responsible thing to do? Ignore this and wait for however long it takes while more and more users get sucked into the same hole? Ask them to update their documentation? Tell people to avoid it because it's broken? And I don't hate terraforming API Gateway, I want to be able to but am blocked by this critical bug.

Glen-Moonpig commented 3 years ago

Nobody is threatening anybody.

I definitely detected a Medium blog post threat :joy:

I'm sure a pull request would be appreciated if you fancy mucking in @shederman ...

I've been using Terraform for API Gateway in production for a couple of years with daily deployments and it's working very well for me. I appreciate all the efforts people contribute to this project also. Thanks everyone ❤️ Happy Christmas 🎄 🎅

shederman commented 3 years ago

I definitely detected a Medium blog post threat 😂

Fair, and apologies for that. This is just hugely frustrating. We have lots of Terraform code for our infra, and (based on public documents and samples) decided to move to using Open API in our Terraform infrastructure. This is a major move for our architecture, moving away from using less secure {proxy} links and towards direct integration. We went off and designed our updated architecture, and then began implementing and found ourselves stuck. In order to update a single method, we need to destroy the entire infrastructure each time, OR retool our tf pipelines to essentially bypass TF for deploying API Gateways. Neither option is great.

I've always been a huge booster for Terraform, but this particular bug has really caused significant chaos. I was hoping that since we were distracted by a few big projects, when we came back there would have been some movement, but there has been none in months, not even a comment. We do pay for Terraform Cloud, so from our perspective this isn't entirely free, although I do get that the provider is, and of course I do absolutely appreciate the awesome effort put in by the various contributors to TF.

All I'd like is some kind of idea as to what to do (wait, give up, try something different), and I would also like the team to TELL people about the fact that TF does not work with API Gateway delivered via OpenAPI.

grimm26 commented 3 years ago

@shederman You could use an HTTP API Gateway instead and then you don't have to worry about a deployment resource. That is assuming you don't need a feature that API Gateway v2 doesn't have...
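
As a rough sketch of what that could look like, assuming hypothetical resource names and an API that needs no v1-only features: an HTTP API with an auto-deploying stage, so no separate deployment resource is declared.

```hcl
resource "aws_apigatewayv2_api" "example" {
  name          = "example-http-api" # hypothetical name
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_stage" "example" {
  api_id = aws_apigatewayv2_api.example.id
  name   = "$default"

  # Changes to routes and integrations are deployed automatically,
  # so there is no deployment resource to replace or order.
  auto_deploy = true
}
```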

A good POC will also help prevent production chaos. You found the corner case to shoot yourself in the foot with and went ahead into production anyway. Honestly, the AWS API design for API Gateway is a nightmare, too.

And if you pay for terraform cloud, you should be pursuing this issue through Hashicorp support or your account rep.

shederman commented 3 years ago

We use features not available in HTTP API Gateway. And to be clear, not stupid enough to actually go into Production with this. We were in POC, but we had done the Open API work first, assuming (yeah yeah) that the Terraform API Gateway integration would work. So a lot of work. Our choices now are do API Gateway using something other than TF (not fun in a 100% TF controlled infrastructure) or create a hybrid model that uses TF for some of the API Gateway, but scripts for other bits (not 100% sure can do this okay) or give up on Open API and stick with our old TF API Gateway that didn't use it.

Hashicorp just say "hey it's an OSS project" and shrug. When they want to charge money, there's nary a mention of no support for OSS, but have an issue, and it's all they can talk about.

And again, want to be absolutely clear - we used the tool the way it's documented. Nowhere was there any indication that there would be issues. I don't see why we're the idiots for believing the docs. And all I am asking for here is that we get told what's going on and the documentation gets updated so that others don't get suckered into the same mistake.

PS. If the solution for using AWS REST API Gateway with Open API is "use something else" simply because of Terraform, surely that should be in the Terraform docs saying "don't use this for that".

bflad commented 3 years ago

Hi folks 👋 You may have noticed me poking around a few other API Gateway v1 issues and pull requests earlier today to warm up for this one. I wanted to fully context switch into this service and ensure we had a clear runway for any code changes that need to get in so we didn't break other existing contributions.

Apologies for the long delay here and the very frustrating behavior with the API Gateway v1 functionality with regards to deployment. Those aspects of this AWS service, which is unique compared to others, have consistently challenged Terraform's ability to model it successfully and our ability to document recommended configuration patterns in a discoverable manner. At the end of this, beyond just fixing the reported issue(s) here, it seems necessary that the maintainers take some extra steps to add more robust service-level and use-case examples to the examples directory of the repository (with links from the resource-level reference pages) and/or expand the Learn platform content (e.g. Serverless Applications with AWS Lambda and API Gateway). If you all have other ideas in this manner, it would be great to discuss them. That aside, let's dive into this.

First and foremost, I would like to ensure that I'm understanding and covering expectations for the followers here. At a high level, the problem statement seems to be:

And what is expected out of this effort, which will be a focus of mine until it's complete:

If I'm missing anything up until this point, please let me know.

To begin these efforts, I will need to reproduce the issues by having self-contained API Gateway configurations ready that match the problem statement along with reproduction steps. The initial report has some good details and I should be able to assemble an all Terraform resource configuration with some minor effort on my part tomorrow morning. https://github.com/hashicorp/terraform-provider-aws/issues/11344#issuecomment-699612070 has a starting configuration for the OpenAPI case. I will reach out if I am having trouble in this regard. In the meantime, if you also have a self-contained configuration handy that displays these issues and would like investigated, please feel free to reach out or post a link to a Gist/repository. I cannot promise I'll be able to look at or solve every configuration scenario, but the extra context could be valuable.

It is very late for me now (almost 3am) so I'll pick this up again first thing in the morning. Before I go though, for those attempting to use the resource lifecycle create_before_destroy behavior please note that in the more recent versions of Terraform CLI it seems more sensitive to needing that configuration being applied to every resource in that portion of the dependency graph to have the ordering successfully applied. This means not just the aws_api_gateway_deployment or aws_api_gateway_stage resources where it seems intuitive, but also the upstream aws_api_gateway_* resources that are being updated. I only mention this because as an older practitioner of Terraform, it has tripped me up as seeming different than before. I will try to write up more how to debug issues like that tomorrow.
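
To illustrate where the lifecycle blocks would go, here is a minimal sketch assuming a single hypothetical resource, method, and mock integration, all named example. It only shows placement of create_before_destroy across the dependency chain; it is not a verified fix for the configurations reported above.

```hcl
resource "aws_api_gateway_rest_api" "example" {
  name = "example"
}

resource "aws_api_gateway_resource" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  parent_id   = aws_api_gateway_rest_api.example.root_resource_id
  path_part   = "example"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_method" "example" {
  rest_api_id   = aws_api_gateway_rest_api.example.id
  resource_id   = aws_api_gateway_resource.example.id
  http_method   = "GET"
  authorization = "NONE"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_integration" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  resource_id = aws_api_gateway_resource.example.id
  http_method = aws_api_gateway_method.example.http_method
  type        = "MOCK"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  # Referencing the upstream resources here also orders the
  # deployment after them without an explicit depends_on.
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example,
      aws_api_gateway_method.example,
      aws_api_gateway_integration.example,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```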

jufemaiz commented 3 years ago

@bflad thanks for the comprehensive update there! I'll be following along with keen interest, having now deployed a series of service APIs using individual resources and (due to API hits) considering shifting this to make use of the OpenAPI approach (hoping to cut down the number of API calls, and hence the time needed to validate state, but unsure if this is accurate).

shederman commented 3 years ago

@bflad Thanks for the above, and great to hear someone allocated to this. I think some major things to look out for:

I also still think that, in the meantime, your documentation should be updated to warn consumers of these resources about the broken behaviour.

bflad commented 3 years ago

TL;DR


Hi again, folks 👋 Here are some updates.

Terraform AWS Provider version 3.25.0, released today, includes some fixes (https://github.com/hashicorp/terraform-provider-aws/pull/17099 / https://github.com/hashicorp/terraform-provider-aws/pull/17209) for the aws_api_gateway_rest_api resource to better respect configuration via OpenAPI if you are working in that model. The resource should no longer show plan differences for "missing" Terraform configuration that was sourced from the OpenAPI specification. It should also now handle any Terraform configuration beyond the body and name arguments as overrides to any OpenAPI specification. Hopefully this should help remove some previously frustrating behavior in that resource.

Now let's turn the focus towards API Gateway REST API Deployments. After some extensive testing, it seems like most issues captured here and in other similar issues revolve around the aws_api_gateway_deployment resource also attempting to manage a stage. Terraform and resources are typically designed with a 1:1 mapping and this type of "shadow" resource management has historically been the source of confusion and headaches. The maintainers are now very cognizant not to introduce more of these types of resources, but of course we are stuck with any existing ones until they can be fixed or removed. In the future we may deprecate the problematic behavior.
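
To make the distinction concrete, here is a minimal sketch contrasting the two patterns. It is not taken from the linked pull requests; the resource names and the OpenAPI body are illustrative assumptions. In the provider versions discussed here, setting stage_name on aws_api_gateway_deployment is what makes the deployment manage a stage implicitly. The redeployment trigger is left out to keep the focus on stage ownership.

```hcl
resource "aws_api_gateway_rest_api" "example" {
  name = "example"

  body = jsonencode({
    openapi = "3.0.1"
    info = {
      title   = "example"
      version = "1.0"
    }
    paths = {
      "/path1" = {
        get = {
          x-amazon-apigateway-integration = {
            httpMethod           = "GET"
            payloadFormatVersion = "1.0"
            type                 = "HTTP_PROXY"
            uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
          }
        }
      }
    }
  })
}

# Pattern to avoid (shown commented out): the deployment also manages a
# stage, so one Terraform resource maps to two API Gateway objects.
# resource "aws_api_gateway_deployment" "example" {
#   rest_api_id = aws_api_gateway_rest_api.example.id
#   stage_name  = "example"
# }

# Sketch of the 1:1 pattern: the deployment only snapshots the REST API...
resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  lifecycle {
    create_before_destroy = true
  }
}

# ...and a separate resource owns the stage that points at the deployment.
resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}
```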

The good news is that these deployment problems lean towards being fixable via configuration and documentation updates. I'll provide an outline of these below, which should hopefully guide you towards less problematic Terraform environments. You can find proposed API Gateway documentation changes and a new end-to-end example configuration (which I was using to verify my recommendations) here: https://github.com/hashicorp/terraform-provider-aws/pull/17230

I'll also briefly touch on timestamp() function usage, since that is not a recommended pattern and can make Terraform edge cases even sharper.


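As a quick overview of API Gateway's lifecycle expectations and how they map to the various Terraform resources, REST APIs can be configured via two methods: an OpenAPI specification (the body argument of the aws_api_gateway_rest_api resource) or individual Terraform resources (such as aws_api_gateway_resource, aws_api_gateway_method, and aws_api_gateway_integration).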

Once the REST API is configured, the aws_api_gateway_deployment resource can be used along with the aws_api_gateway_stage resource to snapshot and publish the REST API. Stages can be optionally managed further with the aws_api_gateway_base_path_mapping, aws_api_gateway_domain_name, and aws_api_gateway_method_settings resources.

Both configuration methods achieve the same end goal and operators can choose which style is preferable for their environment or use cases. However, from a deployment standpoint, it is worth noting up front that it is much simpler in Terraform to set up the OpenAPI deployment properly. This is because a direct 1:1 configuration dependency can be set up. The Terraform resource method for configuring REST APIs is not going anywhere and is no less supported; it just needs additional care to be set up properly for deployments.

_The deeper explanation here is that Terraform currently only knows about differences when a state value has changed and only performs a node operation when there is a local state value change. There are configuration methods for creating edges on the graph (e.g. attribute references and depends_on), but there is not a method (configuration, internally, or protocol-wise) to remotely trigger another node to do something. In practice, this means the local node (aws_api_gateway_deployment resource) can only do something when it has local changing state values. Our workaround for this in Terraform Providers is adding a conventional triggers map argument that accepts arbitrary keys and values that can implement local value changes. Collecting and acting on node changes from other nodes has not been a design focus in Terraform before as far as I know, but maybe this can be investigated in the future to improve the user experience in this area._

REST API Deployment with OpenAPI

Here is a recommended starter configuration with this method:

resource "aws_api_gateway_rest_api" "example" {
  body = jsonencode({
    openapi = "3.0.1"
    info = {
      title   = "example"
      version = "1.0"
    }
    paths = {
      "/path1" = {
        get = {
          x-amazon-apigateway-integration = {
            httpMethod           = "GET"
            payloadFormatVersion = "1.0"
            type                 = "HTTP_PROXY"
            uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
          }
        }
      }
    }
  })

  name = "example"
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.example.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}

There will soon be an end-to-end example available in the repository, which is based off this snippet and expands to include other downstream API Gateway resources to ensure they work as expected. Below you can see this in action, successfully deploying REST API updates without error:

$ terraform apply

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_acm_certificate.example will be created
  + resource "aws_acm_certificate" "example" {
      + arn                       = (known after apply)
      + certificate_body          = (known after apply)
      + domain_name               = (known after apply)
      + domain_validation_options = (known after apply)
      + id                        = (known after apply)
      + private_key               = (sensitive value)
      + status                    = (known after apply)
      + subject_alternative_names = (known after apply)
      + validation_emails         = (known after apply)
      + validation_method         = (known after apply)
    }

  # aws_api_gateway_base_path_mapping.example will be created
  + resource "aws_api_gateway_base_path_mapping" "example" {
      + api_id      = (known after apply)
      + domain_name = (known after apply)
      + id          = (known after apply)
      + stage_name  = "example"
    }

  # aws_api_gateway_deployment.example will be created
  + resource "aws_api_gateway_deployment" "example" {
      + created_date  = (known after apply)
      + execution_arn = (known after apply)
      + id            = (known after apply)
      + invoke_url    = (known after apply)
      + rest_api_id   = (known after apply)
      + triggers      = {
          + "redeployment" = "e042aae1faf8de8d7c7c98c063a986025f058c69"
        }
    }

  # aws_api_gateway_domain_name.example will be created
  + resource "aws_api_gateway_domain_name" "example" {
      + arn                      = (known after apply)
      + certificate_upload_date  = (known after apply)
      + cloudfront_domain_name   = (known after apply)
      + cloudfront_zone_id       = (known after apply)
      + domain_name              = (known after apply)
      + id                       = (known after apply)
      + regional_certificate_arn = (known after apply)
      + regional_domain_name     = (known after apply)
      + regional_zone_id         = (known after apply)
      + security_policy          = (known after apply)

      + endpoint_configuration {
          + types = [
              + "REGIONAL",
            ]
        }
    }

  # aws_api_gateway_method_settings.example will be created
  + resource "aws_api_gateway_method_settings" "example" {
      + id          = (known after apply)
      + method_path = "*/*"
      + rest_api_id = (known after apply)
      + stage_name  = "example"

      + settings {
          + cache_data_encrypted                       = (known after apply)
          + cache_ttl_in_seconds                       = (known after apply)
          + caching_enabled                            = (known after apply)
          + data_trace_enabled                         = (known after apply)
          + logging_level                              = (known after apply)
          + metrics_enabled                            = true
          + require_authorization_for_cache_control    = (known after apply)
          + throttling_burst_limit                     = -1
          + throttling_rate_limit                      = -1
          + unauthorized_cache_control_header_strategy = (known after apply)
        }
    }

  # aws_api_gateway_rest_api.example will be created
  + resource "aws_api_gateway_rest_api" "example" {
      + api_key_source               = (known after apply)
      + arn                          = (known after apply)
      + binary_media_types           = (known after apply)
      + body                         = jsonencode(
            {
              + info    = {
                  + title   = "api-gateway-rest-api-openapi-example"
                  + version = "1.0"
                }
              + openapi = "3.0.1"
              + paths   = {
                  + /path1 = {
                      + get = {
                          + x-amazon-apigateway-integration = {
                              + httpMethod           = "GET"
                              + payloadFormatVersion = "1.0"
                              + type                 = "HTTP_PROXY"
                              + uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
                            }
                        }
                    }
                }
            }
        )
      + created_date                 = (known after apply)
      + description                  = (known after apply)
      + disable_execute_api_endpoint = (known after apply)
      + execution_arn                = (known after apply)
      + id                           = (known after apply)
      + minimum_compression_size     = -1
      + name                         = "api-gateway-rest-api-openapi-example"
      + policy                       = (known after apply)
      + root_resource_id             = (known after apply)

      + endpoint_configuration {
          + types            = [
              + "REGIONAL",
            ]
          + vpc_endpoint_ids = (known after apply)
        }
    }

  # aws_api_gateway_stage.example will be created
  + resource "aws_api_gateway_stage" "example" {
      + arn           = (known after apply)
      + deployment_id = (known after apply)
      + execution_arn = (known after apply)
      + id            = (known after apply)
      + invoke_url    = (known after apply)
      + rest_api_id   = (known after apply)
      + stage_name    = "example"
    }

  # tls_private_key.example will be created
  + resource "tls_private_key" "example" {
      + algorithm                  = "RSA"
      + ecdsa_curve                = "P224"
      + id                         = (known after apply)
      + private_key_pem            = (sensitive value)
      + public_key_fingerprint_md5 = (known after apply)
      + public_key_openssh         = (known after apply)
      + public_key_pem             = (known after apply)
      + rsa_bits                   = 2048
    }

  # tls_self_signed_cert.example will be created
  + resource "tls_self_signed_cert" "example" {
      + allowed_uses          = [
          + "key_encipherment",
          + "digital_signature",
          + "server_auth",
        ]
      + cert_pem              = (known after apply)
      + dns_names             = [
          + "example.com",
        ]
      + early_renewal_hours   = 0
      + id                    = (known after apply)
      + key_algorithm         = "RSA"
      + private_key_pem       = (sensitive value)
      + ready_for_renewal     = true
      + validity_end_time     = (known after apply)
      + validity_period_hours = 12
      + validity_start_time   = (known after apply)

      + subject {
          + common_name  = "example.com"
          + organization = "ACME Examples, Inc"
        }
    }

Plan: 9 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + domain_url       = (known after apply)
  + stage_invoke_url = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

tls_private_key.example: Creating...
tls_private_key.example: Creation complete after 0s [id=c1129fc488709c4293493669e43d40b60144999d]
tls_self_signed_cert.example: Creating...
tls_self_signed_cert.example: Creation complete after 0s [id=199729227385231255426302845367097804347]
aws_api_gateway_rest_api.example: Creating...
aws_acm_certificate.example: Creating...
aws_api_gateway_rest_api.example: Creation complete after 2s [id=halquax36h]
aws_api_gateway_deployment.example: Creating...
aws_acm_certificate.example: Creation complete after 3s [id=arn:aws:acm:us-west-2:123456789012:certificate/35cc4fc5-072f-4543-99d1-a1336ac05a41]
aws_api_gateway_domain_name.example: Creating...
aws_api_gateway_deployment.example: Creation complete after 1s [id=tj62g3]
aws_api_gateway_stage.example: Creating...
aws_api_gateway_stage.example: Creation complete after 1s [id=ags-halquax36h-example]
aws_api_gateway_method_settings.example: Creating...
aws_api_gateway_method_settings.example: Creation complete after 1s [id=halquax36h-example-*/*]
aws_api_gateway_domain_name.example: Creation complete after 3s [id=example.com]
aws_api_gateway_base_path_mapping.example: Creating...
aws_api_gateway_base_path_mapping.example: Creation complete after 1s [id=example.com/]

Apply complete! Resources: 9 added, 0 changed, 0 destroyed.

Outputs:

domain_url = "curl -H 'Host: example.com' https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path1 # may take a minute to become available on initial deploy"
stage_invoke_url = "curl https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path1"

$ curl -s https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path1 | jq '.createDate'
"2021-01-21-00-44-18"

$ curl -H 'Host: example.com' -s https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path1 | jq '.createDate'
"2021-01-21-00-44-18"

$ terraform apply -var 'rest_api_path=/path2'
tls_private_key.example: Refreshing state... [id=c1129fc488709c4293493669e43d40b60144999d]
tls_self_signed_cert.example: Refreshing state... [id=199729227385231255426302845367097804347]
aws_api_gateway_rest_api.example: Refreshing state... [id=halquax36h]
aws_acm_certificate.example: Refreshing state... [id=arn:aws:acm:us-west-2:123456789012:certificate/35cc4fc5-072f-4543-99d1-a1336ac05a41]
aws_api_gateway_deployment.example: Refreshing state... [id=tj62g3]
aws_api_gateway_domain_name.example: Refreshing state... [id=example.com]
aws_api_gateway_stage.example: Refreshing state... [id=ags-halquax36h-example]
aws_api_gateway_base_path_mapping.example: Refreshing state... [id=example.com/]
aws_api_gateway_method_settings.example: Refreshing state... [id=halquax36h-example-*/*]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
+/- create replacement and then destroy

Terraform will perform the following actions:

  # aws_api_gateway_deployment.example must be replaced
+/- resource "aws_api_gateway_deployment" "example" {
      ~ created_date  = "2021-01-22T02:59:46Z" -> (known after apply)
      ~ execution_arn = "arn:aws:execute-api:us-west-2:123456789012:halquax36h/" -> (known after apply)
      ~ id            = "tj62g3" -> (known after apply)
      ~ invoke_url    = "https://halquax36h.execute-api.us-west-2.amazonaws.com/" -> (known after apply)
      ~ triggers      = { # forces replacement
          ~ "redeployment" = "e042aae1faf8de8d7c7c98c063a986025f058c69" -> "e6742b53b5eed7039e6fec056113bb049954d64b"
        }
        # (1 unchanged attribute hidden)
    }

  # aws_api_gateway_rest_api.example will be updated in-place
  ~ resource "aws_api_gateway_rest_api" "example" {
      ~ body                         = jsonencode(
          ~ {
              ~ paths   = {
                  - /path1 = {
                      - get = {
                          - x-amazon-apigateway-integration = {
                              - httpMethod           = "GET"
                              - payloadFormatVersion = "1.0"
                              - type                 = "HTTP_PROXY"
                              - uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
                            }
                        }
                    } -> null
                  + /path2 = {
                      + get = {
                          + x-amazon-apigateway-integration = {
                              + httpMethod           = "GET"
                              + payloadFormatVersion = "1.0"
                              + type                 = "HTTP_PROXY"
                              + uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
                            }
                        }
                    }
                }
                # (2 unchanged elements hidden)
            }
        )
        id                           = "halquax36h"
        name                         = "api-gateway-rest-api-openapi-example"
        tags                         = {}
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # aws_api_gateway_stage.example will be updated in-place
  ~ resource "aws_api_gateway_stage" "example" {
      ~ deployment_id         = "tj62g3" -> (known after apply)
        id                    = "ags-halquax36h-example"
        tags                  = {}
        # (8 unchanged attributes hidden)
    }

Plan: 1 to add, 2 to change, 1 to destroy.

Changes to Outputs:
  ~ domain_url       = "curl -H 'Host: example.com' https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path1 # may take a minute to become available on initial deploy" -> "curl -H 'Host: example.com' https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path2 # may take a minute to become available on initial deploy"
  ~ stage_invoke_url = "curl https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path1" -> "curl https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path2"

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_api_gateway_rest_api.example: Modifying... [id=halquax36h]
aws_api_gateway_rest_api.example: Modifications complete after 1s [id=halquax36h]
aws_api_gateway_deployment.example: Creating...
aws_api_gateway_deployment.example: Creation complete after 1s [id=9vc6zm]
aws_api_gateway_stage.example: Modifying... [id=ags-halquax36h-example]
aws_api_gateway_stage.example: Modifications complete after 1s [id=ags-halquax36h-example]
aws_api_gateway_deployment.example: Destroying... [id=tj62g3]
aws_api_gateway_deployment.example: Destruction complete after 0s

Apply complete! Resources: 1 added, 2 changed, 1 destroyed.

Outputs:

domain_url = "curl -H 'Host: example.com' https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path2 # may take a minute to become available on initial deploy"
stage_invoke_url = "curl https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path2"

$ curl -s https://halquax36h.execute-api.us-west-2.amazonaws.com/example/path2 | jq '.createDate'
"2021-01-21-00-44-18"

$ curl -H 'Host: example.com' -s https://d-orixhuv0o9.execute-api.us-west-2.amazonaws.com/path2 | jq '.createDate'
"2021-01-21-00-44-18"

REST API Deployment with Terraform Resources

Here is a recommended starter configuration with this method:

resource "aws_api_gateway_rest_api" "example" {
  name = "example"
}

resource "aws_api_gateway_resource" "example" {
  parent_id   = aws_api_gateway_rest_api.example.root_resource_id
  path_part   = "example"
  rest_api_id = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_method" "example" {
  authorization = "NONE"
  http_method   = "GET"
  resource_id   = aws_api_gateway_resource.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_integration" "example" {
  http_method = aws_api_gateway_method.example.http_method
  resource_id = aws_api_gateway_resource.example.id
  rest_api_id = aws_api_gateway_rest_api.example.id
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # NOTE: The configuration below will satisfy ordering considerations,
    #       but not pick up all future REST API changes. More advanced patterns
    #       are possible, such as using the filesha1() function against the
    #       Terraform configuration file(s) or removing the .id references to
    #       calculate a hash against whole resources. Be aware that using whole
    #       resources will show a difference after the initial implementation.
    #       It will stabilize to only change when resources change afterwards.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.example.id,
      aws_api_gateway_integration.example.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}

As you can see, the triggers argument is much more complicated here, as we need to collect changes from many more sources of configuration to implement it properly. The two additional configuration options, using the filesha1() function against the configuration file itself or hashing whole resources, are both widely used in the broader ecosystem, but they add some additional complexity/caveats. The HashiCorp Community Forums is likely a better place to discuss those types of configuration choices, where there are far more people ready to help than those watching the issues in this code repository.
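
As a rough sketch of those two alternatives, the deployment resource from the starter configuration above could be adapted as follows. The file name api_gateway.tf and the trigger key names are hypothetical. The id-based key is kept alongside the file hash here because filesha1() on its own does not reference the upstream resources and so would not create the graph edges that order the deployment after them; that combination is an assumption of this sketch, not a documented recommendation.

```hcl
resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # Keeps the attribute references that order the deployment after the
    # REST API resources it snapshots.
    ordering = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.example.id,
      aws_api_gateway_integration.example.id,
    ]))

    # Option A (hypothetical file name): redeploy whenever the file that
    # declares the REST API resources changes in any way.
    config_hash = filesha1("${path.module}/api_gateway.tf")

    # Option B (alternative to Option A): hash whole resources instead of
    # only their ids. Expect a one-time difference after switching to it.
    # resources_hash = sha1(jsonencode([
    #   aws_api_gateway_resource.example,
    #   aws_api_gateway_method.example,
    #   aws_api_gateway_integration.example,
    # ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Option A will also redeploy on changes that do not affect the API, such as comment edits in that file, which is one of the caveats mentioned above.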


As an aside about the timestamp() function, please note that it uses a special implementation (overriding the Terraform expectation that plan and apply values must exactly match), which can sometimes introduce strange behavior into Terraform plan differences. If you need a static time value in Terraform configurations (e.g. when an API Gateway was deployed), a preferable solution is the time_static resource. Since it participates in the Terraform operation graph just like other resources and can store time with a stable value, it should be much more predictable.

Here is an illustrative example (aws_api_gateway_deployment resource already has a created_date attribute):

terraform {
  required_version = "0.14.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.25.0"
    }
    time = {
      source  = "hashicorp/time"
      version = "0.6.0"
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

variable "name" {
  default     = "tf-aws-11344-time"
  description = "Name and OpenAPI title for REST API"
  type        = string
}

variable "path" {
  default     = "/test"
  description = "OpenAPI path to test updates"
  type        = string
}

resource "aws_api_gateway_rest_api" "example" {
  body = jsonencode({
    openapi = "3.0.1"
    info = {
      title   = var.name
      version = "1.0"
    }
    paths = {
      (var.path) = {
        get = {
          x-amazon-apigateway-integration = {
            httpMethod           = "GET"
            payloadFormatVersion = "1.0"
            type                 = "HTTP_PROXY"
            uri                  = "https://ip-ranges.amazonaws.com/ip-ranges.json"
          }
        }
      }
    }
  })

  name = var.name

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.example.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  description   = "Deployed at ${time_static.deploy.rfc3339}"
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}

resource "time_static" "deploy" {
  triggers = {
    redeployment = aws_api_gateway_deployment.example.id
  }
}

You can see updates by running a command similar to terraform apply -var 'path=/new' after the initial terraform apply.


Hopefully all this information helps. If these recommendations are not working as expected on Terraform AWS Provider version 3.25.0 or later, please reach out. We will be looking for reproducing configurations and plan output in those cases. 👍

shederman commented 3 years ago

@bflad Thanks so much for this, we will start testing right away to see if it alleviates our issues. An initial run-through looks very promising.

rbowater commented 3 years ago

In my testing, if you trigger deployments off changes in id as per the example, that means that the resource will be destroyed and recreated to reflect the ID change (e.g. you change the method from a POST to a PUT). This leads to the following situation:

aws_api_gateway_integration.api_integration: Creation complete after 1s [id=agi-mn2mqickwg-fdd064-PUT]
aws_api_gateway_deployment.deployment: Creating...
aws_api_gateway_deployment.deployment: Creation complete after 1s [id=b5r1ui]
aws_api_gateway_deployment.deployment: Destroying... [id=s0ir4f]
aws_api_gateway_deployment.deployment: Destruction complete after 1s
aws_api_gateway_integration.api_integration: Destroying... [id=agi-mn2mqickwg-fdd064-POST]

Essentially, the API gets deployed before the old integration is destroyed, which means your API deployment will contain both the old integration and the new one at the same time. This might not be desired, so unless I've missed something it's worth taking care when triggering deployments off resources that are getting destroyed and recreated as opposed to just being modified in place.

dmurphy-github commented 3 years ago

Hi All,

Hopefully you can help us understand why we get a cycle error in the following case:

We have an API deployed that has a single method (e.g. "ANY") and single corresponding integration. This needs to be changed to two methods (e.g. "ANY" and "OPTIONS") and two integrations. Our actual implementation is quite complex but after extensive testing we think the change that is causing the error can be reproduced using the example @bflad provided above on 22 Jan.

Steps to reproduce:

  1. Apply the example configuration:
resource "aws_api_gateway_rest_api" "example" {
  name = "example"
}

resource "aws_api_gateway_resource" "example" {
  parent_id   = aws_api_gateway_rest_api.example.root_resource_id
  path_part   = "example"
  rest_api_id = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_method" "example" {
  authorization = "NONE"
  http_method   = "GET"
  resource_id   = aws_api_gateway_resource.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_integration" "example" {
  http_method = aws_api_gateway_method.example.http_method
  resource_id = aws_api_gateway_resource.example.id
  rest_api_id = aws_api_gateway_rest_api.example.id
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # NOTE: The configuration below will satisfy ordering considerations,
    #       but not pick up all future REST API changes. More advanced patterns
    #       are possible, such as using the filesha1() function against the
    #       Terraform configuration file(s) or removing the .id references to
    #       calculate a hash against whole resources. Be aware that using whole
    #       resources will show a difference after the initial implementation.
    #       It will stabilize to only change when resources change afterwards.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.example.id,
      aws_api_gateway_integration.example.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}
  2. Modify the configuration by changing the name of the method from "example" to "new_example" and modify the integration and deployment resources accordingly:
resource "aws_api_gateway_rest_api" "example" {
  name = "example"
}

resource "aws_api_gateway_resource" "example" {
  parent_id   = aws_api_gateway_rest_api.example.root_resource_id
  path_part   = "example"
  rest_api_id = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_method" "new_example" {
  authorization = "NONE"
  http_method   = "GET"
  resource_id   = aws_api_gateway_resource.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
}

resource "aws_api_gateway_integration" "example" {
  http_method = aws_api_gateway_method.new_example.http_method
  resource_id = aws_api_gateway_resource.example.id
  rest_api_id = aws_api_gateway_rest_api.example.id
  type        = "MOCK"
}

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # NOTE: The configuration below will satisfy ordering considerations,
    #       but not pick up all future REST API changes. More advanced patterns
    #       are possible, such as using the filesha1() function against the
    #       Terraform configuration file(s) or removing the .id references to
    #       calculate a hash against whole resources. Be aware that using whole
    #       resources will show a difference after the initial implementation.
    #       It will stabilize to only change when resources change afterwards.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.new_example.id,
      aws_api_gateway_integration.example.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_api_gateway_stage" "example" {
  deployment_id = aws_api_gateway_deployment.example.id
  rest_api_id   = aws_api_gateway_rest_api.example.id
  stage_name    = "example"
}
  3. Apply the updated configuration.

Expected behaviour: A new deployment is created and the old deployment is replaced.

Actual behaviour: Error: Cycle: aws_api_gateway_stage.example, aws_api_gateway_method.example (destroy), aws_api_gateway_deployment.example, aws_api_gateway_deployment.example (destroy deposed d932d47f)

We would really appreciate it if you could look into this case.

Thank you

Terraform v0.12.20 Provider.aws v3.31.0

hannes-ucsc commented 3 years ago

Also still seeing this despite following the best practices (explicit stage instead of implicit one, create_before_destroy on the deployment, hash of the API spec as a trigger).

$ terraform -version
Terraform v0.12.24
+ provider.aws v3.36.0
+ provider.google v2.20.3
+ provider.null v2.1.2
+ provider.template v2.2.0

Can someone explain how to read the Error: Cycle line?

Pepert commented 2 years ago

Just putting it here, in case that helps somebody: I followed the example of bflad (using the REST API Deployment with OpenAPI version), but still had this cycle error.

I finally found that I had some aws_lambda_permission resources to bind lambdas with API gateway that were being updated at the same time as the deployment resource.

After adding depends_on = [aws_api_gateway_deployment.example] on my permission resources, the deployment went fine (ex below):

resource "aws_lambda_permission" "lambda_permission_example" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = "lambda name example"
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.example_api_gateway.execution_arn}/*/POST/whatever/*"

  depends_on = [aws_api_gateway_deployment.example]
}

jeffbski-rga commented 2 years ago

In my case I was able to work around the issue when trying to destroy an endpoint from an API Gateway by first removing its ids from the redeployment value in the triggers block, and then removing the resources themselves from the Terraform configuration in the next deploy.
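
One possible reading of that two-step workaround, sketched against the starter configuration posted earlier in this issue. The "legacy" method and integration names are hypothetical stand-ins for whichever endpoint is being removed.

```hcl
# Apply 1: the resources slated for removal stay declared in the
# configuration, but their ids are dropped from the trigger hash.
resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.example.id,
      aws_api_gateway_method.example.id,
      aws_api_gateway_integration.example.id,
      # aws_api_gateway_method.legacy.id,      # hypothetical, removed in apply 1
      # aws_api_gateway_integration.legacy.id, # hypothetical, removed in apply 1
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Apply 2: delete the aws_api_gateway_method.legacy and
# aws_api_gateway_integration.legacy blocks from the configuration,
# then apply again.
```

Under this sketch the second apply does not change the trigger hash, so a further trigger change may be needed before the removal shows up in the deployed stage.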

nsergeev1 commented 1 year ago

Still have the same error.

hannes-ucsc commented 1 year ago

It should also be considered that the workarounds may only work in certain situations i.e. when previous attempts at applying the resource changes already created some of the resources involved. What's really needed is a repeatable reproduction. It should also be said that the API Gateway v1 API is an aberration from other AWS APIs. That's probably why v2 was created but that's just speculation.