aws_api_gateway_method_response Concurrency

hashibot commented 7 years ago

This issue was originally opened by @steve-gray as hashicorp/terraform#11395. It was migrated here as part of the provider split. The original body of the issue is below.

When creating multiple method response codes beneath the same method, Terraform typically fails with concurrency/conflict errors from the AWS side. Although it is not documented anywhere visible, AWS appears to treat the api_gateway_method as the concurrency boundary.

Terraform Version

0.8.2

Affected Resource(s)

aws_api_gateway_method_response

Expected Behavior

Multiple response codes should be created correctly for a single API method.

Actual Behavior

Fails with errors such as:

* aws_api_gateway_method_response.dummy_function_response1: Error creating API Gateway Method Response: ConflictException: Response already exists for this resource
    status code: 409, request id: 56acda62-e292-11e6-b56a-85b5b33b47bd
* aws_api_gateway_method_response.dummy_function_response3: Error creating API Gateway Method Response: ConflictException: Response already exists for this resource
    status code: 409, request id: 56ae3a9c-e292-11e6-8218-a9497181c142
* aws_api_gateway_method_response.http409: ConflictException: Unable to complete operation due to concurrent modification. Please try again later.
    status code: 409, request id: 56acb348-e292-11e6-8d8e-77f3e0ae3f54
* aws_api_gateway_method_response.http401: ConflictException: Unable to complete operation due to concurrent modification. Please try again later.
    status code: 409, request id: 56ac6527-e292-11e6-8d8e-77f3e0ae3f54
* aws_api_gateway_method_response.dummy_function_response2: Error creating API Gateway Method Response: ConflictException: Unable to complete operation due to concurrent modification. Please try again later.
    status code: 409, request id: 56ac65c5-e292-11e6-99f9-efaae93431b6

Steps to Reproduce

Run terraform apply against a configuration with an API Gateway method that defines multiple (2+) new HTTP response codes that need to be created. Subsequent attempts may succeed as individual responses get created, but many retries may be needed before the race conditions stop occurring.

betabandido commented 7 years ago

@grubernaut I have randomly experienced this issue too. Do you know whether there has been any progress?

steve-gray commented 7 years ago

No. There are really only four ways to fix this:

1. TF Files: Everyone who creates AWS API Gateways will need to explicitly daisy-chain depends_on clauses between elements for their entire API Terraform spec, effectively serializing the operations. This is insanely impractical and hard to manage or reason about for most people.
2. Config: Set parallelism to 1 (which forces much the same thing), but this makes large deployments insanely slow. You might be able to target just specific resource types and run the terraform applies with low parallelism one after another, but it's hard work and won't fit most people's usage.
3. Core: Extend the Terraform core provider interfaces to permit signalling of any special scheduling requirements (i.e. only run one gateway_method_response, or similar construct mutation, at a time, using the API Gateway API ID as the boundary for parallelism).
4. This Provider: Add a wait/mutex around the affected aspects of the API, bounded by the API ID, so that the Terraform AWS provider does not run multiple API Gateway mutation requests concurrently against the same API Gateway instance (or retries them in the case of a 409). This means that although Terraform may issue parallel requests against the provider, a subset would be serialized.

Option 3 is the best, but with the Terraform core/providers now being split out, the work to augment the API contract to support such scheduling hints will gradually drift out of reach, particularly if there's any traction on people writing new providers in the short/near-term.
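
For illustration, the first option (daisy-chained depends_on) might look like the minimal sketch below. The resource names and the referenced REST API, resource, and method are hypothetical and assumed to be declared elsewhere. The syntax shown is Terraform 0.12+; on the 0.8 to 0.10 releases discussed in this thread, references would be written as "${...}" interpolations and depends_on entries as quoted strings.

```hcl
# Hypothetical names throughout: aws_api_gateway_rest_api.example,
# aws_api_gateway_resource.example and aws_api_gateway_method.example
# are assumed to exist elsewhere in the configuration.

resource "aws_api_gateway_method_response" "http200" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  resource_id = aws_api_gateway_resource.example.id
  http_method = aws_api_gateway_method.example.http_method
  status_code = "200"
}

resource "aws_api_gateway_method_response" "http401" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  resource_id = aws_api_gateway_resource.example.id
  http_method = aws_api_gateway_method.example.http_method
  status_code = "401"

  # Explicit ordering: wait for the 200 response before creating this one.
  depends_on = [aws_api_gateway_method_response.http200]
}

resource "aws_api_gateway_method_response" "http409" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  resource_id = aws_api_gateway_resource.example.id
  http_method = aws_api_gateway_method.example.http_method
  status_code = "409"

  # Next link in the chain: wait for the 401 response.
  depends_on = [aws_api_gateway_method_response.http401]
}
```

Every additional status code adds another link to the chain, which is the maintenance burden described in the first option.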

radeksimko commented 7 years ago

Hi folks, @steve-gray thanks for opening the issue and suggesting solutions. I more or less agree with the pros and cons you describe for your first two suggestions. I'd be more inclined toward the 4th one, though.

Implementing this in core would not be trivial, because limits like this aren't usually set for one resource globally. This is, at the very least, a region-specific limit (AFAIK), and we want to allow multiple aws_api_gateway_method_response resources to be created/modified/deleted in parallel if they are in different regions.

Theoretically we could make the schema understand the relationship between region and limits and introduce first-class support for per-resource parallelisation, but that wouldn't solve the problem either, because the provider might not always have the full picture of your whole AWS account, region, or all API Gateway resources within a region. This is by design: we don't want Terraform to get in the way of other tools.

You may have other tool(s) managing different parts of your AWS account, or even two different teams using Terraform within the same region. In such a case, the two configs defined by the two teams don't know about each other, so they would still run into the issues described.

I believe that a resource-specific mutex with a reasonably scoped key is a solution, possibly with retries on the mentioned error to cover the corner cases above.

daniebker commented 7 years ago

I'm also experiencing this issue with aws_api_gateway_method_settings. It fails when trying to create more than one method setting at the same time. Looks like the api_gateway_stage doesn't allow concurrent modification.

* aws_api_gateway_method_settings.method_settings_one: 1 error(s) occurred:

* aws_api_gateway_method_settings.method_settings_one: Updating API Gateway Stage failed: ConflictException: Unable to complete operation due to concurrent modification. Please try again later.
        status code: 409, request id: b5a206a2-9221-11e7-80a2-09c3a141ec6e

* aws_api_gateway_method_settings.method_settings_two: 1 error(s) occurred:

* aws_api_gateway_method_settings.method_settings_two: Updating API Gateway Stage failed: ConflictException: Unable to complete operation due to concurrent modification. Please try again later.
        status code: 409, request id: b5a19197-9221-11e7-bd0b-45efb75cf2a0

I resolved it by daisy chaining the method setting creation.
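
A sketch of what such a chain could look like for the two method settings named in the errors above. The REST API and stage references and the method_path values are hypothetical, and the syntax is Terraform 0.12+ rather than the interpolation style in use when this comment was written.

```hcl
# Assumes aws_api_gateway_rest_api.example and aws_api_gateway_stage.example
# are declared elsewhere; both names are hypothetical.

resource "aws_api_gateway_method_settings" "method_settings_one" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  stage_name  = aws_api_gateway_stage.example.stage_name
  method_path = "users/GET" # hypothetical "<resource path>/<HTTP method>"

  settings {
    metrics_enabled = true
    logging_level   = "INFO"
  }
}

resource "aws_api_gateway_method_settings" "method_settings_two" {
  rest_api_id = aws_api_gateway_rest_api.example.id
  stage_name  = aws_api_gateway_stage.example.stage_name
  method_path = "orders/GET" # hypothetical

  settings {
    metrics_enabled = true
    logging_level   = "INFO"
  }

  # Serialize the two stage updates so they never run concurrently.
  depends_on = [aws_api_gateway_method_settings.method_settings_one]
}
```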

blaltarriba commented 7 years ago

I have the same issue using 0.10.0 version.

I set depends_on on each aws_api_gateway_method to try to avoid the concurrency, but I still hit the same issue, and it occurs randomly.

Any idea?

betabandido commented 6 years ago

@steve-gray @radeksimko Has there been any progress on this issue?

We have a module for creating an API method that internally creates all the related resources: method, integration, integration response and response (see: https://github.com/vistaprint/TerraformModules/tree/master/modules/api_method).

One option would be to make a module depend on another module, but that is not possible yet (see: https://github.com/hashicorp/terraform/issues/10462). While this issue persists, we plan to use a custom dependency chain. However, there does not seem to be an easy way to do this. Our idea is to let the module accept an input variable whose value is generated from another module's output variable, and then consume that input variable to create an explicit dependency on the module generating the output.

We have found ways to do so for resources such as api_gateway_deployment. In this case, we consume the input variable by assigning it to the variables argument (which we do not use, so we do not care if AWS creates an unused variable in the stage).
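
A minimal sketch of that workaround, under stated assumptions: the module paths, the depends_id variable, and the integration_response_id output are all hypothetical names chosen for illustration and are not taken from the linked module. The syntax is Terraform 0.12+.

```hcl
# --- modules/api_deployment/main.tf (hypothetical module) ---

variable "rest_api_id" {}

# Carries a value from another module's output purely to create a dependency.
variable "depends_id" {
  default = ""
}

resource "aws_api_gateway_deployment" "this" {
  rest_api_id = var.rest_api_id
  stage_name  = "example"

  # Consuming the input here makes Terraform order this resource after
  # whatever produced the value. The stage variable itself is unused.
  variables = {
    depends_id = var.depends_id
  }
}

# --- root configuration ---

module "method" {
  source      = "./modules/api_method" # hypothetical local path
  rest_api_id = aws_api_gateway_rest_api.example.id
}

module "deployment" {
  source      = "./modules/api_deployment"
  rest_api_id = aws_api_gateway_rest_api.example.id

  # Hypothetical output of the method module; referencing it creates the edge.
  depends_id = module.method.integration_response_id
}
```

The trick relies on the resource having some argument that can absorb an arbitrary value without side effects that matter, which is what the variables map provides here.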

However, we cannot find a way to do something similar with other resources such as api_gateway_method_response. In this case, there is no argument in the resource that we can use to create the dependency.

Are there any "hidden" arguments in resources that we can use to consume the module input variable, and thus establish a dependency?

I can provide a simple code snippet if that helps.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!