hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

S3 Backend: Support multi-region bucket replication #32190

Open animaxcg opened 2 years ago

animaxcg commented 2 years ago

Terraform Version

All

Terraform Configuration Files

terraform {
  backend "s3" {
    # Note: backend blocks do not support interpolation, so "${region}" here
    # is a placeholder for whichever region is active; the real values are
    # supplied per region at init time (see the sketch after this block).
    bucket         = "bucket-${region}"
    dynamodb_table = "my-lock-table"
    encrypt        = true
    key            = "./terraform.tfstate"
    region         = "${region}"
  }
}
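
Since backend blocks cannot interpolate variables, the usual way to get per-region backends from one configuration is partial backend configuration; a minimal sketch, assuming a hypothetical per-region file named backend-use1.hcl:

# backend-use1.hcl -- passed via: terraform init -backend-config=backend-use1.hcl
bucket = "bucket-us-east-1"
region = "us-east-1"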

Debug Output

Not applicable.

Expected Behavior

I would expect to be able to specify the path for the LockID without including the bucket name, so that locking works with multi-region replicated S3 buckets.

Actual Behavior

The bucket name from the initial apply is embedded in the LockID value, which makes it impossible to run a failover if AWS is down in the initial apply's region, despite all the data being available in another region.

Steps to Reproduce

1. Create multi-region S3 backend buckets with replication configured between them.
2. Create a global DynamoDB lock table in the same regions as the buckets.
3. Run an apply from region1.
4. Modify the module and set the environment to use the backend in the other region.
5. Run an apply from region2; it will fail because the LockID won't match the bucket name.

Additional Context

This is a pretty big miss: in a failover scenario I have to modify LockIDs manually, which is not something I want to do just to swing an application to another region, and then to swing back I would have to do the same.

References

No response

apparentlymart commented 2 years ago

Hi @animaxcg! Thanks for reporting this.

It seems you are trying to use the S3 backend in a situation it was not designed for, so this is a feature request to support that new usage rather than a bug. I'm going to relabel this so that the AWS provider team (who are also responsible for the S3 backend) can find it and consider whether and how to support this new capability.

Thanks!

Nuru commented 1 year ago

@apparentlymart (BTW, this was bounced from the AWS provider team back to here)

I just ran into this exact same problem. Setting aside problems with multiple writers when using a Multi-Region Access Point (MRAP), I want to create a primary/standby (bi-directional replication with only one write destination at a time) configuration exactly as the OP did.

What I want is the ability to specify the LockID key prefix as something independent of the S3 bucket name or region. If I have buckets example-use1-tfstate and example-usw2-tfstate, I would like to set the LockID key for both backends/buckets to example-global-tfstate, to help ensure the two buckets stay in sync.

This seems to me to be a relatively easy change: just allow the LockID key prefix to be configurable instead of forcing it to be the bucket name, in much the same way I can already specify the workspace_key_prefix for a workspace to affect the full name (path/key) of the Terraform state stored in S3. Providing this capability would greatly improve the High Availability (HA) posture of Terraform: with two buckets in different regions, kept in sync by bi-directional cross-region replication (supported by AWS since 2020), and two DynamoDB tables in the same two regions with the same name, kept in sync by bi-directional replication (also supported by AWS since 2020 as DynamoDB Global Tables), Terraform state and locking could survive a regional outage.
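
A minimal sketch of what this could look like, assuming a hypothetical lock_key_prefix argument (this option does not exist today; the name is purely illustrative):

terraform {
  backend "s3" {
    bucket         = "example-use1-tfstate" # example-usw2-tfstate after failover
    key            = "terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-tflock"
    encrypt        = true

    # Hypothetical option: decouple the LockID from the bucket name so that
    # both replicated buckets produce identical lock entries.
    lock_key_prefix = "example-global-tfstate"
  }
}

Failing over would then only require changing bucket and region; the lock entries in the replicated DynamoDB tables would remain identical.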

crw commented 1 year ago

Hi @Nuru, the code for backends is in this repository, so we manage S3 backend issues out of this repo. However, the team that would work on it is the AWS provider team, as they have the setup to actually test the functionality and are more deeply involved in the world of AWS features. Just an FYI as to why the issue was referred back to this repo.

jcarlson commented 1 year ago

FWIW, I was able to sort of get a multi-region access point working as an S3 backend. See here:

terraform {
  required_version = "~> 1.5"

  backend "s3" {
    bucket   = "mfzwi23gnjvgw" # <-- MRAP alias, without '.mrap' suffix
    endpoint = "mrap.accesspoint.s3-global.amazonaws.com" # <-- use the multi-region endpoint
    region   = "us-east-1" # <-- this is the tricky bit
    # ... other options
  }
}

The tricky bit is that multi-region access points will route you to the nearest bucket, and the API request has to be signed with Signature Version 4A (SigV4A).

In the example above, if my request is routed to us-west-2, the request is still signed with us-east-1 and I get the following error message on terraform init:

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Error refreshing state: AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-west-2'
    status code: 400, request id: ..., host id: ...

halradaideh commented 1 year ago

this would be awesome

gdavison commented 1 year ago

There are a few considerations to take into account when supporting multi-region and failover support for the S3 backend.

In order to work as a lock, DynamoDB needs strongly consistent reads as well as the ability to perform conditional operations.

According to AWS documentation, DynamoDB Global Tables do not support strongly consistent reads across regions. Replication can take multiple seconds. If you are using a DynamoDB Global Table to lock Terraform state, all lock operations will need to be done to the same region. DynamoDB doesn't provide a way to enforce this, so it has to be managed by the Terraform user.

S3, however, does allow configuring a Multi-Region Access Point in an active-passive configuration with manual failover configuration. An active-active multi-region S3 configuration does not guarantee consistent reads, so should be avoided for Terraform backend storage. See https://docs.aws.amazon.com/AmazonS3/latest/userguide//MultiRegionAccessPoints.html for more details. Note that it is also possible to bypass the MRAP and write directly to a member bucket unless additional configuration is added to prevent access.
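
For illustration, a minimal sketch of the lock-table side of such a setup, using the AWS provider's replica support for DynamoDB Global Tables (the table name and regions are assumptions):

resource "aws_dynamodb_table" "tflock" {
  name         = "my-lock-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the key schema the S3 backend expects

  attribute {
    name = "LockID"
    type = "S"
  }

  # Replica support requires streams with NEW_AND_OLD_IMAGES.
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  # Global Table replica in the standby region. Cross-region replication is
  # eventually consistent, so all lock operations must target one region at
  # a time, as noted above.
  replica {
    region_name = "us-west-2"
  }
}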

animaxcg commented 1 year ago

@gdavison The likelihood of two Terraform runs happening in two separate regions at the same time is far lower than the likelihood of needing to run Terraform from another region. The latter is an operational MUST during an AWS outage, to switch things like DNS within AWS. The former only happens when the latter is also happening, meaning Terraform will fail anyway because the region is down.

Nuru commented 1 year ago

@gdavison You only need strong consistency for an active-active configuration, or a case where automatic failover is subject to rapid flapping. My request to be able to specify the path for LockID to not include the bucket name is a very simple one and would support manual failover during a region outage using an active-standby configuration. Although the OP asked about using it with multi-region access points, I am prepared and eager to use it with standard regional buckets synced via S3 replication, and can tolerate the propagation delay.

This simple feature request is all that is needed for me to implement a disaster recovery fail-over capability, and the lack of this feature is a major impediment to having any reasonable way to guard against an outage in the region hosting my S3 bucket and DynamoDB table.
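
For reference, the bucket-sync half of that active-standby setup can be expressed with the standard replication resource; a minimal one-direction sketch (the bucket names and IAM role are assumptions, and a mirror-image configuration in the other region provides the bi-directional sync):

resource "aws_s3_bucket_replication_configuration" "primary_to_standby" {
  # Role must allow the s3:Replicate* and related read permissions on the
  # relevant buckets.
  role   = aws_iam_role.replication.arn
  bucket = aws_s3_bucket.primary.id # e.g. example-use1-tfstate

  rule {
    id     = "tfstate-failover"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.standby.arn # e.g. example-usw2-tfstate
    }
  }

  # Replication requires versioning to be enabled on both buckets.
  depends_on = [aws_s3_bucket_versioning.primary]
}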

gdavison commented 1 year ago

Hi @animaxcg and @Nuru.

I was listing some of the considerations to take into account when implementing multi-region support for the backend, not saying that we wouldn't implement it. Keep in mind that, even if certain things would never happen in the way you use Terraform, we have to deal with many other use cases where we do have to guarantee certain behaviour for users who manage Terraform differently.

animaxcg commented 1 year ago

@gdavison One of the main uses of Terraform is to be able to apply the same infrastructure to N regions/DCs and manage it. Or maybe it is best to quote the current home page: "Infrastructure automation to provision and manage resources in any cloud or data center." - https://www.terraform.io/

So to quote you: "other use cases where we do have to guarantee certain behaviour for users that manage Terraform differently" - without this feature, you are failing to uphold the banner on the website, which clearly states you can manage infrastructure across "any cloud or data center".

hammopau commented 11 months ago

Hi. Just to add another supportive voice to this discussion...

An example use case for state in an MRAP is managing cross-region failover Route53 records (see the sketch below). As-is, following a regional failover, we'd need to manually import the zones into modules and state in the alternative region. Yes, the records themselves should still work in a DR event, but we won't be able to manage them via TF (which will invariably be necessary when target systems are re-deployed in response to incorrect or changed config) until they're imported. Given these are frequently part of 'region prerequisites' or, in our case, other customer-facing solutions, this is an undesirable level of complexity when trying to manage a regional failover. I concede that refactoring can reduce complexities around failover, but that introduces more complexity for day-to-day work.
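
For context, a minimal sketch of the kind of failover record in question (the zone, names, health check, and load balancer are all assumptions):

resource "aws_route53_record" "primary" {
  zone_id         = aws_route53_zone.main.zone_id
  name            = "app.example.com"
  type            = "A"
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  # Route53 serves this record while the primary health check passes.
  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }
}

A matching record with type = "SECONDARY" points at the standby region; if the state holding these records lives in only one region, neither record can be managed during that region's outage.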

My spurs may be chinking a bit here, but I care less about DDB locks failing over at the speed of light, as these can be cleaned up relatively easily and are easily identifiable. Plus, as failover is manually controlled, if you decide to switch while pipelines are running, then go work for Starbucks... 😄

This is an important feature when managing cross-region failover, and I would have thought it would be quite obvious to Hashi...