hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

S3 backend config of terraform_remote_state data source does not lock tfstate #31877

Open schollii opened 2 years ago

schollii commented 2 years ago

Terraform Version

terraform 1.2.7

Terraform Configuration Files

In one folder, create main.tf:

output "some_var" {
  value = "abc"
}

In the same folder, create backend.tf (replace YOUR_ORG with something unique):

terraform {
  backend "s3" {
    bucket  = "YOUR_ORG-tfstate-backends"
    region  = "us-east-1"
    encrypt = true

    dynamodb_table = "tfstates-lock"
    key            = "test-remote-config-lock-bug/terraform.tfstate"
  }
}

Create an S3 bucket YOUR_ORG-tfstate-backends in us-east-1, and a DynamoDB table tfstates-lock.
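
For anyone reproducing this, the bucket and lock table could themselves be created with Terraform from a separate bootstrap folder. This is only a sketch and was not part of the original report: the resource labels (tfstate, tfstate_lock) and the billing mode are assumptions chosen for illustration. The one hard requirement of the S3 backend is that the lock table has a string partition key named LockID.

```hcl
# Hypothetical bootstrap configuration (not from the original report).
provider "aws" {
  region = "us-east-1"
}

# Bucket that holds the state object; replace YOUR_ORG as in backend.tf.
resource "aws_s3_bucket" "tfstate" {
  bucket = "YOUR_ORG-tfstate-backends"
}

# Lock table referenced by dynamodb_table in backend.tf.
# The S3 backend expects a string partition key named "LockID".
resource "aws_dynamodb_table" "tfstate_lock" {
  name         = "tfstates-lock"
  billing_mode = "PAY_PER_REQUEST" # assumption: on-demand billing for a small test table
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```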

Run terraform init and terraform apply so that the state is stored in S3. Verify that the tfstate object is there.

In another folder, create main.tf:

data "terraform_remote_state" "stack" {
  backend = "s3"

  config = {
    bucket  = "YOUR_ORG-tfstate-backends"
    region  = "us-east-1"
    encrypt = true

    dynamodb_table = "tfstates-lock-TABLE_THAT_DOES_NOT_EXIST"
    key            = "test-remote-config-lock-bug/terraform.tfstate"
  }
}

locals {
  stack_out                = data.terraform_remote_state.stack.outputs
  some_var_in_remote_state = local.stack_out.some_var
}
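
The configuration as posted only defines locals, while the Actual Behavior section below reports that "abc" is output. Presumably an output block along the following lines was also present. This is a guess for reproduction purposes, and the output name is hypothetical:

```hcl
# Hypothetical output (not in the original report) that surfaces the value
# read from the remote state, so that terraform apply prints it.
output "some_var_from_remote_state" {
  value = local.some_var_in_remote_state
}
```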

Run terraform init and terraform apply.

Debug Output

n/a

Expected Behavior

Terraform should abort in the second stack, because it should try to lock the remote state and fail to find the DynamoDB table.

Assumption: the remote state should be locked when outputs are read. However, I just checked, and terraform output does not seem to lock the state file either, which is probably why the actual behavior differs from what I expected.

Even so, this should be mentioned somewhere in the terraform_remote_state docs for backend = "s3". I imagine dynamodb_table could simply be omitted.

Actual Behavior

Terraform did not fail. "abc" is output, meaning the state was read without first being locked.

Steps to Reproduce

Explained in the Terraform Configuration Files section above, since it is otherwise too hard to explain, but basically:

Additional Context

No response

References

No response

jbardin commented 2 years ago

Hi @schollii,

Long ago, when locking was added to remote state operations, the terraform_remote_state data source was deliberately not updated to use these locks, in order to avoid disrupting existing workflows, and it has remained this way ever since. This works out OK, because none of the remote state implementations have a consistency level which would ever return a corrupt state value on concurrent access, and even if that could happen, it would only cause the remote state data source to fail and halt evaluation.

While we can look into locking options for terraform remote state when we decide on a plan to move forward with a new remote storage subsystem, I'm not sure locking would be added in the future either. The state locks are meant to protect the state from unsafe concurrent access, not as a mechanism for coordinating multiple processes. Even if the remote state data source were to lock the state, it can't serialize access in a defined order, meaning that the remote data source could be read immediately before or immediately after a concurrent operation with no error but yielding different results. Terraform has always left this type of workflow coordination to external systems.

schollii commented 2 years ago

Ok thanks. I think it would be worth adding a couple of sentences (on the remote state webpage) explaining that remote state data sources do not lock the remote state, even for backends that support locking, like s3. That would mean one less argument to provide: in the case of the s3 backend, the DynamoDB lock table is ignored and can be left out.
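
Based on the discussion above, the data source from the second folder could presumably be trimmed to the following. This is a sketch that follows from the thread's conclusion rather than a tested configuration. It reuses the bucket and key from the report and simply drops dynamodb_table, since the data source does not take a lock:

```hcl
# Sketch: same data source as in the report, without the lock table argument.
# Assumes, per the thread, that terraform_remote_state never locks the state,
# so dynamodb_table has no effect here.
data "terraform_remote_state" "stack" {
  backend = "s3"

  config = {
    bucket = "YOUR_ORG-tfstate-backends"
    region = "us-east-1"
    key    = "test-remote-config-lock-bug/terraform.tfstate"
  }
}
```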