hashicorp / terraform


Creating a new workspace with `terraform workspace new -state=tf.state.default` does not work for s3 remote state. #29819

Open dnozay opened 2 years ago

dnozay commented 2 years ago

Terraform Version

╰─ terraform version
Terraform v1.0.9
on darwin_amd64
+ provider registry.terraform.io/hashicorp/aws v3.37.0
+ provider registry.terraform.io/hashicorp/random v3.1.0

Terraform Configuration Files

terraform {
  backend "s3" {
    bucket               = "my-tfstate-bucket"
    key                  = "kubernetes-infra.tfstate"
    region               = "us-west-2"
    workspace_key_prefix = "tf-state"
    profile              = "test-account/administrator"
  }
}

Debug Output

2021-10-27T16:38:34.408-0700 [DEBUG] [aws-sdk-go] DEBUG: Request s3/PutObject Details:
---[ REQUEST POST-SIGN ]-----------------------------
PUT /tf-state/dnozay_testing/kubernetes-infra.tfstate HTTP/1.1
Host: xxxxxx
User-Agent: aws-sdk-go/1.40.25 (go1.16.4; darwin; amd64) APN/1.0 HashiCorp/1.0 Terraform/1.0.9
Content-Length: 155
Authorization: xxxxxx
Content-Md5: xxxxxx
Content-Type: application/json
X-Amz-Content-Sha256: xxxxxx
X-Amz-Date: xxxxxx
X-Amz-Security-Token: xxxxxx
Accept-Encoding: gzip

{
  "version": 4,
  "terraform_version": "1.0.9",
  "serial": 0,
  "lineage": "xxxxxx-xxxxxx-xxxxxx-xxxxxx-xxxxxx",
  "outputs": {},
  "resources": []
}

Expected Behavior

Creating a new workspace should succeed and use the given state file as its initial state.

terraform state pull > tf.state.default
terraform workspace new -state=tf.state.default ${USER}_testing

Actual Behavior

The new workspace is created empty: the contents of tf.state.default are not written to it. The PutObject request in the debug output above carries a state with no resources, and terraform state list reports 0 resources afterwards.

Steps to Reproduce

terraform workspace select default
terraform state pull > tf.state.default
ls -lh tf.state.default
echo "number of resources=$(terraform state list | wc -l)"
terraform workspace delete -force ${USER}_testing
TF_LOG=trace terraform workspace new -state=tf.state.default ${USER}_testing
echo "number of resources=$(terraform state list | wc -l) 😭😭😭"
terraform state push tf.state.default
echo "number of resources=$(terraform state list | wc -l)"

Additional Context

Output without TF_LOG=trace:

Switched to workspace "default".
-rw-r--r--  1 dnozay  staff   281K Oct 27 16:45 tf.state.default
number of resources=     205
Deleted workspace "dnozay_testing"!
WARNING: "dnozay_testing" was non-empty.
The resources managed by the deleted workspace may still exist,
but are no longer manageable by Terraform since the state has
been deleted.

Created and switched to workspace "dnozay_testing"!

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "terraform plan" Terraform will not see any existing state
for this configuration.
number of resources=       0 😭😭😭
number of resources=     205

As the steps show, the workaround is to push the state explicitly:

terraform state push tf.state.default

However, this reuses the same lineage, which could be a problem. In that regard, the S3 and GCS backends do not behave the same way.
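
To check the lineage concern after applying the workaround, the lineage of two pulled state files can be compared. The following is only a minimal sketch: `state_lineage` and `same_lineage` are hypothetical helper names, and the sed pattern assumes the pretty-printed JSON layout shown in the debug output above, with "lineage" on its own line. A JSON-aware tool would be more robust.

```shell
# Hypothetical helper: print the top-level "lineage" value of a state file.
# Assumes pretty-printed state JSON with "lineage" on its own line,
# as in the PutObject body shown in the debug output.
state_lineage() {
  sed -n 's/^ *"lineage": *"\([^"]*\)".*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed (exit 0) when two state files share a lineage.
same_lineage() {
  [ "$(state_lineage "$1")" = "$(state_lineage "$2")" ]
}
```

For example, after running terraform state pull in each workspace and saving the results to two files, same_lineage FILE1 FILE2 exits 0 only when both files carry the same lineage value.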

apparentlymart commented 2 years ago

Thanks for reporting this, @dnozay.

For most commands -state=... is a legacy option for the local backend only, but it seems like it intentionally has a different meaning for terraform workspace new, because that command is handling the option inline itself rather than passing it over to the backend as other commands do:

https://github.com/hashicorp/terraform/blob/de105595e2788b5614081a295268cdb75964ee06/internal/command/workspace_new.go#L140-L163

(statePath in the above is what the -state=... option gets decoded into.)

The logic here seems to be backend-agnostic:

  1. Read the given file into memory as a state file object.
  2. Write the state file object directly to the "state manager", which is an interface that all of the backends implement. This'll typically just update the in-memory structure to consider this new snapshot to be the current snapshot.
  3. Persist the current snapshot in the state manager, which typically means to serialize the current snapshot back to the state file serialization and write it to whatever remote storage we're talking about. (e.g. S3 or GCS)

With that said, it's not clear to me why this behavior would differ depending on which backend you've selected, and I wonder if something else was confounding things here and made it seem like the GCS backend behaved differently. I'm going to reclassify this as a general CLI bug for the moment, since I think we ought to prove that it is an S3-backend-specific issue before we pass it over to the AWS provider team (who maintain that backend).
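
One way to gather evidence on whether the behavior depends on the backend is to run the same check in a configuration for each backend and compare the results. The sketch below is an assumption-laden harness, not an official test: `check_workspace_seed` is a hypothetical function name, it uses only the commands already shown in the reproduction steps, and it assumes the target workspace does not exist yet.

```shell
# Hypothetical repro harness. Prints the resource count of the source
# workspace and of a workspace created from its pulled state, and exits 0
# only when the two counts match. Run it inside an initialized configuration.
check_workspace_seed() {
  src="$1"    # workspace to copy from, e.g. default
  dst="$2"    # workspace to create, e.g. ${USER}_testing (must not exist)
  file="$3"   # path where the pulled state is stored

  terraform workspace select "$src" >/dev/null || return 2
  terraform state pull > "$file" || return 2
  before=$(terraform state list | wc -l | tr -d '[:space:]')

  # The step under test: seed the new workspace from the pulled state file.
  terraform workspace new -state="$file" "$dst" >/dev/null || return 2
  after=$(terraform state list | wc -l | tr -d '[:space:]')

  echo "before=$before after=$after"
  [ "$before" = "$after" ]
}
```

With the behavior reported in this issue, a call such as check_workspace_seed default "${USER}_testing" tf.state.default would print a non-zero "before" count and "after=0", and return a non-zero exit status.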

dnozay commented 2 years ago

As mentioned in the repro scenario

terraform workspace new -state=tf.state.default ${USER}_testing
terraform state list  | wc -l

shows no resources when using S3 remote state; I've also tried with GCS, and that worked much better.

tsibley commented 1 year ago

This is still an issue on:

Terraform v1.3.1
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.32.0

mattew commented 1 year ago

I have the same problem with:

Terraform v1.4.0
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v4.57.1

When the state is stored in an Azure storage account, it works.