hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

odd behaviour with global state store when using mixed versions (TF 0.11 and 0.12) #21943

Open gbataille opened 5 years ago

gbataille commented 5 years ago

Terraform Version

Terraform v0.11.14
+ provider.aws v2.17.0
+ provider.template v2.1.2

Terraform Configuration Files

terraform {
  required_version = "~> 0.11.0"
  backend "s3" {
    role_arn = "arn:aws:iam::xxxxxxx:role/role_terraform"
    bucket = "pix4d-terraform-xxxxx"
    key    = "state.tfstate"
    region = "us-east-1"
    dynamodb_table = "pix4d-terraform-xxxxxx"
    workspace_key_prefix = "xxxxx"
  }
}

Crash Output

Terraform doesn't allow running any operations against a state
that was written by a future Terraform version. The state is
reporting it is written by Terraform '0.12.3'

A newer version of Terraform is required to make changes to the current
workspace.

Expected Behavior

Able to do terraform init on a project that is still using 0.11

Actual Behavior

If some projects in the state store are migrated to 0.12, the state.tfstate file at the root of the state store indicates "version": 4, and terraform init using Terraform 0.11 will fail
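The failure above follows from Terraform 0.11's refusal to touch a state written in a newer format: 0.11 only understands state format version 3, while 0.12 writes version 4. A hypothetical sketch of that guard (Terraform's real check lives in its Go codebase; the function name here is illustrative):

```python
import json

SUPPORTED_STATE_VERSION = 3  # highest state format version TF 0.11 understands

def check_state_compatibility(state_json: str) -> None:
    """Raise if the state was written by a newer state format version.

    Hypothetical sketch of the check Terraform 0.11 performs; not
    Terraform's actual implementation.
    """
    state = json.loads(state_json)
    if state.get("version", 0) > SUPPORTED_STATE_VERSION:
        raise RuntimeError(
            "Terraform doesn't allow running any operations against a state "
            "that was written by a future Terraform version. The state is "
            f"reporting it is written by Terraform '{state.get('terraform_version')}'"
        )

# A version-4 state (written by 0.12) trips the guard:
v4_state = '{"version": 4, "terraform_version": "0.12.3"}'
```

Running the check against the version-4 root state.tfstate reproduces the error message quoted above.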

Steps to Reproduce

  1. configure a global state store (like in S3)
  2. create a terraform project in 0.11 using this state store
  3. create a terraform project in 0.12 using the same state store (the state.tfstate at the root of the state store should say "version": 4)
  4. in the first project (0.11) delete your .terraform folder to start from scratch
  5. in the first project (0.11) run terraform init

Workaround

(This seems to work only if you use workspaces; using default does not work, as far as I can tell.)

  1. Run terraform init. It fails
  2. Run terraform workspace select xxx. If you don't have workspaces and use default, create a new one (terraform workspace new xxxx)
  3. Re-run terraform init. This time it works and you can move forward.

Additional Context

Given the workaround, and the fact that this is transient (only while all our repos migrate), this is a low-priority issue that can wait. If there is a quick fix though.... ;)

To clarify our setup a bit more:

mildwonkey commented 5 years ago

Hi @gbataille ! I wasn't able to reproduce this issue, and I am hoping you can help me out. I used a backend configuration very similar to yours (only changing bucket name), and switched between terraform versions 0.11 and 0.12 (using different workspaces) without seeing this issue.

It's possible that this issue has already been fixed, so to start I'd like to know if you are still seeing the issue in the latest version of terraform.

You said that you don't use the default workspace. Is it possible that there is some state that was accidentally stored in the default workspace? It would be interesting to see the output of terraform state pull in a directory where you have run terraform init but have not selected a workspace.

gbataille commented 5 years ago

Hey @mildwonkey Unfortunately, I don't have a TF 0.11 state left, so it's a bit harder to reproduce it in the right conditions.

terraform state pull from a fresh init, still with the default workspace, gives

{
  "version": 4,
  "terraform_version": "0.12.5",
  "serial": 43,
  "lineage": "105359c4-924b-5a03-8c6b-5e7971591850",
  "outputs": {},
  "resources": []
}

This is the file that is at the root of my state store, outside of any workspace_key_prefix and even more outside of any workspace.

If I take an extract of my central state store, at the root, I have

                           PRE inspection-cloud/
                           PRE platform_cloud/
                           PRE platform_cloud_services/
                           PRE raffi_test/
                           PRE web-marketing/
2019-07-01 09:20:50        157 state.tfstate

Then in each folder (each corresponding to a workspace_key_prefix, i.e. a different team/application using terraform and this central state store), I have

                           PRE production/
                           PRE staging/

Which I think is the issue. Since the states under each workspace_key_prefix can be written by different TF versions, there really ought to be a "default" state.tfstate per workspace_key_prefix folder rather than one for the entire state store.

--> does that make sense?

Again, as mentioned, since it's only on a specific migration path, and since there is a reasonable workaround, I'm not sure it's worth spending too long on it.
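The layout described above follows from how the s3 backend builds object keys: the default workspace uses the configured key verbatim at the bucket root (ignoring workspace_key_prefix entirely), while named workspaces are placed under workspace_key_prefix. A small sketch, assuming the documented key scheme (the prefixes used below, like "platform_cloud", are just the example folders from this thread):

```python
def s3_state_key(key: str, workspace: str, workspace_key_prefix: str = "env:") -> str:
    """Build the S3 object key the s3 backend uses for a workspace's state.

    Sketch of the documented key scheme, not the backend's actual Go code.
    The default workspace ignores workspace_key_prefix entirely, which is
    why a single state.tfstate sits at the root of the whole store.
    """
    if workspace == "default":
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"

# With workspace_key_prefix = "platform_cloud" (one of the folders above):
s3_state_key("state.tfstate", "default", "platform_cloud")     # "state.tfstate"
s3_state_key("state.tfstate", "production", "platform_cloud")  # "platform_cloud/production/state.tfstate"
```

This is why every team sharing the bucket, regardless of its workspace_key_prefix, collides on the same root state.tfstate for the default workspace.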

mildwonkey commented 5 years ago

That's helpful information, and confirms my suspicion. I set up the reproduction case, and created a few states with both 0.11 and 0.12. When I run terraform state pull from the default workspace, this is what I get:

tf state pull
Empty state (no state)

Someone (or some process) created (and deleted) resources in the default workspace. Remove that "dangling" state file and the problem will disappear.

gbataille commented 5 years ago

ah... interesting. That's not a "needed" artifact. Cool. I'll do that (but again, since I don't have repos in 0.11 anymore, not sure we'll prove anything). This thread might prove to hold some interesting documentation if someone ends up in the same place. Thanks for having a look

gbataille commented 5 years ago

Well this file is used!

Since I removed it, trying to do a terraform init:

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Error refreshing state: state data in S3 does not have the expected content.

This may be caused by unusually long delays in S3 processing a previous state
update.  Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value:

With debug logs: https://gist.github.com/gbataille/39baf6b7a8d7b434ae44a9e66f4af37f

so it is looking for the state.tfstate at the root of the store, outside of any workspace_key_prefix

It's possibly because in DynamoDB, I have a digest for this file

pix4d-terraform-state-store/state.tfstate-md5 | f254438c3ceb0678771d1450d1b6ef1a
-- | --

but I'm a bit worried that if I kill it I'll break something bad...

In the meantime though, I restored the state.tfstate that I had deleted
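The "expected content" error fits that DynamoDB row: the s3 backend records a digest of the last-written state under "<bucket>/<key>-md5" and compares it with what it reads back from S3, so deleting the object while the digest row survives makes the check fail. A minimal sketch, under the assumption (suggested by the "-md5" item name and the hex value above) that this is a plain MD5 comparison:

```python
import hashlib

def digest_matches(state_bytes: bytes, stored_digest_hex: str) -> bool:
    """Sketch (not Terraform's actual Go code) of the consistency check
    behind "state data in S3 does not have the expected content": compare
    the MD5 of the S3 object with the digest stored in DynamoDB."""
    return hashlib.md5(state_bytes).hexdigest() == stored_digest_hex

state = b'{"version": 4}'
digest_matches(state, hashlib.md5(state).hexdigest())  # True: S3 and DynamoDB agree
digest_matches(b"", hashlib.md5(state).hexdigest())    # False: object gone, digest left behind
```

If this assumption holds, deleting (or updating) the dangling digest row together with the state file would keep the two stores consistent.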

mildwonkey commented 5 years ago

Hi @gbataille ! If that state file, and therefore the default workspace, is used, you are experiencing expected behavior. This is (currently) a workflow issue, not a terraform issue. If you'd like, you can open a new feature request, or rewrite this issue as a feature request, to change the init behavior.

gbataille commented 5 years ago

Well, that's the thing: I'm not using the default workspace (or I don't think I am). But my understanding is that the default workspace always exists, and when you do terraform init from a fresh copy of your TF files (e.g. a fresh git clone, no .terraform folder), you end up in the default workspace even if it's not used.

There are 2 things odd for me: