hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Extend plan exit code to indicate pending state version upgrade #27952

Open Kardi5 opened 3 years ago

Kardi5 commented 3 years ago

Terraform Version

Terraform v0.13.6

Terraform Configuration Files

Independent from used configuration files

Debug Output

--

Crash Output

--

Expected Behavior

When upgrading from Terraform 0.12.26 to Terraform 0.13.6 following the guide at https://www.terraform.io/upgrade-guides/0-13.html, the remote state file should be updated to include the new provider references. (At least that is how I interpreted it, but maybe this only works with local states and not with remote states?)

Actual Behavior

The remote Terraform state was not changed (e.g. it still showed terraform_version as 0.12 and the provider was still in the 0.12.x format) after running terraform apply with the upgraded modules and newest providers. I can see that the remote state is being used by TF because a lease for the Azure Container file is acquired during apply and the modification date changes (though nothing changes in the file).

This results in an error when trying to run Terraform 0.14.7:

Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider "azurerm".

You must complete the Terraform 0.13 upgrade process before upgrading to later versions.

I thought the state would update, as the guide says:

For this upgrade in particular, completing the upgrade will require running terraform apply with Terraform 0.13 after upgrading in order to apply some upgrades to the Terraform state, and we recommend doing that with no other changes pending.

I think terraform state replace-provider registry.terraform.io/-/azurerm hashicorp/azurerm would do what I want but this would require manually downloading and uploading/replacing a lot of states over multiple subscriptions.
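If the command has to be run against many states, it could be scripted rather than repeated by hand. The following is only a minimal sketch: it assumes each state has its own working directory that has already been initialized against its backend, the function names are hypothetical, and `-auto-approve` is used to skip the confirmation prompt.

```python
import subprocess
from pathlib import Path

# Provider addresses taken from the command quoted above.
LEGACY = "registry.terraform.io/-/azurerm"
TARGET = "hashicorp/azurerm"


def replace_provider_cmd(legacy=LEGACY, target=TARGET):
    """Build the argv for one `terraform state replace-provider` call."""
    return ["terraform", "state", "replace-provider", "-auto-approve", legacy, target]


def migrate(workdirs, run=subprocess.run):
    """Run the command once per working directory.

    `run` is injectable so the loop can be dry-run without Terraform installed.
    """
    for workdir in workdirs:
        run(replace_provider_cmd(), cwd=Path(workdir), check=True)
```

Passing a stub for `run` (for example one that only prints the command and directory) gives a dry run before touching any real state.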

Steps to Reproduce

  1. Have a remote state from TF version 0.12 in an Azure Storage Account Container backend
  2. Make sure no changes are pending
  3. Have configuration files with AzureRM resources in TF 0.13 format
  4. terraform init (Run through Ansible)
  5. terraform apply (Run through Ansible)
  6. No changes in the remote state are observed
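
Step 6 can be checked by inspecting a copy of the state file directly. A minimal sketch, assuming the state is the JSON document stored in the backend, with a top-level terraform_version and a provider string on each resource. The sample state is made up for illustration, and the prefix check in `looks_upgraded` is an assumption about how upgraded provider addresses are written:

```python
import json


def summarize_state(state_text):
    """Return (terraform_version, sorted provider strings) from state JSON."""
    state = json.loads(state_text)
    providers = sorted({r.get("provider", "") for r in state.get("resources", [])})
    return state.get("terraform_version"), providers


def looks_upgraded(provider):
    """Assumption: upgraded addresses are fully qualified and do not use
    the legacy "-" namespace."""
    return provider.startswith('provider["registry.terraform.io/') and "/-/" not in provider


# Hypothetical sample resembling a state written by Terraform 0.12.
SAMPLE = json.dumps({
    "version": 4,
    "terraform_version": "0.12.26",
    "resources": [
        {"type": "azurerm_resource_group", "name": "rg", "provider": "provider.azurerm"},
    ],
})

version, providers = summarize_state(SAMPLE)
print(version, providers, [looks_upgraded(p) for p in providers])
```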

Additional Context

In this system Terraform is called from Ansible (Ansible Terraform Module: https://docs.ansible.com/ansible/2.9/modules/terraform_module.html), which feeds Terraform the necessary variables for the configuration files via variable files in JSON format. Ansible also sets the following environment variables: ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_SUBSCRIPTION_ID and ARM_TENANT_ID, as well as the backend config via Storage Account, Resource Group and Container Name.

TF_CLI_ARGS_init is set to -reconfigure -upgrade and force_init to true (Ansible TF Module).

References

alisdair commented 3 years ago

Hi @Kardi5. I'm unable to reproduce this issue, and the additional details you describe around using Ansible make it difficult to understand what the problem could be here.

Here's what I did:

  1. Create a simple Terraform config:

    terraform {
      backend "consul" {
        path = "27952"
      }
    }
    
    resource "null_resource" "none" {
    }

  2. Terraform 0.12.30: run terraform init and terraform apply -auto-approve
  3. Verify the state exists using the Consul UI
  4. Terraform 0.13.6: run terraform init and terraform apply, see that there are no changes
  5. Check the state using the Consul UI and see that it has been upgraded

(There should be no difference here between any of the remote state backends, and I don't have any easy way to set up an Azure Storage Container backend.)

Are you able to adjust these simple reproduction steps to show the issue you're seeing? Removing as many of the complex details as possible would help us find the root problem here, so please try using Terraform directly instead of via Ansible.

Kardi5 commented 3 years ago

Hi @alisdair,

thanks for the fast response and for your time.

I ran the commands (TF init, plan and apply based on plan) and could also not reproduce the issue.

Looking at the Ansible module code the following was happening:

  1. Run Terraform init
  2. Run Terraform plan
  3. Run Terraform apply, but only if the return code of Terraform plan was "2". However, the return code of TF plan was always "0", because format changes/side effects are not counted as infrastructure changes.

So the Ansible Terraform module was checking if TF plan reported a pending change and based on that either ran or did not run the TF apply command.
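
The gating described above can be sketched as follows. This is not the module's actual code, only an illustration of the decision; the function name is hypothetical. It relies on the exit codes Terraform documents for `terraform plan -detailed-exitcode`: 0 for success with an empty diff, 1 for an error, and 2 for success with a non-empty diff.

```python
def should_apply(plan_exit_code):
    """Decide whether to run `terraform apply` from the plan's exit code."""
    if plan_exit_code == 0:
        # Plan succeeded with an empty diff: apply is skipped, so a
        # state-only upgrade is never written.
        return False
    if plan_exit_code == 2:
        # Plan succeeded and found infrastructure changes.
        return True
    raise RuntimeError(f"terraform plan failed with exit code {plan_exit_code}")


for code in (0, 2):
    print(code, should_apply(code))
```

With this logic, a run whose only pending work is a state format upgrade takes the exit-code-0 branch, which matches the behavior observed in this issue.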

References:
https://github.com/ansible/ansible/blob/stable-2.9/lib/ansible/modules/cloud/misc/terraform.py#L259
https://github.com/ansible/ansible/blob/stable-2.9/lib/ansible/modules/cloud/misc/terraform.py#L358 (to line 363)

Maybe "meta-changes" could also be counted as changes? I also see the problem with this idea: the TF plan command would show "no changes" (to the infra) while its return code of "2" would indicate changes. Maybe there could be some kind of "upgrade-change necessary" notification instead?
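
Until something like that exists, a wrapper could approximate the signal on its own side. A rough sketch under stated assumptions: the helper names are hypothetical, the state's terraform_version is read by the caller, and treating an older major.minor version in the state as "upgrade pending" is a heuristic, not behavior Terraform defines.

```python
def version_tuple(version):
    """Reduce a version string such as "0.13.6" to (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def needs_apply(plan_exit_code, state_version, cli_version):
    """Apply if the plan reports changes, or if the state was last written
    by an older major.minor release than the running CLI (heuristic)."""
    if plan_exit_code == 2:
        return True
    return version_tuple(state_version) < version_tuple(cli_version)


print(needs_apply(0, "0.12.26", "0.13.6"))
print(needs_apply(0, "0.13.6", "0.13.6"))
```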

alisdair commented 3 years ago

Ah, I see! That explanation for this situation makes sense now, thanks for figuring that out.

It's not immediately obvious to me how to implement your suggestion to indicate that a state version upgrade is pending, but I've marked this issue as an enhancement request so that we can investigate that if there's more support for this idea. I also retitled the issue to make it easier to find. Thanks again for the report!