gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.06k stars 977 forks

Migrate state if directory is changed #847

Open bfleming-ciena opened 5 years ago

bfleming-ciena commented 5 years ago

You know how Terraform handles moving directories around... The next time you run Terraform, it detects the change and asks whether you want to migrate or copy your state to a new file.

When I'm using Terragrunt and sourcing one project, that behavior seems to have been lost.

If I move the directory and run Terragrunt, the state is not found and it tries to create the objects again.
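
For context, a likely explanation (an assumption, since the configuration is not shown in this thread): many Terragrunt setups derive the backend key from the module's path, for example with `key = "${path_relative_to_include()}/terraform.tfstate"`. Moving the directory then changes the key, and the backend looks for state at a new, empty location. A minimal Python sketch of that derivation, with hypothetical directory names:

```python
from pathlib import PurePosixPath


def state_key(root: str, module_dir: str) -> str:
    """Mimic a path-derived backend key (assumed convention, not confirmed by the thread)."""
    relative = PurePosixPath(module_dir).relative_to(root)
    return f"{relative}/terraform.tfstate"


# Hypothetical layout: the repository root is "live" and the module is renamed.
old_key = state_key("live", "live/prod/old_bad_naming")
new_key = state_key("live", "live/prod/new_shiny_naming")

# The keys differ, so the renamed module starts from an empty state.
print(old_key)
print(new_key)
```

If your `remote_state` block builds the key differently, the same reasoning applies to whatever expression it uses.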

ldormoy commented 4 years ago

In the meantime, is there a way to achieve this using "terragrunt state" commands?

The only way I found (using an S3 backend) is to directly mv the file using the AWS CLI.

sclausson commented 4 years ago

Is there any workaround for this beyond moving the default.state file to the new backend path as @ldormoy suggests?

kannan-ak commented 4 years ago

+1

Just faced this issue today. Terragrunt creates a new statefile when the directories are changed. It would be helpful if terragrunt-cache could keep a reference to the statefile by using the configs it is passing.

FernandoMiguel commented 4 years ago

@ldormoy Moving it in S3 is not enough, as the path is also in the statefile.

steve-a-jones commented 4 years ago

Running into this same thing -- what is the recommended pattern here? I'm experimenting with different directory structures but have live resources that I don't want to accidentally duplicate.

yorinasub17 commented 4 years ago

It would be nice to have a solution to this, but this means terragrunt needs to do its own state tracking and that is a significant deviation from terragrunt as a terraform wrapper. We've already run into multiple unexpected issues with terragrunt doing state tracking of the files in .terragrunt-cache, and having an additional layer of state tracking will further complicate the problem.

Since this isn't really an issue we have internally at Gruntwork (our patterns are more or less solidified such that we rarely do directory experimentation), we are unlikely to tackle this any time soon. That said, if anyone wants to take a crack at this, we'd be happy to help out with design discussions/pointers/PR reviews. The next step would be to submit an RFC with a proposal of how the state tracking will work.

As far as workarounds go, check out https://community.gruntwork.io/t/terraform-state-is-messed-up-after-moving-folders/448/4?u=yoriy for a pattern based on terragrunt state commands.

steve-a-jones commented 4 years ago

@yorinasub17 Appreciate the response! The workaround above worked out for me -- this is exactly what I was looking for.

rgarrigue commented 4 years ago

I tried the workaround for a simple folder renaming case, like this:

terragrunt state pull > temporary_state.json
cd ..
git mv old_bad_naming new_shiny_naming
cd new_shiny_naming
terragrunt state push temporary_state.json

The push failed with "Failed to refresh destination state: state data in S3 does not have the expected content". Any idea?

@yorinasub17 Without going into the complexity of state tracking for auto-remediation on apply, would it be possible to come up with a terragrunt mv command that does the above, maybe with a couple of safeguards like a plan before and after, and a revert if they differ?

yorinasub17 commented 4 years ago

> The push failed with "Failed to refresh destination state: state data in S3 does not have the expected content". Any idea?

This usually occurs when the digest does not match in DynamoDB, which is typically caused by stale state locks. If you know for sure that nothing is touching the state, you can delete the corresponding entry in DynamoDB to clear the lock.
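
As a rough sketch of that cleanup in Python: this assumes the usual S3 backend layout, where the lock table stores the digest in an item whose `LockID` is `<bucket>/<key>-md5` (the lock itself uses `<bucket>/<key>`). The table, bucket, key, and region values are placeholders, and `delete_stale_digest` is a hypothetical helper name. Inspect the table first and only delete when you are sure nothing else is using the state.

```python
def digest_item_key(bucket: str, state_key: str) -> dict:
    """Build the DynamoDB key of the item assumed to hold the state's MD5 digest."""
    return {"LockID": {"S": f"{bucket}/{state_key}-md5"}}


def delete_stale_digest(table: str, bucket: str, state_key: str, region: str) -> None:
    """Delete the digest item so the next push is not checked against a stale value."""
    import boto3  # imported here so the key helper works without AWS dependencies

    client = boto3.client("dynamodb", region_name=region)
    client.delete_item(TableName=table, Key=digest_item_key(bucket, state_key))


# Example call with placeholder names (not run here):
# delete_stale_digest("terraform-locks", "my-state-bucket",
#                     "prod/new_shiny_naming/terraform.tfstate", "eu-west-1")
```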

> Without going into the complexity of state tracking for auto-remediation on apply, would it be possible to come up with a terragrunt mv command that does the above, maybe with a couple of safeguards like a plan before and after, and a revert if they differ?

I think this would work, but I don't quite have time at the moment to think through all the implications to make a call for sure (e.g., what happens when there is an error in the process? Should there be support for rollbacks? Would this support changes that affect relative paths in the terragrunt.hcl file?). As mentioned above, an RFC with the exact details of how this might work, or if the implementation is simple, a direct PR would be highly appreciated!

maximivanov commented 3 years ago

The sequence of steps mentioned above worked for me to rename/move a module from path A to path B:

cd A/
terragrunt state pull > /var/app/backup.tfstate
cd ..
mv A B
cd B/
terragrunt init
terragrunt state push /var/app/backup.tfstate
terragrunt plan # there must be no changes reported

Note that remote state of A will not be removed from the backend. You need to remove it manually after the migration.
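
The same steps can be scripted. Below is a minimal Python sketch, not an official Terragrunt feature: `move_module` and `plan_has_no_changes` are hypothetical helper names, and the verification relies on `-detailed-exitcode`, the plan flag that returns 0 for no changes, 1 for an error, and 2 when changes are pending.

```python
import shutil
import subprocess
from pathlib import Path


def plan_has_no_changes(exit_code: int) -> bool:
    """Interpret the exit code of `plan -detailed-exitcode`."""
    if exit_code == 1:
        raise RuntimeError("terragrunt plan failed")
    return exit_code == 0


def move_module(old_dir: Path, new_dir: Path, backup: Path) -> None:
    """Pull state, move the directory, push state at the new path, then verify."""
    backup = backup.resolve()  # absolute path, because the working directory changes

    # 1. Save the current remote state to a local backup file.
    with backup.open("w") as out:
        subprocess.run(["terragrunt", "state", "pull"], cwd=old_dir, stdout=out, check=True)

    # 2. Move the module directory (use `git mv` instead if you want git to track it).
    shutil.move(str(old_dir), str(new_dir))

    # 3. Initialise the backend at the new path and upload the saved state.
    subprocess.run(["terragrunt", "init"], cwd=new_dir, check=True)
    subprocess.run(["terragrunt", "state", "push", str(backup)], cwd=new_dir, check=True)

    # 4. The plan must report no changes; otherwise stop and investigate.
    result = subprocess.run(["terragrunt", "plan", "-detailed-exitcode"], cwd=new_dir)
    if not plan_has_no_changes(result.returncode):
        raise RuntimeError("plan reports changes after the move; state may not match")


# Example call with the paths from the steps above (not run here):
# move_module(Path("A"), Path("B"), Path("/var/app/backup.tfstate"))
```

The old remote state is deliberately left in place, so cleaning it up remains a manual step.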

Alternatively, instead of pulling the remote state file, one could terragrunt state mv the resources from the remote state to a local file and restore from that file. But it leaves outputs in the remote state file so it is still not a clean solution.

related blog post

trallnag commented 3 years ago

@maximivanov, worked for me, though I moved / copied the state directly in the S3 backend.

kitos9112 commented 3 years ago

@maximivanov's steps work nicely. I can also understand why the Gruntwork folks are reluctant to mess around with Terragrunt's internal logic.

a0s commented 3 years ago

Lol, just found that terragrunt reads the tfstate from the current dir but writes it back only to its own cache 🤦🏻 It skips updating the tfstate file it read before.

TF_LOG=trace tg state rm aws_s3_bucket.tf_backend
2021-10-26T12:52:41.270+0300 [DEBUG] Adding temp file log sink: /var/folders/m6/xclqx4bj1f709jdbq5g5z5_m0000gp/T/terraform-log2346698276
2021-10-26T12:52:41.271+0300 [INFO]  Terraform version: 1.0.8
2021-10-26T12:52:41.271+0300 [INFO]  Go runtime version: go1.17.1
2021-10-26T12:52:41.271+0300 [INFO]  CLI args: []string{"/usr/local/bin/terraform", "state", "rm", "aws_s3_bucket.tf_backend"}
2021-10-26T12:52:41.271+0300 [TRACE] Stdout is not a terminal
2021-10-26T12:52:41.271+0300 [TRACE] Stderr is not a terminal
2021-10-26T12:52:41.271+0300 [TRACE] Stdin is a terminal
2021-10-26T12:52:41.271+0300 [DEBUG] Attempting to open CLI config file: /Users/a0s/.terraformrc
2021-10-26T12:52:41.271+0300 [INFO]  Loading CLI configuration from /Users/a0s/.terraformrc
2021-10-26T12:52:41.271+0300 [DEBUG] checking for credentials in "/Users/a0s/.terraform.d/plugins"
2021-10-26T12:52:41.271+0300 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2021-10-26T12:52:41.271+0300 [DEBUG] will search for provider plugins in /Users/a0s/.terraform.d/plugins
2021-10-26T12:52:41.272+0300 [TRACE] getproviders.SearchLocalDirectory: found localhost/a0s/kubernetes-awaiter v0.0.1 for darwin_amd64 at /Users/a0s/.terraform.d/plugins/localhost/a0s/kubernetes-awaiter/0.0.1/darwin_amd64
2021-10-26T12:52:41.272+0300 [TRACE] getproviders.SearchLocalDirectory: found localhost/cookielab/postgresql v0.0.0-add-ssh-jumphost-support for darwin_amd64 at /Users/a0s/.terraform.d/plugins/localhost/cookielab/postgresql/0.0.0-add-ssh-jumphost-support/darwin_amd64
2021-10-26T12:52:41.272+0300 [WARN]  failed to read metadata about /Users/a0s/.terraform.d/plugins/localhost/cookielab/postgresql/0.0.0-add-ssh-jumphost-support/darwin_amd64/terraform-provider-postgresql: stat /Users/a0s/.terraform.d/plugins/localhost/cookielab/postgresql/0.0.0-add-ssh-jumphost-support/darwin_amd64/terraform-provider-postgresql: no such file or directory
2021-10-26T12:52:41.272+0300 [WARN]  Provider plugin search ignored symlink /Users/a0s/.terraform.d/plugins/terraform-provider-libvirt: only the base directory /Users/a0s/.terraform.d/plugins may be a symlink
2021-10-26T12:52:41.272+0300 [DEBUG] ignoring non-existing provider search directory /Users/a0s/Library/Application Support/io.terraform/plugins
2021-10-26T12:52:41.272+0300 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2021-10-26T12:52:41.273+0300 [INFO]  CLI command args: []string{"state", "rm", "aws_s3_bucket.tf_backend"}
2021-10-26T12:52:41.283+0300 [TRACE] Meta.Backend: BackendOpts.Config not set, so using settings loaded from backend.tf:3,3-18
2021-10-26T12:52:41.283+0300 [TRACE] Meta.Backend: built configuration for "local" backend with hash value 666019178
2021-10-26T12:52:41.284+0300 [TRACE] Preserving existing state lineage "ec7e3991-0030-8cc5-c310-b82e2e4873e1"
2021-10-26T12:52:41.284+0300 [TRACE] Preserving existing state lineage "ec7e3991-0030-8cc5-c310-b82e2e4873e1"
2021-10-26T12:52:41.284+0300 [TRACE] Meta.Backend: working directory was previously initialized for "local" backend
2021-10-26T12:52:41.284+0300 [TRACE] Meta.Backend: using already-initialized, unchanged "local" backend configuration
2021-10-26T12:52:41.285+0300 [TRACE] Meta.Backend: instantiated backend of type *local.Local
2021-10-26T12:52:41.285+0300 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2021-10-26T12:52:41.286+0300 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/aws v3.54.0 for darwin_amd64 at .terraform/providers/registry.terraform.io/hashicorp/aws/3.54.0/darwin_amd64
2021-10-26T12:52:41.286+0300 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.0 for darwin_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.0/darwin_amd64
2021-10-26T12:52:41.286+0300 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/aws/3.54.0/darwin_amd64 as a candidate package for registry.terraform.io/hashicorp/aws 3.54.0
2021-10-26T12:52:41.286+0300 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.0/darwin_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.0
2021-10-26T12:52:42.202+0300 [TRACE] providercache.fillMetaCache: using cached result from previous scan of .terraform/providers
2021-10-26T12:52:42.266+0300 [DEBUG] checking for provisioner in "."
2021-10-26T12:52:42.285+0300 [DEBUG] checking for provisioner in "/usr/local/bin"
2021-10-26T12:52:42.285+0300 [DEBUG] checking for provisioner in "/Users/a0s/.terraform.d/plugins"
2021-10-26T12:52:42.286+0300 [INFO]  Failed to read plugin lock file .terraform/plugins/darwin_amd64/lock.json: open .terraform/plugins/darwin_amd64/lock.json: no such file or directory
2021-10-26T12:52:42.286+0300 [TRACE] backend/local: CLI option -backup is overriding state backup path to -
2021-10-26T12:52:42.286+0300 [TRACE] Meta.Backend: backend *local.Local supports operations
2021-10-26T12:52:42.286+0300 [TRACE] backend/local: state manager for workspace "default" will:
 - read initial snapshot from terraform.tfstate
 - write new snapshots to terraform.tfstate
 - create any backup at
2021-10-26T12:52:42.286+0300 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2021-10-26T12:52:42.287+0300 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/aws v3.54.0 for darwin_amd64 at .terraform/providers/registry.terraform.io/hashicorp/aws/3.54.0/darwin_amd64
2021-10-26T12:52:42.287+0300 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.0 for darwin_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.0/darwin_amd64
2021-10-26T12:52:42.287+0300 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/aws/3.54.0/darwin_amd64 as a candidate package for registry.terraform.io/hashicorp/aws 3.54.0
2021-10-26T12:52:42.287+0300 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.0/darwin_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.0
2021-10-26T12:52:43.162+0300 [TRACE] providercache.fillMetaCache: using cached result from previous scan of .terraform/providers
2021-10-26T12:52:43.222+0300 [DEBUG] checking for provisioner in "."
2021-10-26T12:52:43.240+0300 [DEBUG] checking for provisioner in "/usr/local/bin"
2021-10-26T12:52:43.240+0300 [DEBUG] checking for provisioner in "/Users/a0s/.terraform.d/plugins"
2021-10-26T12:52:43.241+0300 [INFO]  Failed to read plugin lock file .terraform/plugins/darwin_amd64/lock.json: open .terraform/plugins/darwin_amd64/lock.json: no such file or directory
2021-10-26T12:52:43.241+0300 [TRACE] backend/local: CLI option -backup is overriding state backup path to -
2021-10-26T12:52:43.242+0300 [TRACE] statemgr.Filesystem: preparing to manage state snapshots at terraform.tfstate
2021-10-26T12:52:43.244+0300 [TRACE] statemgr.Filesystem: existing snapshot has lineage "da3b5310-fc1b-3d23-5091-9cf752ff5471" serial 7
2021-10-26T12:52:43.244+0300 [TRACE] statemgr.Filesystem: locking terraform.tfstate using fcntl flock
2021-10-26T12:52:43.244+0300 [TRACE] statemgr.Filesystem: writing lock metadata to .terraform.tfstate.lock.info
2021-10-26T12:52:43.244+0300 [TRACE] statemgr.Filesystem: reading latest snapshot from terraform.tfstate
Removed aws_s3_bucket.tf_backend
2021-10-26T12:52:43.245+0300 [TRACE] statemgr.Filesystem: read snapshot with lineage "da3b5310-fc1b-3d23-5091-9cf752ff5471" serial 7
2021-10-26T12:52:43.245+0300 [TRACE] statemgr.Filesystem: creating backup snapshot at terraform.tfstate.1635241963.backup
2021-10-26T12:52:43.246+0300 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 8
2021-10-26T12:52:43.246+0300 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
Successfully removed 1 resource instance(s).
2021-10-26T12:52:43.271+0300 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2021-10-26T12:52:43.272+0300 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

smitthakkar96 commented 2 years ago

I wrote a script and am now figuring out how to make this work with Atlantis:

import boto3
from git import Repo
import yaml

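def _load_yaml(path):
    # Use a context manager so the file handle is closed after parsing
    with open(path, 'r') as f:
        return yaml.safe_load(f)

def _get_s3_config():
    staging = _load_yaml('envs/staging.yaml')
    production = _load_yaml('envs/production.yaml')
    common = _load_yaml('envs/default.yaml')
    # Note: on key conflicts, values from default.yaml overwrite the environment-specific ones
    production.update(common)
    staging.update(common)
    return {
        'production': production,
        'staging': staging,
    }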

def _get_s3_client(s3_config):
    session = boto3.Session(profile_name=s3_config['account'], region_name=s3_config['region'])
    return session.client('s3')

def main():
    """
        Based on git diff, it updates the path for state files
    """
    repo = Repo('./')
    diff = repo.index.diff('origin/master')

    s3_config = _get_s3_config()
    s3_client_production = _get_s3_client(s3_config['production'])
    s3_client_staging = _get_s3_client(s3_config['staging'])

    # Iterate over all renamed files
    for patch in diff.iter_change_type('R'):

        # Filter all paths containing `terragrunt.hcl`
        if 'terragrunt.hcl' not in patch.a_path:
            continue

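        # Get S3 client for environment
        if 'staging' in patch.a_path:
            client = s3_client_staging
            env_config = s3_config['staging']
        elif 'production' in patch.a_path:
            client = s3_client_production
            env_config = s3_config['production']
        else:
            # Path belongs to neither environment: skip it instead of
            # reusing a client from a previous iteration (or hitting a NameError)
            continue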

        bucket = env_config['bucket']
        prefix = env_config['remote_state_path_prefix']

        # Object keys never include the bucket name
        s3_key_new = f'{prefix}/{patch.a_path.replace("/terragrunt.hcl", "")}/terraform.tfstate'
        s3_key_old = f'{prefix}/{patch.b_path.replace("/terragrunt.hcl", "")}/terraform.tfstate'

        print(f'Migrating State from Path={bucket}/{s3_key_old} to {s3_key_new}.')

        # Copy the file to new path (the copy source names the bucket explicitly)
        client.copy_object(
            Bucket=bucket,
            CopySource={'Bucket': bucket, 'Key': s3_key_old},
            Key=s3_key_new,
        )
        # Delete the old object
        client.delete_object(Bucket=bucket, Key=s3_key_old)

if __name__ == '__main__':
    main()