pwillis-els / terraformsh

A wrapper for Terraform in Bash
MIT License

Using `terraform workspace` does not work as expected #11

Open pwillis-els opened 2 years ago

pwillis-els commented 2 years ago

Report from a user about bugs using Workspace: https://github.com/pwillis-els/terraformsh/issues/10

It seems terraformsh doesn't use workspaces, and I haven't found any applicable way to run it against a specific workspace.

See in the screenshot how it ignored that we are in a different workspace and tried to alter the state of the default workspace instead of using the current one? (screenshot attached)

pwillis-els commented 2 years ago

@AMKamel I'm not sure what's going on here; I haven't used workspaces with this tool yet.

One thing to try would be to set the environment variable TF_DATA_DIR to a fixed directory, like:

export TF_DATA_DIR=".terraform"

This mimics Terraform's default behavior of reusing the same local state directory across your Terraform runs.

You can also use the new -n option in the unreleased version of terraformsh, which does something similar: it doesn't delete the temporary TF_DATA_DIR that terraformsh creates and destroys on each run.

AMKamel commented 2 years ago

If I set TF_DATA_DIR to something like /tmp/xyz.com.terraform.d, don't I then have to take care of that directory for as long as I want to keep track of the state? If so, that's very hard, since I'm running Terraform in automation. Using a PostgreSQL backend with workspaces is safer, right?

pwillis-els commented 2 years ago

A Postgres backend with workspaces is safer, I would think, yes.

The temporary TF_DATA_DIR directory terraformsh creates (and destroys) is intended to draw attention to the fact that Terraform keeps local state (even if you're using a remote state backend like S3). A lot of users don't realize how much of using Terraform depends on this local state, which then causes problems when they use it in automation and it doesn't work the way it did on their laptop.

Basically, if you use Terraform the way it comes by default, there will be these .terraform directories littering your root modules, and if you try to use the same root module directory to apply Terraform to different environments, or with different configuration, you can end up "stepping on" the old .terraform directory from a previous run. I added some commands like terraformsh clean to try to work around that issue, but it got so annoying that I added the automatic temp TF_DATA_DIR as a workaround. So this is probably what's causing the issue with your terraformsh workspace commands.
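To make the "stepping on" concrete, here's a minimal simulation, with no terraform binary needed. The file name matches what Terraform really writes (it caches the backend configuration from the last `init` in `.terraform/terraform.tfstate` inside the root module directory), but the paths and JSON content here are illustrative, not real state:

```python
# Simulate the leftover local state Terraform keeps in a root module
# directory after a run, which a later run with a different
# configuration would then inherit.
import json
from pathlib import Path

data_dir = Path("my-root-module") / ".terraform"
data_dir.mkdir(parents=True, exist_ok=True)

# A run initialized against environment A leaves its backend pointer
# behind (illustrative content):
(data_dir / "terraform.tfstate").write_text(
    json.dumps({"backend": {"config": {"key": "env-a/terraform.tfstate"}}})
)

# A later run for environment B in the same directory starts from
# env A's cached backend config unless something re-inits or cleans up:
cached = json.loads((data_dir / "terraform.tfstate").read_text())
print(cached["backend"]["config"]["key"])  # → env-a/terraform.tfstate
```

This is the leftover that `terraformsh clean` and the per-run temp TF_DATA_DIR were both meant to deal with.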

I'm sorry I haven't gotten a chance to test workspaces yet; hopefully this weekend I can take a look. Can you tell me more about your use of the postgres backend with workspaces?

Thanks

peterwwillis commented 2 years ago

@AMKamel I've started working on some tests for postgres state backend, as well as some command-line parsing fixes, and I think I understand the issue better now.

So, it looks like what Terraform is trying to do with the workspace command is to keep both remote state and local state. Your backend state may be Postgres, but which workspace you're actively using is only stored in local state (which I guess makes sense). But since Terraformsh is creating and destroying local state each run, you lose which workspace is selected locally.
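For reference, that local record of the selected workspace is just a small plain-text file: `terraform workspace select NAME` writes NAME to `<TF_DATA_DIR>/environment`, and when the file is absent Terraform falls back to "default". A simulation of why wiping the data dir loses the selection (no terraform binary needed; the workspace name is just the example from this thread):

```python
# Simulate Terraform's local workspace record and what deleting the
# data directory (as terraformsh does per run) effectively does.
import shutil
from pathlib import Path

data_dir = Path(".terraform")  # i.e. TF_DATA_DIR=.terraform
data_dir.mkdir(exist_ok=True)
# What `terraform workspace select abukamel.something` records locally:
(data_dir / "environment").write_text("abukamel.something")

def current_workspace(d: Path) -> str:
    """Mimic how Terraform resolves the active workspace locally."""
    env_file = d / "environment"
    return env_file.read_text() if env_file.exists() else "default"

print(current_workspace(data_dir))  # → abukamel.something

# Deleting the data dir between runs resets the selection:
shutil.rmtree(data_dir)
print(current_workspace(data_dir))  # → default
```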

The previous way to solve this problem in Terraformsh 0.11 or earlier would be to set TF_DATA_DIR=.terraform, and then Terraformsh won't mess with it.

The next way to solve it is with the code I committed a few weeks ago, where the -n option also prevents destroying the temporary TF_DATA_DIR.

Both of those options will preserve the local state directory, so running terraformsh workspace select abukamel.something and then terraformsh plan should work, I think?

In addition, I've just pushed some new code which changes how command-line parsing works for Terraformsh. Now you can run a command like this:

terraformsh workspace select abukamel.something plan

With dry-run mode, we can see it runs these commands:

+ terraform init -input=false -reconfigure -force-copy
+ terraform workspace select abukamel.something
+ terraform init -input=false -reconfigure -force-copy
+ terraform get -update=true
+ terraform validate 
+ terraform plan -input=false -out=/home/psypete/git/PUBLIC/terraformsh/tests/foo/tf.a646d79d63.plan

This will generate a new local state and destroy it at the end of the run, but it allows you to run terraform workspace select ... during the run, so you can still switch to the workspace you want before running your plan command.

It seems the above runs 'terraform init' twice, but it looks like it works fine (in my testing, anyway). If that becomes a problem, you can avoid it with the -D option, which only runs the commands you specify on the command line, like this:

$ terraformsh -N -D init workspace select abukamel.something plan
+ terraform init -input=false -reconfigure -force-copy
+ terraform workspace select abukamel.something
+ terraform plan -input=false -out=/home/psypete/git/PUBLIC/terraformsh/tests/foo/tf.a646d79d63.plan

So, based on these options, will one of these methods work for you?

AMKamel commented 2 years ago

Great, thanks. To resolve the issues I was having, I tried to reuse the command output from terraformsh with Python. I use PostgreSQL as a backend for creating Terraform Cloud workspaces, so pgsql holds the state for the Terraform Cloud workspace, and the Terraform Cloud workspace (which is the remote backend) holds the state for the infrastructure code.

So when destroying, I destroy the state in the Terraform Cloud workspace first, then destroy the state in the pgsql backend workspace, which deletes the (now empty) Terraform Cloud workspace, and finally delete the pgsql workspace that was used as the backend for the Terraform Cloud workspace itself.

Snippet creating the pgsql workspace:

        shutil.copytree(f"{terraform_code_dir}/", f"{config_dir_location}/")
        zx.run_shell_print(
            f"cd {config_dir_location} && \
            terraform init -input=false -reconfigure -force-copy -backend-config {backend_config_file_location}")
        try:
            # Select the workspace if it already exists...
            zx.run_shell_print(
                f"cd {config_dir_location} && \
                terraform workspace select {fqdn}")
        except Exception:
            # ...otherwise `workspace select` fails, so create it instead
            zx.run_shell_print(
                f"cd {config_dir_location} && \
                terraform workspace new {fqdn}")

        plan_file_location = f"{config_dir_location}/config.plan"

        if deployment_action == "apply":
            zx.run_shell_print(
                f"cd {config_dir_location} && \
                terraform plan -out {plan_file_location} -input=false -var-file {vars_file_location} && \
                terraform apply -input=false -auto-approve {plan_file_location}")
        elif deployment_action == "destroy":
            zx.run_shell_print(
                f"cd {config_dir_location} && \
                terraform plan -destroy -out {plan_file_location} -input=false -var-file {vars_file_location} && \
                terraform apply -input=false -auto-approve {plan_file_location} && \
                terraform workspace select default && \
                terraform workspace delete {fqdn}")

Snippet creating the infrastructure using Terraform Cloud:

        os.environ["TF_REGISTRY_DISCOVERY_RETRY"] = "10"
        os.environ["TF_REGISTRY_CLIENT_TIMEOUT"] = "60"
        os.environ["TF_IN_AUTOMATION"] = "true"

        shutil.copytree(f"{terraform_code_dir}/", f"{config_dir_location}/")
        zx.run_shell_print(
            f"cd {config_dir_location} && \
            terraform init -input=false -reconfigure -force-copy \
            -backend-config {backend_config_file_location}"
        )
        if deployment_action == "apply":
            zx.run_shell_print(f"cd {config_dir_location} && \
            terraform apply -input=false -auto-approve")
        elif deployment_action == "destroy":
            zx.run_shell_print(f"cd {config_dir_location} && \
            terraform destroy -input=false -auto-approve")