Make Terraform workspaces explicit

marco-m commented 6 years ago

The problem I am facing

I use and love Tf workspaces, but the problem I see is that they are implicit: the behavior of terraform depends on the last terraform workspace select.

Consider the following. In the same directory, the effect of terraform apply depends on the current, implicit workspace:

terraform workspace select foo
... hack hack hack ...
terraform apply

terraform workspace select bar
... hack hack hack ...
terraform apply

If something went wrong and I want to reconstruct the steps, I cannot simply look at a single line of the shell history; I have to find the latest terraform workspace select. Now imagine I worked in multiple consoles: I could have done the workspace selection in one console and the actual apply in another.

Doing a terraform workspace show might not be enough, because in the meantime I could have done another select (maybe a runaway script ...)

To limit the potential for confusion or errors, we modify the shell prompt to always show the current workspace (just as you can modify the shell prompt to show the git status of the current directory). But even then, the shell prompt is not recorded in the shell history, and the developer has to set up the prompt beforehand.

A humble proposal

This proposal comes from the fly command-line utility of Concourse CI, where you could replace the concept of "fly target" with "terraform workspace".

In order to apply (or run any command that needs to know the current workspace):

terraform -ws foo apply

or, if this is more consistent with the current subcommand structure:

terraform apply -ws foo

Backwards compatibility / optionality

If backwards compatibility is needed or if in any case you want this feature to be optional, then maybe the terraform configuration could have an optional key similar to:

terraform {
  required_version = "> 0.7.0"
  explicit_workspace = "true"
}

that, if present, would require the workspace to always be passed on the command line.

Hope you find this idea valid :-) and really thanks for Terraform!

apparentlymart commented 6 years ago

Hi @marco-m! Thanks for this feature request.

The current usage model for workspaces was modeled after git branches, with terraform workspace select <name> being an approximate analog to git checkout <name>. However, I do see your point that it can make it hard to understand the context in which a command was run when e.g. preparing for an outage retrospective, and indeed Terraform does not have a corresponding feature to git reflog to see a history of changes to the working directory's context.

We're generally cautious about adding more variants and options to the core workflow, since it increases the surface area of things to learn when new users come to Terraform, and hurts people's ability to transfer their Terraform skills from project to project, and company to company. However, I do see the use-case here and would like to think for a while about whether there's a different way we could meet this need without adding a new option, or changing the primary workflow.

In the meantime, you may be able to achieve something like what you want here by creating a Terraform wrapper script. A number of teams use this to impose additional constraints on workflow, and it's often a good stepping-stone towards running Terraform inside a CI system or other automation.

Terraform supports a TF_WORKSPACE environment variable (documented in our "running in automation" guide) that overrides any selection made with terraform workspace select. This was added specifically to allow wrapper scripts and other automation to force a particular workspace regardless of context.

# determine_workspace_from_args_somehow is a placeholder for your own logic
export TF_WORKSPACE="$(determine_workspace_from_args_somehow "$@")"
terraform apply

I understand that this doesn't fully meet your goals here, since there is nothing stopping a user from directly running terraform apply and thus bypassing this mechanism.
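As a rough illustration of the wrapper idea above, here is a minimal sketch, not an official tool. The function name tf_ws and the TERRAFORM_BIN override are assumptions introduced for this example; only TF_WORKSPACE comes from Terraform itself. The function refuses to run unless a workspace is given as the first argument, so each shell-history line records the workspace it acted on.

```shell
# tf_ws <workspace> <terraform-subcommand> [args...]
# Hypothetical wrapper. TERRAFORM_BIN is an assumed override that
# defaults to "terraform"; it exists only to make the sketch testable.
tf_ws() {
    if [ "$#" -lt 2 ]; then
        echo "usage: tf_ws <workspace> <terraform-subcommand> [args...]" >&2
        return 2
    fi
    local workspace=$1
    shift
    # TF_WORKSPACE takes precedence over the last `terraform workspace select`.
    TF_WORKSPACE="$workspace" "${TERRAFORM_BIN:-terraform}" "$@"
}
```

With this sourced into the shell, tf_ws foo apply behaves roughly like the terraform -ws foo apply form proposed at the top of this issue. As noted, it does not stop anyone from calling terraform directly.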

Teams that need to create a reliable audit trail around Terraform changes often create a more regimented workflow with VCS commits triggering Terraform to run in a CI or other automation system. That way, all actions are serialized and tracked in a central location (in the CI system) rather than across potentially many personal shell histories, and each change can be easily mapped back to the git commit for the configuration that triggered it.

This also has other benefits for a production environment, such as managing any necessary credentials centrally within the CI system where it's more straightforward to implement best-practices such as time-limited, restricted-role credentials. It's also possible, with some additional configuration, to run terraform plan against changes proposed in pull requests as context for code review.
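A CI job for a single workspace might look roughly like the sketch below. The function name ci_plan, the TERRAFORM_BIN override, and the plan file name tfplan are assumptions for illustration; TF_WORKSPACE, TF_IN_AUTOMATION, and -input=false are existing Terraform features for non-interactive runs.

```shell
# ci_plan <workspace>
# Hypothetical CI step: initialise and plan for one explicit workspace.
# TERRAFORM_BIN is an assumed override that defaults to "terraform".
ci_plan() {
    if [ "$#" -ne 1 ]; then
        echo "usage: ci_plan <workspace>" >&2
        return 2
    fi
    # Run in a subshell so the exported variables do not leak into the caller.
    (
        export TF_WORKSPACE="$1"
        export TF_IN_AUTOMATION=1
        tf=${TERRAFORM_BIN:-terraform}
        "$tf" init -input=false &&
        "$tf" plan -input=false -out=tfplan
    )
}
```

Because the workspace is an argument of the job rather than leftover state in a working directory, the CI log for each run shows which workspace was planned.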

Building such a thing in-house is cost-prohibitive for many smaller teams, but there are building blocks available such as Hootsuite's Atlantis. HashiCorp's own Terraform Enterprise is also an option, designed specifically to meet enterprise requirements such as change tracking and credentials management.

With all of that said, we'll keep this issue open to remind us to think more about audit-ability of direct Terraform CLI actions; the above is intended just as some ideas of how teams with this sort of need might be able to meet them in the meantime.

marco-m commented 6 years ago

Hello @apparentlymart, thanks for the detailed answer, I appreciate the explanation and the suggestions.

I agree that in the end terraform should be invoked from a CI system, and this is my final goal. On the other hand, I am also using terraform to bootstrap the CI system, so I have a chicken-and-egg problem until I get there :-)

apparentlymart commented 6 years ago

Using Terraform to deploy your automation for Terraform is definitely an interesting challenge! It's true that this means there's always at least one Terraform configuration that isn't managed in your CI.

In my own experience doing this (in a previous role, before I joined HashiCorp) we just accepted that this one configuration was managed outside of our usual process, and took steps to make sure that mistakes here couldn't possibly affect our "real" production systems, such as deploying it on completely isolated AWS resources. On occasions where operator error led to a CI system outage, this was annoying but at least impacted only our internal workflow and not any customer-facing systems. The CI system also changed infrequently enough that we were able to justify a higher level of risk for updates to it, compared to the "real" system.

Different organizations have different constraints of course, and so I understand that this tradeoff isn't for everyone. As noted before, I would like to find a suitable answer here eventually, but unfortunately for the moment our focus is elsewhere (on configuration language usability, at the time of writing) and so I have to ask for patience on this particular problem.

marco-m commented 6 years ago

Yes I understand perfectly, don't worry, you gave me plenty of good suggestions / workarounds.

xird commented 6 years ago

Here's my workaround based on https://github.com/hashicorp/terraform/issues/15469#issuecomment-343499242 : http://blog.ampli.fi/explicit-workspaces-terraform/

TL;DR:

main.tf:

provider "aws" {
  region     = "eu-west-1"
}

variable "my_env" {}
variable "my_server_name" {}

resource "null_resource" "is_environment_name_correct" {
  count = "${var.my_env == terraform.workspace ? 0 : 1}"
  "ERROR: Workspace does not match given environment name!" = true
}

resource "aws_instance" "my_test" {
  # Amazon Linux AMI
  ami           = "ami-9cbe9be5"
  instance_type = "t2.micro"

  tags {
    Name = "${var.my_server_name}"
  }
}

dev.tfvars:

my_env         = "dev"
my_server_name = "Development server"

prod.tfvars:

my_env         = "prod"
my_server_name = "Production server"
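One way this check might be used, as a sketch: pass the matching variable file on every run, so that the environment name appears in the command itself. The helper name tf_apply_env and the TERRAFORM_BIN override are assumptions for this example; -var-file is a standard Terraform option.

```shell
# tf_apply_env <env> [extra terraform apply args...]
# Hypothetical helper: applies with <env>.tfvars so that my_env is always set.
# TERRAFORM_BIN is an assumed override that defaults to "terraform".
tf_apply_env() {
    if [ "$#" -lt 1 ]; then
        echo "usage: tf_apply_env <env> [args...]" >&2
        return 2
    fi
    local env_name=$1
    shift
    "${TERRAFORM_BIN:-terraform}" apply -var-file="${env_name}.tfvars" "$@"
}
```

If the selected workspace is dev and you run tf_apply_env prod, var.my_env no longer equals terraform.workspace, so the null_resource check in main.tf should be what stops the run.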

nbetm commented 6 years ago

My big problem was visual feedback of which workspace was the active one. I came up with this small bash helper function for my PS1:

__tf_ps1 () {
    local workspace=""
    # Optional first argument: printf format for the workspace name.
    local print_format=${1:-(%s)}

    # TF_WORKSPACE, when set, overrides the workspace selected on disk.
    if [[ -n ${TF_WORKSPACE+x} ]]; then
        workspace=${TF_WORKSPACE}
    elif [[ -f .terraform/environment ]]; then
        workspace=$(cat .terraform/environment)
    fi

    # Print nothing when no workspace can be determined.
    if [[ -n $workspace ]]; then
        printf "$print_format" "$workspace"
    fi

    return 0
}

Here is a snapshot of my prompt. In this example "default" is the current workspace.
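For reference, a helper like this is typically wired into the prompt the same way as git's __git_ps1. The prompt layout below is only an example, and it assumes the __tf_ps1 function above has already been defined in the shell's startup file:

```shell
# Single quotes delay the $(...) call until each prompt is drawn.
# The " [%s]" argument is the optional format string accepted by __tf_ps1.
PS1='\u@\h:\w$(__tf_ps1 " [%s]")\$ '
```

Inside a directory where a workspace is selected, the prompt then carries the workspace name in brackets; elsewhere the helper prints nothing.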

I hope this helps!

apparentlymart commented 6 years ago

Thanks for sharing that snippet, @nbetm!

On the subject of making the currently-selected workspace more visible, we also recently merged a change (#18253) that includes the workspace name in the terraform apply and terraform destroy confirmation prompts so that an operator has that additional context available when deciding whether to apply the generated plan.

pavitra-infocusp commented 9 months ago

Terraform, in the spirit of its legacy in describing infra as code, should also support "Workspace as code". Hear me out.

Example:

# main.tf

terraform {
  workspace {
    dev {
      // Define workspace-specific variables.
      ec2_instance_type = "t2.micro"
    }
    prod {
      ec2_instance_type = "t2.large"
    }
  }
}

With this in place, running terraform apply without a workspace would fail with an error message saying that no workspace is selected. This makes the choice explicit and prevents accidental deploys to an unexpected workspace.

To deploy a workspace, run terraform -workspace=<workspace name> apply, where <workspace name> can only be one of the values defined in the Terraform file above.
