New lifecycle tag "block changes" to prevent changes from happening

hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.


Terraform Version

Any, e.g. 1.5

Use Cases

I am looking at using Google Workforce Identity Federation with Azure AD. In this setup, a GCP resource IAM policy references a principalSet:// principal, which encodes the objectID of an Azure AD group (a UUID). This effectively says that anyone who is a member of this Azure AD group gets the permission I assign on this GCP resource.

UUIDs are a little cumbersome to work with in code and configuration, especially when working with thousands of them. A YAML or JSON file full of UUIDs is pretty impenetrable. However, I understand why Google Workforce Identity Federation works this way: Azure AD groups can be renamed, so the principalSet needs to refer to the objectID rather than the displayName, to prevent someone in Azure from getting permissions they shouldn't have just by renaming a group.

So I was thinking of using a data "azuread_group" data source to look up the objectID based on the group name. Configuration describing who should have permission on this or that GCP resource could then reference group names instead, which is much more user-friendly for devs to engineer against.

This, however, suffers from the security issue mentioned above: if I look up groups by name using this data source and use the resulting objectID in the principalSet:// principal string of a GCP resource IAM policy, someone could just rename a group so that the name resolves to a different objectID.

I was thinking that in this case, though, Terraform would notice, because the "member" field of e.g. "google_folder_iam_member" would change, which would result in either a "replace" or an "update" operation.

It then struck me that I would like to have this field "fixed", so that Terraform would block these operations (basically error) if the field were changed. Something like:

lifecycle {
    block_changes = [ member ]
}

This doesn't exist right now, but it would be quite useful.

I think we could then ensure that the permissions are resilient to someone trying to escalate privileges via group naming, while keeping the ability for teams to understand their code better by referring to principals by name and not by ID.

Attempted Solutions

None in Terraform. I am investigating writing tooling that can help developers automatically translate between names and UUIDs in their code, but this isn't really a great approach.

Proposal

lifecycle {
    block_changes = [ member ]
}

References

No response

Thanks for sharing this use-case, @gtmtech!

In today's Terraform this is an example of something we'd consider to be a kind of "policy check", which is usually enforced like this:

1. Run terraform plan -out=tfplan to produce a saved plan file.
2. Run terraform show -json tfplan to get Terraform's description of the plan in JSON format.
3. Use your own software to parse that JSON and implement whatever policy rules you have, reporting an error if anything does not meet policy.
4. If the policy check succeeded and the plan looks okay otherwise, run terraform apply tfplan to apply it.
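
As a rough illustration of the parsing step above, here is a minimal Python sketch of a check that rejects any plan in which a protected attribute would change. It is not an official tool: the function name find_blocked_changes and the BLOCKED_ATTRIBUTES table are hypothetical, and the sketch assumes each entry in the plan JSON's resource_changes list carries change.before, change.after and change.after_unknown.

```python
# Hypothetical policy table: resource type -> attributes that must never change.
BLOCKED_ATTRIBUTES = {"google_folder_iam_member": ["member"]}


def find_blocked_changes(plan):
    """Return (address, attribute) pairs where a protected attribute would change.

    `plan` is the parsed output of `terraform show -json tfplan`.
    """
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        before = change.get("before")
        after = change.get("after")
        # A pure create has no "before" and a pure delete has no "after";
        # neither is a change to an existing field, so skip them.
        if before is None or after is None:
            continue
        unknown = change.get("after_unknown")
        if not isinstance(unknown, dict):
            unknown = {}
        for attr in BLOCKED_ATTRIBUTES.get(rc.get("type"), []):
            # A value only known after apply cannot be proven unchanged,
            # so this sketch treats it as a change.
            if unknown.get(attr) is True or before.get(attr) != after.get(attr):
                violations.append((rc["address"], attr))
    return violations
```

A wrapper script could load the JSON with json.load, print each violation, and exit non-zero when the returned list is not empty, so that the automation never reaches terraform apply tfplan.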

Although this approach does require some extra steps, it also allows implementing arbitrary rules about what changes are acceptable without every one needing special support in Terraform itself.

There is one existing feature in Terraform that works like you've described: prevent_destroy makes planning fail if anything marked with it is planned for destruction. However, we consider that to have been a design error because:

- Some users find it too coarse: they want to have some more nuance in the rule such as ignoring a plan to destroy rather than just rejecting it outright.
- Some users find it too fine: they want to just say "never allow destroying any instance of type aws_db_instance" rather than having to annotate each one separately.
- Putting policy directly in the configuration of the thing the policy applies to makes it too easy to accidentally disable the policy while making other changes. If it lives outside Terraform then it can have a separate explicit change process, possibly controlled by someone other than who the policy is constraining.
- Putting policy in the configuration makes it the responsibility of the module author rather than the workspace operator. This is okay for the situation where those two are the same person, but it's common for someone to want to impose policy on a module they didn't write but are responsible for the effects of nonetheless.

All of those considerations led to the current posture of making policy checks something independent of Terraform itself. That approach allows everyone to tailor to exactly the rules they need, allows both hard failure and extra-approval-required conditions (as long as Terraform is running in automations that can support waiting for that extra approval), and makes the policy independent of what it is constraining. On the other hand, it does require some additional effort on the part of the person setting up the automation around Terraform; Terraform Cloud has this built in, but other automation methods may not.
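
To illustrate the two outcomes mentioned above (hard failure versus extra approval), an external policy script could classify a plan instead of simply passing or failing it. The following is only a sketch: NEVER_DESTROY_TYPES, classify_plan and the three verdict strings are hypothetical names, and the rule that any other deletion needs approval is invented for the example. It assumes the change.actions list in the plan JSON contains "delete" for both destroy and replace operations.

```python
# Hypothetical type-wide rule: no instance of these types may ever be destroyed.
NEVER_DESTROY_TYPES = {"aws_db_instance"}


def classify_plan(plan):
    """Return "fail", "needs_approval" or "ok" for a parsed plan JSON document."""
    verdict = "ok"
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Replacements include "delete" as well, so they are covered here too.
        if "delete" not in actions:
            continue
        if rc.get("type") in NEVER_DESTROY_TYPES:
            # Hard failure: nothing later in the plan can override this.
            return "fail"
        # Any other deletion is allowed only after someone approves it.
        verdict = "needs_approval"
    return verdict
```

The surrounding automation would then decide what each verdict means, for example pausing the pipeline on "needs_approval" until a reviewer signs off.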

With all of that said, I suspect that this feature would end up in the same regret bucket as prevent_destroy if it were to be implemented as a built-in, for many of the same reasons. Therefore my instinct is to ask you to implement this as a policy rule like I described above, but I'd be interested to hear if that seems infeasible for reasons I've not considered yet.

Thanks again!
