hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

A lifecycle flag to never refresh resources #15472

Open gtmtech opened 7 years ago

gtmtech commented 7 years ago

Terraform 0.9.8

Background: Terraform likes to fully manage the resources it creates, and goes to some lengths to reconcile their latest state by doing a refresh/fact-find before issuing a plan or applying a plan. This is great behaviour. It's also great that Terraform can spin up an environment in its entirety, which makes creating dev and prod environments very easy.

However, certain development workflows involve subsequent changes to these resources which Terraform is not necessarily good at handling itself, maybe because its current API support isn't good enough, or maybe because some things are simply complex to manage.

An example of this (but not limited to the AWS provider) is the aws_rds_cluster resource. We regularly restore a snapshot (which creates a different AWS RDS cluster, as you can never restore over the top of an existing cluster in AWS), then update DNS and/or apps to point to the new cluster for a seamless db upgrade, and then delete the old database, but not the DNS records. Putting aside the fact that Terraform's aws_rds_cluster resource currently has bugs preventing a successful restore from a snapshot anyway, and assuming it would work, this is very unwieldy in Terraform, and perhaps even impossible at the moment.

Suppose then the initial aws_route53_record sensibly looks like this:

resource "aws_route53_record" "dbreader" {
    ...
    records = ["${aws_rds_cluster.foo.reader_endpoint}"]
    ...
    lifecycle = {
        ignore_changes = ["*"]
    }
}

The DNS records managed by Terraform originally point to aws_rds_cluster.foo.reader_endpoint - but now that the database has been removed, one of the following happens:

1) Terraform needs to create the aws_rds_cluster again in order to be able to supply the aws_route53_record resource with the records field, despite ignore_changes being present.

2) If count=0 is set on the aws_rds_cluster (now deleted), then Terraform will complain that aws_rds_cluster is no longer found with respect to the records field in aws_route53_record.

3) If count=0 is set on both aws_rds_cluster and aws_route53_record, Terraform will attempt to delete the record resource, resulting in downtime within the system.

In short, ignore_changes ignores changes in the apply phase, but not during the validate phase.

I believe this workflow would be solved if it were possible to add a lifecycle flag which would refuse to refresh resources. That way Terraform would always rely on the state file and not need to make any changes at all, regardless of what had happened to the actual resources. A kind of refresh=false flag which affected a resource and all of its dependencies.
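
As a rough sketch of the idea (the flag name skip_refresh is purely hypothetical and does not exist in Terraform), it might look something like this:

resource "aws_rds_cluster" "foo" {
    ...
    lifecycle {
        # Hypothetical flag: never refresh this resource or anything that
        # depends on it; always trust what is already in the state file.
        skip_refresh = true
    }
}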

I would welcome any comments - perhaps you can already do what I am after.

apparentlymart commented 7 years ago

Hi @gtmtech! Thanks for writing up this proposal.

I just want to make sure I understand fully what you're suggesting here: in the example scenario you described, is your intent that you would set this "don't refresh" flag on the aws_rds_cluster.foo resource, so that it would remain available in the state even if the underlying physical resource is deleted/updated?


Although it's a bit of a tangent from what you requested here, and a more AWS-specific question, I'm also curious about the "replace RDS instance via snapshot" problem itself. I did this a couple of times in the past on a more ad-hoc basis to restore from backups, rather than as a routine procedure, and was able to get it done with the following (rather awkward) workflow:

This is of course not a reasonable workflow for routine work, but I was wondering whether this is something you'd tried and found no longer works for some reason, or whether you'd just rejected it because of how awkward it is. If the above has become broken somehow, it'd be good to track that in an AWS provider bug. In the long run, it'd be nice to have a way for Terraform itself to manage this, but indeed it's rather awkward right now due to the mismatch between how Terraform sees it (replacing an existing instance) and how RDS sees it (making a new instance alongside the old).
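
The individual steps of that workflow are not reproduced above. Purely as an illustration of the general shape of this kind of manual swap (assumed here, not necessarily the exact steps being referred to, and with a placeholder cluster identifier), the state surgery typically looks like:

terraform state rm aws_rds_cluster.foo                     # forget the old cluster in state without destroying anything
terraform import aws_rds_cluster.foo restored-cluster-id   # adopt the cluster restored from the snapshot
terraform plan                                              # review the diff against the adopted cluster
terraform apply                                             # reconcile dependent resources (DNS records, etc.)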

gtmtech commented 7 years ago

Thanks for your writeup - I was able to follow the workflow above, which is good. However, as you say, restoring the db is quite a common workflow for us (every day), so having to run 4 or so terraform commands each day per env is quite a hassle. But thanks for the guidance.

rafalcieslak commented 5 years ago

Another use case where such a flag would be useful is discussed in this AWS provider issue. There are some resources that, in large numbers, take a very long time to refresh. If the developer who writes such a resource knows that it should never need to be refreshed (because it is never updated by anything other than Terraform), they could save a lot of time by marking it as "do not refresh". Any state inconsistencies introduced by the use of such a flag would be their own responsibility - though in practice it shouldn't be difficult to avoid them.

Note that in this case the -refresh=false CLI flag is insufficient, as all other resources may still need to be refreshed regularly.
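
For context, the existing CLI flag applies to the entire run rather than to individual resources, which is exactly the limitation described here:

# Skips the refresh step for every resource in the configuration,
# not just the slow ones - there is no per-resource equivalent today.
terraform plan -refresh=false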

dgcaron commented 3 years ago

This would also be helpful for resources that are in the "data plane" of Azure. For instance, if you are creating secrets in a Key Vault or folders in a storage account and afterwards you restrict network access to those accounts, Terraform will stall on those folders and secrets because the SDK can no longer reach them (access is blocked). Seeding a folder structure is something we do once at the initial deploy.
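
A minimal sketch of that pattern, assuming the azurerm provider (names are illustrative, and the referenced resource group and client-config data source are assumed to exist elsewhere in the configuration):

resource "azurerm_key_vault" "kv" {
  name                = "example-kv"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  network_acls {
    # Tightened after the initial deploy; later plans can no longer read
    # the secret below, so the refresh step fails or stalls.
    default_action = "Deny"
    bypass         = "AzureServices"
  }
}

resource "azurerm_key_vault_secret" "seed" {
  name         = "initial-secret"
  value        = "seed-value"
  key_vault_id = azurerm_key_vault.kv.id
}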

so-jelly commented 1 year ago

I have @dgcaron's exact problem. I have a read-only account that performs terraform plan in CI, and unless I give that account access to read secrets, the plan fails.

resource "google_secret_manager_secret_version" "secret" {
  lifecycle {
    ignore_changes = all
  }
  secret      = google_secret_manager_secret.secret.id
  secret_data = random_password.password.result
}

Running a plan as a read-only user gives:

Error: Error reading SecretVersion: googleapi: Error 403: Permission 'secretmanager.versions.access' denied for resource 'projects//secrets//versions/1' (or it may not exist).

watsonjm commented 11 months ago

This would also be helpful for the databricks_mount resource: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mount. I need to seed the mount but don't care about it afterwards, yet every Terraform plan brings the entire cluster up just so it can check the state, which takes a lot of time and unneeded compute.

gregnuj commented 7 months ago

I use Terraform to create and apply branch protection to GitHub repos; this leads to regular failures due to too many GitHub API requests. The branch protection does not really need to be refreshed after it is applied initially, but it accounts for the bulk of the API calls. In this case, this setting could prevent the unnecessary API calls to GitHub.
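
As a sketch, reusing the hypothetical skip_refresh flag from the original proposal (it does not exist in Terraform today) and assuming the github provider's github_branch_protection resource:

resource "github_branch_protection" "main" {
  repository_id = github_repository.repo.node_id
  pattern       = "main"

  required_pull_request_reviews {
    required_approving_review_count = 1
  }

  lifecycle {
    # Hypothetical flag: the protection rule is only ever written by
    # Terraform, yet today it is re-read from the GitHub API on every plan.
    skip_refresh = true
  }
}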

AurimasNav commented 5 months ago


Echoing @watsonjm's point about databricks_mount: Databricks mounts are annoying when developing/debugging locally, since every terraform plan waits for the cluster to come up. I would love a CLI flag to ignore specific resources during the refresh stage in these cases.

jeffreybartosiewicz commented 4 months ago

Bumping this thread with another example. I have a TF repo to deploy and configure our BIG-IP F5 load balancers in Azure. The configuration runs a Declarative Onboarding (DO) process when the load balancer is first built, but after that I don't care if the config changes. Terraform sometimes complains that the DO has changed, and it adds a lot of time to deployments. Being able to set a lifecycle flag to no longer refresh the resource would be very handy.