hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Do not delete a resource but create a new resource when change is detected #15485

Open Puneeth-n opened 7 years ago

Puneeth-n commented 7 years ago

Can terraform be configured to create a new resource but not delete the existing resource when it sees a change? For example with AWS step functions, one can either create or delete a state machine and not modify it.

I want terraform to create a new state machine each time it sees a change but not delete the old one as it might contain states.

apparentlymart commented 7 years ago

Interesting idea, @Puneeth-n! Thanks for suggesting it.

I think a question we'd need to figure out here is what does happen to the old instance. Should Terraform just "forget" it (no longer reference it from state) and leave some other process to clean it up? Or maybe it should become "deposed" but not deleted. In that case, a subsequent run would delete it, so that's probably not what you want here.

Puneeth-n commented 7 years ago

Thanks @apparentlymart. I have been giving this some thought over the past few days, as we plan to use AWS Step Functions in the near future.

My thoughts on this:

resource "aws_sfn_state_machine" "sfn_state_machine" {
  name     = "my-state-machine"
  role_arn = "${aws_iam_role.test_role.arn}"

  definition = <<EOF
{
  "Comment": "A Hello World example of the Amazon States Language using an AWS Lambda Function",
  "StartAt": "HelloWorld",
  "States": {
    "HelloWorld": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:337909755902:function:test_lambda",
      "Next": "wait_using_seconds"
    },
    "wait_using_seconds": {
      "Type": "Wait",
      "Seconds": 10,
      "End": true
    }
  }
}
EOF

  destroy_action {
    create_new   = true
    decommission = true
  }
}

and a new CLI option, terraform --cleanup -force -target=resource, to clean up decommissioned resources.

apparentlymart commented 7 years ago

Okay... so this implies an entirely new instance state "decommissioned", in addition to "tainted" and "deposed", which behaves a bit like deposed but only gets deleted when specifically requested.

Ideally I'd rather avoid the complexity of introducing a new instance state, so I'd like to let this soak for a little while and see if we can find a way to build something similar out of our existing concepts, or to add a new concept that addresses a broader problem in this same area of gradually decommissioning things.

For example: a common problem is gracefully migrating between clusters that have actions to be taken when nodes leave, like Consul and Nomad clusters. Currently people do this in several manual steps of ramping up a new cluster, gradually phasing out the old cluster, and then destroying the old cluster. This seems like a similar problem to yours, and in both cases it seems like there's some action that is to be taken between the creation of the new thing and the destruction of the old thing. This idea has come up a number of times with different details.

cobusbernard commented 6 years ago

@apparentlymart: This would be really useful for dealing with, e.g., AWS Launch Config changes. Each time you change one, you need to first create a new one, point your ASG to it, then destroy the old one via

lifecycle {
  create_before_destroy = true
}

It would be great to be able to somehow indicate to always create a new one and not delete the old one. That way you preserve the history of what the values were and can switch back easily.
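The full launch-configuration pattern usually pairs create_before_destroy with name_prefix so the replacement can coexist with the original during the swap. A minimal sketch, with illustrative variable names and sizes (not from this thread):

```hcl
# Sketch of the create_before_destroy pattern for launch configurations.
resource "aws_launch_configuration" "app" {
  # name_prefix lets Terraform generate a fresh unique name, so the new
  # launch configuration can exist alongside the old one during replacement.
  name_prefix   = "app-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "app" {
  name                 = "app-asg"
  launch_configuration = aws_launch_configuration.app.name
  vpc_zone_identifier  = var.subnet_ids
  min_size             = 1
  max_size             = 3
}
```

Note that the old launch configuration is still destroyed at the end of the apply, which is exactly the behavior this issue asks to make optional.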

maulik887 commented 6 years ago

Hi, Any chance of getting this in recent release?

Puneeth-n commented 6 years ago

@maulik887 what is your use case? When I was working with Step Functions a year ago I had this requirement, since there was no Update API and we didn't want Terraform to delete our step functions.

maulik887 commented 6 years ago

My case is, I'm creating an API Gateway API and using it as a Lambda proxy. Now I want to create an API stage per Lambda version, and don't want to delete the old stage. E.g., on a fresh start I create the API, Lambda & a stage called v1_0; when a new Lambda version comes along, I want to create a new API stage v1_1 but don't want to delete the older version.

ChappIO commented 6 years ago

I would like to check on the state of this. My use case is that I have an application which performs long-running processes (hours, sometimes days), but I would like to roll out updates seamlessly. During an update I would place a new instance next to the current one (which I can do with create_before_destroy) and then leave the current instance running until all processes are finished (which I cannot do).

I have two suggestions: either a way to query the application for its status (an HTTP request against an endpoint I'd create for Terraform), or the ability to schedule a resource for deletion after X time (in my case I would probably set it to a week). That way an update would also delete all older (finished) instances.

sjmh commented 6 years ago

I'd also find this useful. I have a use case where I'm deploying an S3 object; we deploy them with the version tag in the object name, so 'myjs-.js'. When we change the version, I want a new S3 object deployed, but I don't want the old version removed.

jsmartt commented 6 years ago

I have a similar need to create a new version of a resource without destroying the old one, but in my use case, I don't really care about cleaning up old versions, so I'd be OK with Terraform just forgetting about the old resource. The way I'd see it working would be to have an additional attribute modifier similar to ForceNew, but just with a different workflow. It could be called ForceNewNoDelete for example, where it basically just skips the delete and verify via read steps.

I'm not sure how this would work with dependent resources though, which I wouldn't necessarily want destroyed, even though I need to grab the ID of the new resource that got created.

philnielsen commented 6 years ago

Similar use case to @sjmh: I want to keep my old lambda code versions around for a bit in s3, since they can now be attached to older lambda versions with alias and (soon) I should be able to route traffic to those old versions, but there is no way to update and add a new code version and alias without deleting the old code version with aws_s3_bucket_object.

apparentlymart commented 6 years ago

The Terraform Core team is not currently doing any work in the area of this issue due to being focused elsewhere. In the meantime, I think some of the use cases described here (thanks!) could be handled more locally by features within the providers themselves, such as flags to skip deletion on a per-resource-type basis. So I'd encourage you all to open an issue within the relevant provider (unless I've missed someone, it looks like you're all talking about AWS provider features) to discuss the more specific use case and see if a shorter-term solution is possible.

There is already some precedent for resource-type-specific flags to disable destroying an object, such as nomad_job's deregister_on_destroy argument. Terraform Core still thinks it's destroying the job, but the provider ignores that request if the flag is set and leaves the job present in Nomad, no longer being tracked by Terraform at all.
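A sketch of that precedent (the jobspec path is illustrative):

```hcl
# The Nomad provider's per-resource opt-out of destruction.
resource "nomad_job" "app" {
  jobspec = file("${path.module}/app.nomad")

  # With this set to false, destroying the resource removes it from
  # Terraform state but leaves the job running in Nomad.
  deregister_on_destroy = false
}
```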

Having some specific examples of solutions for this in individual providers is often a good way to figure out what a general solution might look like, or even to see if a general solution is warranted, so if you do open such a ticket please mention hashicorp/terraform#15485 in it so we can find them all again later when doing design/prototype work for this issue.

mohitm108 commented 5 years ago

I am also trying to get a similar feature. Whenever there is a feature change, it fires up a Jenkins job and new tasks/services are created in the AWS ECS cluster. I want to keep the old tasks/services as well, so that if anything goes wrong I can roll my load balancer back to the old tasks/services.

Tensho commented 5 years ago

I build an EBS volume with Packer, then take an EBS volume snapshot with Terraform. For potential rollback purposes, I don't want Terraform to replace (delete) the old EBS volume snapshot. At the moment there is a hack with Terraform state manipulation in the wrapper script that runs my Terraform commands to achieve the desired behavior:

terraform apply ...
terraform state rm aws_ebs_snapshot.component

I just remove the resource from Terraform state to make room for the new one on the next apply.

It would be nice to have HCL resource declaration for this.

jpbochi commented 5 years ago

The solution my team implemented was similar to what @Tensho described. We just run terraform state rm ... between applies so that the lambda aliases we create don't get destroyed.

WigglesMcMuffin commented 5 years ago

For some things we did recently, we actually used terraform state mv to move the resource out of the way, and -target to get a new one built. Subsequent applies then want to tear down the old resources, so we could apply that when we were ready; that way the state was still tracked, and we didn't have to do any manual clicking about.

tomelliff commented 5 years ago

This is also not viable for me, because our deploys are done entirely via CI with no manual intervention. We also deploy many times a day, so we can't introduce manual steps each time; we would be better off just creating the task definition outside of Terraform, but that loses our shared task configuration and adds more tooling when we'd like to keep things as simple as possible.

thakkerdhawal commented 5 years ago

+1

deindorfer commented 5 years ago

Hard to believe Terraform doesn't have this feature. In the case of AWS snapshots, I somewhat obviously want more than one snapshot. I want to create a new snapshot, but KEEP THE EXISTING ONES, TOO!

Y'all don't support that? Seriously? "Make new, but keep existing"

Like when I install a new binary on my windows laptop, I don't want to delete and reinstall all of the other binaries, I want TO KEEP WHAT I'VE GOT and ADD THE NEW ONE.

Could this please get a look from The Hashicorp Core Dev Team?

hdryx commented 5 years ago

Same thing here with EMR clusters. I want to launch a new cluster but keep the old one running; Terraform always destroys the old one and replaces it with a new one. Hope there is a solution for that.

DavidGamba commented 4 years ago

Similar workflow here: I want to deploy a new EC2 instance and leave the old one running until the load balancer marks the new one as good, then run another plan/apply to delete the old instance. I would like something like terraform plan -no-destroy.

It raises some issues around indexes, because the count has to increment. For example, my LB uses the instance count to add entries to the target group. During the no-destroy phase I'd expect the extra entry to be added (so the state somehow needs to increment its count), and then, when we actually apply the plan that destroys, the count goes back to normal.

tomelliff commented 4 years ago

@DavidGamba you can already do that, as long as your change to the EC2 instance forces a replacement (e.g. changing the AMI), by using the create_before_destroy lifecycle block.

It will also do this for you in a single terraform apply.
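A minimal sketch of that suggestion (values illustrative):

```hcl
# create_before_destroy: the replacement instance is created (and must
# succeed) before the old instance is destroyed, all in one apply.
resource "aws_instance" "web" {
  ami           = var.ami_id # changing the AMI forces replacement
  instance_type = "t3.micro"

  lifecycle {
    create_before_destroy = true
  }
}
```

This only reorders create and destroy within a single apply; it does not keep the old instance running across applies, which is the gap described in the comment above.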

RaniSputnik commented 4 years ago

I think this feature would be useful in many cases and should be considered cross-provider.

I want to keep my old lambda code versions around for a bit in s3, since they can now be attached to older lambda versions with alias and (soon) I should be able to route traffic to those old versions, but there is no way to update and add a new code version and alias without deleting the old code version with aws_s3_bucket_object.

This is exactly the same issue I face but with Google Cloud. Ideally I can have Terraform create new objects in cloud storage and publish new function versions without destroying the old ones.

Another area where I see this being useful is certificate management (again, cross-provider): you never really want to delete your old cert, just provision a new one. This feature would help there too.

In terms of where I would expect this to be surfaced, I would like to see it as a lifecycle rule, perhaps duplicate_on_modify or abandon_on_destroy (as already suggested)? Personally, I prefer abandon_on_destroy, because I think a resource that can be modified in place wouldn't be replaced anyway; but perhaps there's a use case for that also?

milosbackonja commented 4 years ago

I hit same issue with API GW deployments in AWS. It would be great if I could keep old resources/versions.

ahmed1smael commented 4 years ago

+100

RaniSputnik commented 4 years ago

Hey folks, I've added an initial implementation of this with https://github.com/hashicorp/terraform/pull/23066 would love to know if it solves your use case - let me know what you think or anything that could be improved.

It's simple to use: just add abandon_on_destroy = true to the lifecycle block of the resource you want Terraform to forget.

agurinov commented 4 years ago

Hi, @RaniSputnik!

A use case with API GW deployments: abandon_on_destroy lets Terraform forget a resource only if it cannot be updated in place, but a deployment can almost always be updated in place. We need some way to always forget, something like abandon_on_update. My use case is publishing a new version of an API GW (something like a Lambda version) that must not be deleted or updated (for rollback functionality).

jacoor commented 4 years ago

I like @agurinov's idea very much. I have the very same issue: I need the history of API Gateway deployments in case something blows up.

agurinov commented 4 years ago

@RaniSputnik, @jacoor: there is the same use case with the aws_s3_bucket_object resource when versioning is enabled on the aws_s3_bucket resource.

RaniSputnik commented 4 years ago

Folks, for those who haven't seen it, @apparentlymart has written up a really informative response on my PR. It highlights some possible changes to prevent_destroy that may serve our purposes https://github.com/hashicorp/terraform/pull/23066#issuecomment-552052598

FrederikNygaardSvendsen commented 4 years ago

bump, i would also love to see such a feature

RobRoseKnows commented 4 years ago

I've been using a null_resource with a local-exec call to the AWS CLI to accomplish some of this, but native, stateful support would be good. Here's an example of uploading a public key to a bastion bucket.

resource "null_resource" "bastion_key" {
  triggers = {
    bastion_public_key_user = var.bastion_public_key_user
    bastion_public_key_path = var.bastion_public_key_path
    bucket                  = var.s3_pub_key_bucket_name
  }

  provisioner "local-exec" {
    command = "aws s3 cp ${var.bastion_public_key_path} s3://${var.s3_pub_key_bucket_name}/public-keys/${var.bastion_public_key_user}.pub"
  }
}

shadycuz commented 4 years ago

Also could use this for lambda aliases. When I change the name of the alias I don't want to remove the last one, I just want a new one created in its place. This is because I need an api gateway stage and lambda per version of my API I deploy. If my api has 10 versions that means I need 10 api-gateway stages and 10 aliases per function =/.

I do it outside of Terraform now, but @RobRoseKnows's workaround might be something to use for the time being.

tmshn commented 4 years ago

Let me link my old issue here: https://github.com/hashicorp/terraform/issues/9531

zopanix commented 3 years ago

Hey,

I see this issue has been open for quite a while, and the idea of an abandon_on_destroy lifecycle makes a lot of sense for a particular use case on AWS (and probably also on GCP and Azure, from what I know of them). I really think this feature would have its place in Terraform.

Scenario

AWS KMS is the Key Management Service of AWS. It allows you to generate/import encryption keys. Those keys can be used to encrypt your EBS volumes, S3 buckets, and even data stored locally on your machine (the AWS provider even has a data source to decrypt such data using the AWS API).

Now, as always, nothing lasts forever: keys WILL get compromised, and because of that, key rotation is mandatory. AWS offers a service to automatically rotate your keys, which is great but not sufficient: AWS does not offer the possibility to forcefully rotate a key at a given time. It's always "CRON" based (currently only yearly, as far as I know).

Unfortunately, sometimes a key does get compromised and an urgent key rotation is needed. For those cases, AWS leaves you no choice other than KMS aliases. KMS aliases are pointers to KMS keys: you change the key behind an alias, and all resources using that alias start using the new key automatically (even auto-magically). Great, problem solved, right? Terraform supports this, so tainting the key with a create_before_destroy lifecycle solves the issue. Well, not exactly: the problem is solved only for data managed by AWS. Data that was encrypted using the KMS service but is not managed by AWS will NOT be usable with the new key behind the alias.

Since Terraform does not allow abandoning a resource "automatically" (yes, you could do a state rm path/to/resource and let Terraform add a new key, but this is not easily added to an automated workflow, and it will probably be the workaround we'll use for now), it doesn't feel very elegant or "native" to the Terraform way. Another objection that might surface for this use case is that keys are not immediately deleted; there is an X-day window before the final deletion of the resource. But in larger organisations, people are not always aware of how, where and when a resource is used, so there is no telling whether somebody used the key to encrypt a vital piece of information locally, or made a mistake and referenced that key specifically instead of the alias. So, to avoid potential data loss, we would like the possibility of an abandon_on_destroy lifecycle rule. We are aware that it might be used in inelegant ways as well, but as a company we think it's far less risky to keep paying for a resource than to have inaccessible data that is, by definition (because it was encrypted), important to the company.

The idea is that Terraform removes the resource from state, leaving some other workflow to handle that resource from then on. I know this might be opposed to a lot of Terraform's principles, but the option already exists in the form of the state rm command; it would just be a matter of adding that step automatically to the Terraform workflow on user demand. I'm sure the documentation would have a lot of red flags, stop signs, and sentences like "Think about it before using this feature."

Mock up

resource "aws_kms_key" "this" {
  lifecycle {
    abandon_on_destroy = true
  }
}

resource "aws_kms_alias" "this" {
  name = "alias/my-key"
  target_key_id = aws_kms_key.this.key_id
}

Known Issues

This feature might cause confusion for some users, since in a lot of cases (cases where, to me, it is simply not justifiable to use such a lifecycle) it will cause weird errors. For example, if such a lifecycle is used on an AWS RDS instance to make sure it never gets automatically deleted by Terraform, it will prevent the recreation of that RDS instance, because, for one, the name attribute has to be unique. I know some people will say it's justifiable to have such a lifecycle on an RDS instance because it certainly holds important data, but that's why final snapshots exist! They give you a snapshot of the database right before it gets destroyed, and let you restore the exact same database as the one you destroyed.

If you have any questions about my use case, want to argue against it, or have another solution that would prevent destroying the KMS key, I'd be happy to talk about it.

avoidik commented 3 years ago

@zopanix it seems that, to handle the mentioned known issue, another fundamental entity type would be required (in addition to resource, data, provider and the others), one combining the logic of the resource and data blocks (e.g. create if not found, use if found), and apparently with the same structure for output data.

# create if not found, otherwise try to use
reuse "aws_kms_key" "this" {
  name = "alias/my-key"
}

output "kms_key_id" {
  value = aws_kms_key.this.key_id
}

jamiesonbates commented 3 years ago

Adding a use case here where this concept would be useful:

We have a GCP Cloud SQL instance + replica. We connect to this instance via Cloud SQL Proxy, which takes "connection names". These are generated dynamically when an instance is created. There are some circumstances where the replica needs to be entirely replaced.

We are connecting from a GKE cluster, but have not yet implemented this infrastructure piece in Terraform. Thus, whenever the replica has to be replaced (i.e. when scaling up the instance size) we have to do a 2 step process. First, add a replica and point our deployment at the new replica "connection_name" and then destroy the old replica.

The concept of "replace_but_do_not_destroy" would be helpful in this case.

lobsterdore commented 3 years ago

Adding another use case here.

I am using Terraform to deploy services to Fargate, and part of the deployment involves provisioning SSM parameters. Ideally I want to preserve the existing parameters so the currently running version of the service stays intact; as it stands, TF will remove the old parameters when adding new ones while the old version of the service is still running, which is a bit dangerous to say the least.

For this use case I want TF to just leave the old parameters in place and not care about their state at all so the abandon_on_destroy approach would work here.

To get around this issue at the moment I am shelling out and using the AWS CLI to create the SSM parameters.
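That shell-out workaround can be sketched with the null_resource pattern shown earlier in the thread; the variable names and CLI invocation details below are illustrative, not from the original comment:

```hcl
# Workaround sketch: create SSM parameters outside Terraform's lifecycle,
# so old parameters are never deleted on replacement.
resource "null_resource" "ssm_parameter" {
  triggers = {
    name  = var.parameter_name
    value = var.parameter_value
  }

  provisioner "local-exec" {
    command = "aws ssm put-parameter --name '${var.parameter_name}' --value '${var.parameter_value}' --type SecureString --overwrite"
  }
}
```

The trade-off is the same as with the bastion-key example above: Terraform no longer tracks or cleans up the parameters at all.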

stiliajohny commented 3 years ago

Following. Currently facing the same issue, primarily with KMS keys.


akashcommit commented 3 years ago

In my use case, I'm trying to create multiple VMs instead of destroying the previous VMs and resources. I tried parameterising the tfstate file, but I don't think there is such a feature. Whenever I trigger the pipeline, it destroys the previous VM and creates a new one with the different name I input; this limits me from creating multiple VMs. I am using Azure, with the resource group, storage account, vnet and subnet as data sources. Does Terraform have a feature so that every time a VM is created, it creates a new tfstate file tagged with that particular hostname? This is a serious, limiting issue.

mhodgesest commented 3 years ago

I also have this issue with Image Builder components. I want to increment the version number, but incrementing the version destroys the previous version of the resource. It would be nice to create new resources on a version change but keep the old versions, because that's how versioning is supposed to work.

kyontan commented 3 years ago

I have the same issue with AWS CloudFront: aws_cloudfront_realtime_log_config and aws_cloudfront_distribution.

+1 for https://github.com/hashicorp/terraform/issues/15672

crazy-matt commented 3 years ago

Same issue here with the Google Cloud API Gateway config. The abandon_on_destroy suggestion would help a lot: the devs' CI/CD pipeline could manage their swagger file in a blue/green style, letting us on the platform side focus only on pushing changes related to the API gateway itself, like its service account.

avoidik commented 3 years ago

How do you plan to pick up the resources left behind after abandon_on_destroy?

MeNsaaH commented 3 years ago

In my case, I create an AMI from an image every time I run Terraform. It would be great if Terraform could create new AMIs without having to destroy the old ones. We could have a lifecycle setting like max_history, and Terraform would delete the oldest resource once the number of resources created reaches max_history. The setting could be null if there should be no limit.

Something like replacing a resource should respect max_history and create a new resource instead, while an outright destroy should destroy all the tracked resources.

I don't think abandon_on_destroy is such a great approach; it'll create a lot of orphaned resources. Having Terraform keep a historical context of resources would be the better way to go.
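To make the shape of that idea concrete, here is a mock-up in the style of the other proposals in this thread. Both lifecycle arguments are hypothetical syntax, not existing Terraform features, and the resource values are illustrative:

```hcl
# HYPOTHETICAL mock-up only: neither lifecycle argument below exists.
resource "aws_ami_from_instance" "app" {
  name               = "app-${timestamp()}"
  source_instance_id = aws_instance.app.id

  lifecycle {
    create_new_on_change = true # hypothetical: changes create, never destroy
    max_history          = 5    # hypothetical: prune beyond the 5 newest
  }
}
```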

okedeji commented 2 years ago

terraform state rm aws_ebs_snapshot.component

This works well for me. I am able to make Terraform forget the resource from state while keeping the history of the resource in AWS.

ccmattr commented 2 years ago

My use case is that we are migrating to a new AWS account, and we want to create the new resources in the target destination first, test them thoroughly, do a cutover, then tidy up on success. Ideally I would like to do this in 3 applies:

  1. create resources
  2. cutover
  3. destroy old resources

Along those lines, abandoning the resource wouldn't be ideal, as I would have to manually clean up the old resources.

Is that something that could be done?

jammymalina commented 2 years ago

My use case is not destroying the Lambda layer version when the source code changes: I want Terraform to deploy the new layer version and keep the old ones.

siran commented 2 years ago

I am sharing a layer between accounts. Since this is done by version, when the layer is deleted we have to update/redeploy the Lambda functions in the other accounts that use it (updating the version number, since the previous layer was destroyed).

blafry commented 2 years ago

In our case, abandon_on_destroy would solve the problem of certificate rotation on Azure Key Vault with purge protection turned on.