hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.57k stars 9.54k forks source link

Removed block - instance support #34439

Open drandell opened 10 months ago

drandell commented 10 months ago

Terraform Version

v1.7.0-beta1

Use Cases

All of our internal modules accept more than one input either in the form of a list of objects or a map of objects. If a user accidentally deletes an object and ends up with state drift or there is a provider issue leading to some form of state drift (this is often apparent with some resources in GCP) then I should be able to remove that singular entity.

Attempted Solutions

N/A

Proposal

removed {
  from = module.gcs.google_storage_bucket.this["my-single-bucket"]
  lifecycle {
    destroy = false
  }
}

module "gcs" {
  source  = "drandell/cloud-storage/google"
  version = "1.0.3"

  buckets = {
    #"my-single-bucket": {}, ## Lets presume this entity was deleted in GCP by a user and now leads to state drift
    "my-other-bucket": {}
  }
}

References

No response

jbardin commented 10 months ago

Hi @drandell,

Thanks for filing the issue! A removed block implies that a target "resource" has been removed from the configuration. There are a few problems with identifying individual instances via a removed block, but I think a primary concern is that there is still a resource block managing that resource, which may add conflicting instance keys at a later time, or have other conflicting lifecycle parameters defined in ways which can't be resolved. There's also problems based on the internal representation, where the necessary metadata is tacked on a per-resource basis, rather than per-instance.

For control of individual instances, it's more likely that there will be additional lifecycle options for that resource's block, rather than a removed block, since existing resources instances are always controlled by the resource block. You can also see some discussion in an older version of the issue #15485, about how we might go about "forgetting" individual instances.

I would note however, that your example should not require a removed block. If the object was deleted in GCP, and you removed the instance key from the configuration, the provider should indicate to Terraform that the instance no longer exists resulting in no changes in the plan.

spkane commented 8 months ago

I am adding this here, and it was in its own feature request but has been deemed as possibly related to this...

The idea here, is not to support telling Terraform that a resource has already been destroyed, but to make it possible to tell Terraform to simply delete the resource from the state during a remove workflow instead of trying to actually delete it. This would be useful when destroying a higher-level object is more reliable and will result in the destruction of the other object anyhow (although it could also allow for abandoning an object as well).


Use Cases

The core idea is to create a way to tell Terraform to remove a resource from the state file during a destroy workflow instead of contacting the owning API to delete the object.

This would make it possible to handle nested objects, like kubernetes_namespace resources, that exist inside a Kubernetes cluster, which you will destroy in the same workflow, but you do not need Terraform to remove via the owning API.

Attempted Solutions

Hashicorp would recommend that people use a single Terraform workflow to spin up a k8s cluster and then install things into that cluster in another Terraform workflow.

However, there are many times that at least some bootstrapping will occur in the initial Terraform workflow. This fix would allow users to quickly identify resources that need not be destroyed via their API.

Proposal

lifecycle { 
  # This would cause Terraform to remove the resource from the state file instead of calling the owning API to delete.
  state_only_destroy = true
}
gk-drw commented 7 months ago

I believe state_only_destroy = true might not be specific enough compared to implementing full support for instances in removed blocks. That is, it would not allow us to specify that an individual, specific resource instance - but not any other instances - should be "forgotten" instead of destroyed.

Curious, why does this work for import and moved blocks but not for removed blocks? It seems natural and good for DX to support symmetry here.

jhancock93 commented 7 months ago

I am using the following S3 module https://registry.terraform.io/modules/terraform-aws-modules/s3-bucket/aws/3.15.2 that creates several conditional resources including an aws_s3_bucket_public_access_block. Historically, I have created many buckets with attach_public_policy = true that creates a indexed resource module.bucket.aws_s3_bucket_public_access_block.this[0] containing the policy block. However, our InfoSec team has recently removed our ability to create or modify the public_access_block resources. Setting attach_public_policy = true fails to create new buckets because of the new policy, but setting attach_public_policy to false will try to remove the policy from existing buckets and fail. So I have a use case where I was hoping to use the "removed" block to remove the indexed resource from the state file without deleting the actual resource (since I can't do that). Is there an alternative approach?

paul-hugill commented 6 months ago

I've got another use case for needing to remove an instance.

With the AzureRM provider, we have it automatically registering all the Azure Providers (skip_provider_registration = false) but on some of the older versions, it didn't include some services, so we had a single azurerm_resource_provider_registration with for_each for those additional ones we needed to enable.

Now I want to upgrade to a provider version that tries to automatically register the additional ones and that resource block causes plans to fail. Being able to use removed with azurerm_resource_provider_registration.provider["Microsoft.DataFactory"] would solve that.

I don't want to unregister the provider, which is what removing it from the for_each would do and we have about 200 workspaces that would need this, so not something I want to do terraform state rm on.

jcogilvie commented 5 months ago

For any google spelunkers coming to this issue, it was not at all clear to me that using a removed block for something that has an index as a means of toggling on/off will work if you simply omit the index.

module "foo" {
  count = var.enable_foo ? 1 : 0
}
# inside module.foo
resource "something" "bar" {
  count = var.enable_bar ? 1 : 0
}

If at the root level you specify:

removed {
  from = module.foo.something.bar
  lifecycle {  
    destroy = false
  }
}

In most terraform contexts this would throw an error because it's unable to traverse the address, but for a removed block, this Just Works for removing bar.

jskinne3 commented 4 months ago

I also have a use case for this: forgetting items after an incomplete resource move

Shocktrooper commented 4 months ago

I just ran into this where I wanted to delete a specific instance in a map where the key has spaces and escaping via the terraform state rm <resource address> has not been a pleasant experience considering there isn't an example of how to do so in the official documentation

dsbibby commented 4 months ago

+1 for this.

In case it helps anyone, I came up with this workaround... as these state actions are chained, you can move the resource first, then remove the moved resource all in the same apply, e.g.:

moved {
  from = module.foo.aws_iam_user[0]
  to   = module.foo.aws_iam_user_0
}
removed {
  from = module.foo.aws_iam_user_0
  lifecycle {
    destroy = false
  }
}
VladislavDubrovenski commented 4 months ago

@dsbibby I confirm this works. Here is another example (my use case is moving infra to another repo):

moved {
  from = module.rg.azurerm_resource_group.rg["rg2"]
  to   = module.rg.azurerm_resource_group.rg_2
}

removed {
  from = module.rg.azurerm_resource_group.rg_2
  lifecycle {
    destroy = false
  }
}
Shocktrooper commented 4 months ago

Well this is a nice workaround. This should be added to the official docs as a workaround for removing instances till/if official functionality is added.

I think this also opens up an official way of implementing this as terraform could do the move behind the scenes then the remove would work instead of chaining 2 blocks. That way we avoid the issues that were mentioned earlier in this issue.

Shocktrooper commented 4 months ago

@jbardin what do you think about the above in order to make an official way to remove an instance?

jbardin commented 4 months ago

@Shocktrooper, this is definitely a reasonable method to remove an instance via configuration.

Always requiring destroy=false for the resource type would still need a new lifecycle option and/or additional protocol changes, but would allow for terraform to automatically forget the instance when the key no longer exists. It would also allow for the instance to be re-added without having to clean up the moved or removed blocks.

For the occasional one-off exception however, this may end up being the preferred config-driven method, since you somehow need to both indicate that you wish to separate the instance from the existing resource configuration, and then remove it from the state.

apparentlymart commented 3 months ago

The intended design for removed blocks was that in the long run they allow any of the lifecycle options that would normally be allowed for the kind of object that has been removed, excepting any that wouldn't make sense for something that has been removed anyway (like ignore_changes, which is relevant only for update/replace actions).

The recent addition of support for destroy-tume provisioners in removed blocks is a development in that direction, for example.

Therefore the following is what we imagined as the design for this situation of "whenever an instance of this resource is removed, just forget about it instead of destroying it":

resource "whatever" "example" {
  for_each = ...

  # ...

  lifecycle {
    destroy = false
  }
}

The above didn't make it into the first release of destroy = false because it is potentially confusing with the legacy "prevent destroy" feature, which means "fail planning if any instance of this would be destroyed" instead. We wanted to get more experience with the new feature before developing it further.

I think the question of interaction with the existing prevent_destroy is still in need of an answer, and that would be part of designing a solution to this issue.

In the meantime, I'm glad folks have found the workaround of moving the instance in question into an undeclared resource address. That is not something we explicitly intended but it's a logical consequence of the behaviors of both moved and removed, so it seems safe to assume that behavior isn't going to get broken or removed in future.

gk-drw commented 3 months ago

Regarding the eample:

resource "whatever" "example" {
  for_each = ...

  # ...

  lifecycle {
    destroy = false
  }
}

I was wondering how would this work for the scenario described in the first comment - forgetting only a single particular instance but none of the others due to accidental removal? It seems to me like this will cause terraform to ignore all of the instances of that resource block instead. This could potentially cause any additional accidental removals to go unnoticed (or at least, go through without terraform reporting errors)

fardarter commented 3 months ago

Is there any intention of allowing a for_each in the removed block?

I have a bunch of containers I want to remove without destroying as I'm converting them to gen2 filesysystems. The import works beautifully, but then I'm left with the container resource dangling. If I remove it, the actual containers get destroyed.

nomeelnoj commented 1 month ago

We have an additional use case where I would love to see further development on the removed block.

We have a requirement to keep rds snapshots for compliance reasons. We associate a unique KMS key with each database cluster. In order to restore those snapshots, we need the key that was used to create the snapshots. However, if we run terraform destroy and keep a final snapshot around, the key is gone and we cannot actually restore those final snapshots.

I would like to be able to use a condition inside the removed block to validate if any manual snapshots have been created (those that we will be keeping long term) and only destroy the resource if there are none. If there are snapshots present, then the remove block would just remove the object from state and leave it around in perpetuity so we can restore the snapshot.

possible config (maybe we could do this with some sort of lifecycle precondition, but that is also not supported (and theres the issue that the data object for cluster snapshots wont return empty if nothing is there unlike data.aws_iam_roles, but it captures the idea)):

removed {
  from = module.rds_cluster.aws_kms_key.default

  lifecycle {
    destroy = data.aws_db_cluster_snapshot.manual.snapshot_creation_time == null ? true : false
  }
}