gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.04k stars 975 forks source link

Remove resources on folder level #433

Open ibrahim-alzant opened 6 years ago

ibrahim-alzant commented 6 years ago

Currently, if I need to remove a folder containing some resources, terragrunt won't be able to detect the removed folder since the state is removed as well. A workaround would be to set the count = 0 for each resource. I think it would be nice to have the ability to set the count on the folder level instead of setting it for each resource I have since I am not aware of any other solution for such a case.

Thanks

brikis98 commented 6 years ago

Not sure I follow your question. Could you provide some examples?

wuan commented 6 years ago

We have the usecase that we want to fully remove a component from time to time (e. g due to end of life of it or a refactoring of the structure). For now we added count = "0" to all contained terraform resources, but this can be a difficult task because missing references have to be handled correctly. The question is, if it has already been considered to include a feature which eases the removal of all resources of a component through a special configuration. A proposal would be to have a count in terragrunt.tfvars like:

terragrunt = {
  count = "0"
}

This would trigger the removal of the complete component during a call of terragrunt apply-all (by executing terraform destroy on this particular module.

Of course there are more details like broken dependencies which would have to be considered.

Does this seem to be a valid approach or is there a current best practice, which makes such an operation feasible?

brikis98 commented 6 years ago

I'm probably missing something, but why can't you just run terragrunt destroy when you need to clean up those modules?

wuan commented 6 years ago

We are having a complex setup where the configuration is applied from within an AWS account by running terragrunt apply-all in a CodeBuild environment.

We are not able to execute terraform destroy manually there, therefore it would help a lot to destroy a component based on its configuration.

brikis98 commented 6 years ago

Hm, that strikes me as running counter to the "principle of least surprise." I imagine running apply in one of my modules and then, to my great horror, it instead runs destroy because of some left over configuration in it...

geekifier commented 6 years ago

@brikis98 In a world where we want all Terraform changes to happen inside of a CI/CD pipeline, how do you propose to avoid having to manually run terragrunt destroy?

We could have some logic within our deploys to run that command against a module given some trigger, e.g. keyword with a PR, or .destroy in the module directory, for example.

But if we want all of the changes to be visible and reviewable via a PR process, it certainly makes sense to support an explicit destroy trigger within terragrunt itself.

Ideal (to me) process:

  1. Developer decomissions a module by setting destroy = true for that terragrunt module declaration.
  2. Developer creates a PR, which runs unit tests and terragrunt plan.
  3. Terragrunt interprets the destroy = true option, and runs a destroy plan against that module.
  4. Team members review the PR, and once approved, an apply is triggered.
brikis98 commented 6 years ago

In a world where we want all Terraform changes to happen inside of a CI/CD pipeline, how do you propose to avoid having to manually run terragrunt destroy?

Make your CI / CD process more flexible!

For example, provide a way to specify the Terraform command that's executed by your CI / CD process and the folder where it runs that command. You could pass that in via a commit message (for CircleCI or TravisCI systems) or, if using something like Jenkins, via a parametrized build. Of course, the CI / CD process can have defaults for each of those params (e.g., apply-all in the root), but it's very useful to be able to override them.

See also atlantis for another way to run commands in your CI / CD system. Note, however, Atlantis does not currently support Terragrunt, but the basic idea of triggering CI / CD builds via GitHub comments is a very good one.

geekifier commented 6 years ago

We are actually planning to roll out something similar to atlantis locally :).

I guess this is just a matter of preference, as I think declarative approach of destroy = true scoped to the particular resource (similar to ensure.absent in config management) is more explicit, and better reflected in commit history, vs. a comment on a PR that actually destroys infrastructure.

This behavior could also be optional, activated by a runtime parameter passed to terragrunt :).

brikis98 commented 6 years ago

I guess this is just a matter of preference, as I think declarative approach of destroy = true scoped to the particular resource (similar to ensure.absent in config management) is more explicit, and better reflected in commit history, vs. a comment on a PR that actually destroys infrastructure.

I would agree with that if everything in Terraform was managed through code, and the binary only had an apply command. However, there's destroy, import, state, taint, and a bunch of other stuff that is not reflected in the code or commit history. Therefore, it strikes me as unintuitive to have a special case just for destroy, especially given the destructive nature of that command.

Also, how does such a configuration interact with other commands? If I run plan, does it automatically run plan -destroy? If I run terraform state commands and that destroy flag is in my code, how do they interact? If the infrastructure is already destroyed, but I forgot to remove the flag, does apply do nothing?

geekifier commented 6 years ago

Those are good question. The main difference here being the fact that when you remove resources from a vanilla Terraform project (e.g. your merge diff shows resource declarations are being removed), they will explicitly be destroyed.

However, as OP indicated, if you remove a Terragrunt-initiated module (subfolder), then those changes will not be considered when you run plan-all post removal. This deviates from the vanilla Terraform behavior.

In any event, I appreciate your responses here, and thank you for your continued work on the wrapper. This is a small issue that can be worked around as needed, with a solution that fits each team's needs.

wuan commented 6 years ago

@brikis98, thank you for the discussion. In our daily use terraform apply is indeed the only command in use. The other commands like destroy, import, state, etc. are only rarely used in the case when there is a need for manual adaptions. Usually we try to avoid that by modifying resources from inside the configuration.

wuan commented 6 years ago

We have now found a solution, which works fine for us. We defined an 'empty' terraform module which we reference from the parts we want to have removed during an apply-all operation. If we have a service defined through its terraform.tfvars file with the following content:

terragrunt = {
  include {
    path = "../../terraform.tfvars"
  }

  terraform {
    source = "../../../module/public-standalone-service"
  }

  dependencies {
    paths = [
      "../../cluster/public"]
  }
}

service_name = "foo"
service_version = "0.0.22"

We just remove all resources in this folder's *.tf file and reference the empty module as a dummy:

terragrunt = {
  include {
    path = "../../terraform.tfvars"
  }

  terraform {
    source = "../../../module/empty"
  }

  dependencies {
    paths = [
      "../../cluster/public"]
  }
}

service_name = "foo"
service_version = "0.0.22"

During the next terragrunt apply-all all resources contained in that folder are removed.

PaulVipond commented 4 years ago

@wuan That right there is a fantastic idea :-) (Although slightly painful when you have a set of connected resources - I want to delete a developers system instance.) I'd expected Terragrunt to delete resources if you delete a folder, as Terraform destroys resources that have been removed. However, the tricky part is that Terragrunt won't know the state is gone because the state is managed per folder. Slightly off topic, but that's one of the most fragile things about Terragrunt to me. If I want to reorganize my folder structure, it's not a trivial thing. It would be much nicer to tag resources, and declare dependencies and state storage using those tags. Then I could move things around without fear of breakage.

willemveerman commented 4 years ago

The idea behind the terraform / gitops approach is that the environment should reflect the state of the code. Hence, a developer can establish the state of the env by simply looking at the code.

If you delete a folder containing a terragrunt.hcl file, the resources specified therein will not be destroyed, hence the code will deviate from the env, so this breaks the gitops paradigm and means that terragrunt's behaviour fundamentally differs from terraform's.

For this reason, we need some sort of flag within terragrunt to indicate that the resources specified within a particular terragrunt.hcl file are not passed to the terraform binary. Note that I did not say that a destroy instruction should be sent. Rather, you should remove those resources from the input and then run whatevever terraform command the user specifies.

This would cover all use cases.

If the user runs apply and the resources are already applied, they will be destroyed.

If the user runs apply and they are already destroyed, nothing will happen - hence this flag can be persisted in the codebade without issue.

A plan command will print out the changes which will be produced from a subsequent apply - just as the user expects.

tgourley01 commented 4 years ago

As mentioned in previous posts - the simple use-case of a terragrunt module being "disabled" can be handled by manipulating the source value. i.e. source = local.enable ? "git::https://git…" : "/module/empty"

..and perhaps this has been stated in other posts as well, but I'll add - This type of "disable" is completely broken when a terragrunt module has a dependency, even when that dependency uses the same "enable/disable" switch. This is because when applying a module, which is now disabled, the modules need to be applied in reverse order, opposite of what is used for construction. "apply-all" and "destroy-all" operate in different order because of this. Apply-all always processes dependencies first so removing modules with dependencies can not supported with apply-all.

Additionally destroy/destroy-all case in terragrunt is always painful because you need to have dependency outputs. If those dependencies are destroyed you are stuck. (you can never run destroy-all twice, second run will always bomb on the destroyed dependencies. ..yeah there are skip options but I'm pretty sure that is not a complete solution)

Seems to me, enable/disable modules (w/dependencies) needs to be formally supported by terragrunt so that the dependency ordering issues can be dealt with properly.

willemm commented 4 years ago

That type of behaviour is broken more generally, as I recently found out. Suppose you have one module that creates, say, 5 vms and another resource that adds those vms to a cluster. Now I update the config so that there are only 4 vms and run apply-all.

What I (naively) expected to happen: The 5th vm is removed from the cluster and then destroyed.

What actually happened: The 5th vm is destroyed and then it tries to remove it from the cluster (which fails because the clusrer is in a degraded state and doesn't accept changes).

It gets even worse if you update the config so that one of the vms is different (and therefore needs to be destroyed and recreated), because in that case you can't fix it by tweaking the module order. It would need to first run the 'destroy' part of the second module, then run the first and finally run the 'add' part of the second resource... Or vice versa depending on what you want the intermediate state to be.

tgourley01 commented 4 years ago

True. I never expected terragrunt to understand how changes in dependencies propagate through outputs. I always use the example of changing aws availability zones from 3 to 2. for the apply-all the vpc module would be processed first but would not be able to delete the subnet due to resources from other modules referring to it. This always seemed like a migration operation which would require more admin intervention. I'm not even sure that would work if all the resources were in the same terraform module.

bcha commented 3 years ago

Ran into this today as well. Unfortunately all the solutions offered here feel quite hacky, would love if Terragrunt came up with some better way to deal with this.

michaaelw commented 3 years ago

Has a solution for this arisen?

SarmaKHCL commented 3 years ago

Seeing similar issue while removing the resources(folders) and applying the changes. The order of apply is making it difficult to remove the dependent resources. Is there any solution that terragrunt is currently working on?

headincl0ud commented 1 year ago

+1

jmauro commented 1 year ago

+1

Noamshmueli commented 1 year ago

+1

Noamshmueli commented 1 year ago

+1

anderson-marques commented 1 year ago

+1

cdelgehier commented 1 year ago

+1

cdelgehier commented 1 year ago

We have now found a solution, which works fine for us. We defined an 'empty' terraform module which we reference from the parts we want to have removed during an apply-all operation. If we have a service defined through its terraform.tfvars file with the following content:

terragrunt = {
  include {
    path = "../../terraform.tfvars"
  }

  terraform {
    source = "../../../module/public-standalone-service"
  }

  dependencies {
    paths = [
      "../../cluster/public"]
  }
}

service_name = "foo"
service_version = "0.0.22"

We just remove all resources in this folder's *.tf file and reference the empty module as a dummy:

terragrunt = {
  include {
    path = "../../terraform.tfvars"
  }

  terraform {
    source = "../../../module/empty"
  }

  dependencies {
    paths = [
      "../../cluster/public"]
  }
}

service_name = "foo"
service_version = "0.0.22"

During the next terragrunt apply-all all resources contained in that folder are removed.

It does the job, I confirm

terraform {
  source = "/dev/null"
}
aceqbaceq commented 1 year ago

really need smth real solution

charlseo commented 1 year ago

+1

thiagofernandocosta commented 1 year ago

Did someone find a doable solution for that ? The advantage of having infrastructure by using folders you can figure out what you have in place at a glance, however it seems a bit hacky to source an empty terraform source in order to delete specific terraform module.

Therefore, there is no advantage, since you have to get deep into folders and check whether module is being used in fact.

mohdsabir-ciq commented 1 month ago

+1