Open ibrahim-alzant opened 6 years ago
Not sure I follow your question. Could you provide some examples?
We have the usecase that we want to fully remove a component from time to time (e. g due to end of life of it or a refactoring of the structure).
For now we added count = "0"
to all contained terraform resources, but this can be a difficult task because missing references have to be handled correctly.
The question is, if it has already been considered to include a feature which eases the removal of all resources of a component through a special configuration.
A proposal would be to have a count
in terragrunt.tfvars
like:
terragrunt = {
count = "0"
}
This would trigger the removal of the complete component during a call of terragrunt apply-all
(by executing terraform destroy
on this particular module.
Of course there are more details like broken dependencies which would have to be considered.
Does this seem to be a valid approach or is there a current best practice, which makes such an operation feasible?
I'm probably missing something, but why can't you just run terragrunt destroy
when you need to clean up those modules?
We are having a complex setup where the configuration is applied from within an AWS account by running terragrunt apply-all
in a CodeBuild environment.
We are not able to execute terraform destroy
manually there, therefore it would help a lot to destroy a component based on its configuration.
Hm, that strikes me as running counter to the "principle of least surprise." I imagine running apply
in one of my modules and then, to my great horror, it instead runs destroy
because of some left over configuration in it...
@brikis98 In a world where we want all Terraform changes to happen inside of a CI/CD pipeline, how do you propose to avoid having to manually run terragrunt destroy
?
We could have some logic within our deploys to run that command against a module given some trigger, e.g. keyword with a PR, or .destroy in the module directory, for example.
But if we want all of the changes to be visible and reviewable via a PR process, it certainly makes sense to support an explicit destroy trigger within terragrunt itself.
Ideal (to me) process:
destroy = true
for that terragrunt module declaration.destroy = true
option, and runs a destroy plan against that module.In a world where we want all Terraform changes to happen inside of a CI/CD pipeline, how do you propose to avoid having to manually run terragrunt destroy?
Make your CI / CD process more flexible!
For example, provide a way to specify the Terraform command that's executed by your CI / CD process and the folder where it runs that command. You could pass that in via a commit message (for CircleCI or TravisCI systems) or, if using something like Jenkins, via a parametrized build. Of course, the CI / CD process can have defaults for each of those params (e.g., apply-all
in the root), but it's very useful to be able to override them.
See also atlantis for another way to run commands in your CI / CD system. Note, however, Atlantis does not currently support Terragrunt, but the basic idea of triggering CI / CD builds via GitHub comments is a very good one.
We are actually planning to roll out something similar to atlantis locally :).
I guess this is just a matter of preference, as I think declarative approach of destroy = true
scoped to the particular resource (similar to ensure.absent
in config management) is more explicit, and better reflected in commit history, vs. a comment on a PR that actually destroys infrastructure.
This behavior could also be optional, activated by a runtime parameter passed to terragrunt :).
I guess this is just a matter of preference, as I think declarative approach of destroy = true scoped to the particular resource (similar to ensure.absent in config management) is more explicit, and better reflected in commit history, vs. a comment on a PR that actually destroys infrastructure.
I would agree with that if everything in Terraform was managed through code, and the binary only had an apply
command. However, there's destroy
, import
, state
, taint
, and a bunch of other stuff that is not reflected in the code or commit history. Therefore, it strikes me as unintuitive to have a special case just for destroy
, especially given the destructive nature of that command.
Also, how does such a configuration interact with other commands? If I run plan
, does it automatically run plan -destroy
? If I run terraform state
commands and that destroy
flag is in my code, how do they interact? If the infrastructure is already destroyed, but I forgot to remove the flag, does apply
do nothing?
Those are good question. The main difference here being the fact that when you remove resources from a vanilla Terraform project (e.g. your merge diff shows resource declarations are being removed), they will explicitly be destroyed.
However, as OP indicated, if you remove a Terragrunt-initiated module (subfolder), then those changes will not be considered when you run plan-all
post removal. This deviates from the vanilla Terraform behavior.
In any event, I appreciate your responses here, and thank you for your continued work on the wrapper. This is a small issue that can be worked around as needed, with a solution that fits each team's needs.
@brikis98, thank you for the discussion. In our daily use terraform apply
is indeed the only command in use. The other commands like destroy
, import
, state
, etc. are only rarely used in the case when there is a need for manual adaptions. Usually we try to avoid that by modifying resources from inside the configuration.
We have now found a solution, which works fine for us. We defined an 'empty' terraform module which we reference from the parts we want to have removed during an apply-all operation.
If we have a service defined through its terraform.tfvars
file with the following content:
terragrunt = {
include {
path = "../../terraform.tfvars"
}
terraform {
source = "../../../module/public-standalone-service"
}
dependencies {
paths = [
"../../cluster/public"]
}
}
service_name = "foo"
service_version = "0.0.22"
We just remove all resources in this folder's *.tf file and reference the empty module as a dummy:
terragrunt = {
include {
path = "../../terraform.tfvars"
}
terraform {
source = "../../../module/empty"
}
dependencies {
paths = [
"../../cluster/public"]
}
}
service_name = "foo"
service_version = "0.0.22"
During the next terragrunt apply-all
all resources contained in that folder are removed.
@wuan That right there is a fantastic idea :-) (Although slightly painful when you have a set of connected resources - I want to delete a developers system instance.) I'd expected Terragrunt to delete resources if you delete a folder, as Terraform destroys resources that have been removed. However, the tricky part is that Terragrunt won't know the state is gone because the state is managed per folder. Slightly off topic, but that's one of the most fragile things about Terragrunt to me. If I want to reorganize my folder structure, it's not a trivial thing. It would be much nicer to tag resources, and declare dependencies and state storage using those tags. Then I could move things around without fear of breakage.
The idea behind the terraform / gitops approach is that the environment should reflect the state of the code. Hence, a developer can establish the state of the env by simply looking at the code.
If you delete a folder containing a terragrunt.hcl file, the resources specified therein will not be destroyed, hence the code will deviate from the env, so this breaks the gitops paradigm and means that terragrunt's behaviour fundamentally differs from terraform's.
For this reason, we need some sort of flag within terragrunt to indicate that the resources specified within a particular terragrunt.hcl file are not passed to the terraform binary. Note that I did not say that a destroy
instruction should be sent. Rather, you should remove those resources from the input and then run whatevever terraform command the user specifies.
This would cover all use cases.
If the user runs apply
and the resources are already applied, they will be destroyed.
If the user runs apply
and they are already destroyed, nothing will happen - hence this flag can be persisted in the codebade without issue.
A plan
command will print out the changes which will be produced from a subsequent apply
- just as the user expects.
As mentioned in previous posts - the simple use-case of a terragrunt module being "disabled" can be handled by manipulating the source value. i.e. source = local.enable ? "git::https://git…" : "/module/empty"
..and perhaps this has been stated in other posts as well, but I'll add - This type of "disable" is completely broken when a terragrunt module has a dependency, even when that dependency uses the same "enable/disable" switch. This is because when applying a module, which is now disabled, the modules need to be applied in reverse order, opposite of what is used for construction. "apply-all" and "destroy-all" operate in different order because of this. Apply-all always processes dependencies first so removing modules with dependencies can not supported with apply-all.
Additionally destroy/destroy-all case in terragrunt is always painful because you need to have dependency outputs. If those dependencies are destroyed you are stuck. (you can never run destroy-all twice, second run will always bomb on the destroyed dependencies. ..yeah there are skip options but I'm pretty sure that is not a complete solution)
Seems to me, enable/disable modules (w/dependencies) needs to be formally supported by terragrunt so that the dependency ordering issues can be dealt with properly.
That type of behaviour is broken more generally, as I recently found out. Suppose you have one module that creates, say, 5 vms and another resource that adds those vms to a cluster. Now I update the config so that there are only 4 vms and run apply-all.
What I (naively) expected to happen: The 5th vm is removed from the cluster and then destroyed.
What actually happened: The 5th vm is destroyed and then it tries to remove it from the cluster (which fails because the clusrer is in a degraded state and doesn't accept changes).
It gets even worse if you update the config so that one of the vms is different (and therefore needs to be destroyed and recreated), because in that case you can't fix it by tweaking the module order. It would need to first run the 'destroy' part of the second module, then run the first and finally run the 'add' part of the second resource... Or vice versa depending on what you want the intermediate state to be.
True. I never expected terragrunt to understand how changes in dependencies propagate through outputs. I always use the example of changing aws availability zones from 3 to 2. for the apply-all the vpc module would be processed first but would not be able to delete the subnet due to resources from other modules referring to it. This always seemed like a migration operation which would require more admin intervention. I'm not even sure that would work if all the resources were in the same terraform module.
Ran into this today as well. Unfortunately all the solutions offered here feel quite hacky, would love if Terragrunt came up with some better way to deal with this.
Has a solution for this arisen?
Seeing similar issue while removing the resources(folders) and applying the changes. The order of apply is making it difficult to remove the dependent resources. Is there any solution that terragrunt is currently working on?
+1
+1
+1
+1
+1
+1
We have now found a solution, which works fine for us. We defined an 'empty' terraform module which we reference from the parts we want to have removed during an apply-all operation. If we have a service defined through its
terraform.tfvars
file with the following content:terragrunt = { include { path = "../../terraform.tfvars" } terraform { source = "../../../module/public-standalone-service" } dependencies { paths = [ "../../cluster/public"] } } service_name = "foo" service_version = "0.0.22"
We just remove all resources in this folder's *.tf file and reference the empty module as a dummy:
terragrunt = { include { path = "../../terraform.tfvars" } terraform { source = "../../../module/empty" } dependencies { paths = [ "../../cluster/public"] } } service_name = "foo" service_version = "0.0.22"
During the next
terragrunt apply-all
all resources contained in that folder are removed.
It does the job, I confirm
terraform {
source = "/dev/null"
}
really need smth real solution
+1
Did someone find a doable solution for that ? The advantage of having infrastructure by using folders you can figure out what you have in place at a glance, however it seems a bit hacky to source an empty terraform source in order to delete specific terraform module.
Therefore, there is no advantage, since you have to get deep into folders and check whether module is being used in fact.
+1
Currently, if I need to remove a folder containing some resources, terragrunt won't be able to detect the removed folder since the state is removed as well. A workaround would be to set the count = 0 for each resource. I think it would be nice to have the ability to set the count on the folder level instead of setting it for each resource I have since I am not aware of any other solution for such a case.
Thanks