hashicorp / terraform


Update/replace resource when a dependency is changed #8099

Closed · OJFord closed this issue 2 years ago

OJFord commented 8 years ago
resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    depends_on = ["foo.bar"]
}

bar.foo is not modified if the file 'foobar' changes without otherwise changing the resource that includes it.

radeksimko commented 8 years ago

Hi @OJFord, would you mind providing a more concrete example with real resources that would help us reproduce the unexpected behaviour you described?

Thanks.

cemo commented 8 years ago

@radeksimko please check the referenced issue #6613. This is pretty important and can be hit in other places as well. From my experiments, I observed that depends_on only affects ordering; it does not trigger a change.

apparentlymart commented 8 years ago

Hi @OJFord and @cemo,

In Terraform's design, a dependency edge (which is what depends_on creates explicitly) is used only for ordering operations. So in the very theoretical example given in the issue summary, Terraform knows that when it's doing any operation that affects both foo.bar and bar.foo it will always do the operation to foo.bar first.

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

So in practice this means that bar.foo in the original example will only get an "update" diff if any of its own attributes are changed. To @radeksimko's point it's hard to give a good example without a real use-case, but the way this would be done is to interpolate some attribute of foo.bar into bar.foo, such that an update diff will be created on bar.foo whenever that attribute changes. Note that it's always attribute-oriented... you need to interpolate the specific value that will be changing.
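
Using the hypothetical resource types from the issue summary, that might look something like this (a sketch only; it assumes bar.foo has some attribute that can accept the interpolated value):

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    # Referencing foo.bar's attribute means a change to the file produces
    # a diff on bar.foo too, rather than just an ordering edge.
    foobar = "${foo.bar.foobar}"
}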

In practice this behavior does cause some trouble on edge cases, and those edge cases are what #4846 and #8769 are about: allowing Terraform to detect the side effects of a given update, such as the version_id on an Amazon S3 object implicitly changing each time its content is updated.

Regarding your connection to that other issue @cemo, you are right that the given issue is another one of these edge cases, though a slightly different one: taking an action (deploying) directly in response to another action (updating some other resource), rather than using attribute-based diffing... though for this API gateway case in particular, since API gateway encourages you to create a lot of resources, the specific syntax proposed there would likely be inconvenient/noisy.

Again as @radeksimko said a specific example from @OJFord might allow us to suggest a workaround for a specific case today, in spite of the core mechanisms I've described above. In several cases we have made special allowances in the design of a resource such that a use-case can be met, and we may be able to either suggest an already-existing one of these to use or design a new "allowance" if we have a specific example to work with. (@cemo's API gateway example is already noted, and there were already discussions about that which I will describe in more detail over there.)

OJFord commented 8 years ago

I'm sorry that I never came back with an example; I'm afraid I can't remember exactly what I was doing - but:

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

is exactly right, that was what I misunderstood.

Perhaps something like taint_on_dependency_change = true is possible? That is, if such a variable is true, change the semantics of "ordering" above from "do this after, if it needs to be done" to "do this after".
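
For the example above, that might look something like this (purely hypothetical syntax; no such argument exists today):

resource "bar" "foo" {
    depends_on                 = ["foo.bar"]

    # Hypothetical: replace this resource whenever foo.bar changes,
    # rather than merely ordering it after foo.bar.
    taint_on_dependency_change = true
}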

cemo commented 8 years ago

@OJFord the issue you don't remember might be #6613.

I second @OJFord's proposal and would expect something simple like taint_on_dependency_change. However, I cannot claim to be an expert on Terraform, and since this is my first experiment with it my opinion may not carry much weight.

apparentlymart commented 8 years ago

This taint_on_dependency_change idea is an interesting one. I'm not sure I would actually implement it using the tainting mechanism, since that's more of a workflow management thing and indicates that the resource is "broken" in some way, but we could potentially think of it more like replace_on_dependency_change: artificially produce a "force new" diff any time a dependency changes.

I think this sort of thing would likely require some of the machinery from #6810 around detecting the presence of whole-resource diffs and correctly handling errors with them. There are some edge cases around what happens if B depends on A and A is changed but B encounters an error while being replaced... since the intended change is not explicitly visible in the attributes, Terraform needs to do enough book-keeping that it knows it has more work to do when run again after the error is resolved.

It might work out conceptually simpler to generalize the triggers idea from null_resource or keepers from the random provider, so that it can be used on any resource:

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    lifecycle {
        replace_on_change {
            foo_bar_foobar = "${foo.bar.foobar}"
        }
    }
}

In the above example, the lifecycle.replace_on_change attribute acts as if it were a resource attribute with "forces new resource" set on it: the arbitrary members of this map are stored in the state, and on each run Terraform will diff what's in the state with what's in the config and generate a "replace" diff if any of them have changed.

This effectively gives you an extra place to represent explicit value dependencies that don't have an obvious home in the resource's own attributes.

This is conceptually simpler because it can build on existing mechanisms and UX to some extent. For example, it might look like this in a diff:

-/+ bar.foo
    lifecycle.replace_on_change.foo_bar_foobar: "old_value" => "new value" (forces new resource)

In the short term we're likely to continue addressing this by adding special extra ForceNew attributes to resources where such behavior is useful, so that this technique can be used in a less-generic way where it's most valuable. This was what I'd proposed over in #6613, and has the advantage that it can be implemented entirely within a provider without requiring any core changes, and so there's much less friction to get it done. Thus having additional concrete use-cases would be helpful, either to motivate the implementation of a generic feature like above or to prompt the implementation of resource-specific solutions where appropriate.
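
For comparison, the existing triggers pattern on null_resource that the lifecycle idea above would generalize looks roughly like this today (the trigger key and file name here are just illustrative):

resource "null_resource" "example" {
    # Any change to one of these values forces this null_resource to be
    # replaced, which in turn re-runs any provisioners attached to it.
    triggers {
        config_hash = "${md5(file("config.json"))}"
    }
}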


For the moment I'm going to re-tag this one as "thinking" to indicate that it's an interesting idea but we need to gather more data (real use-cases) in order to design it well. I'd encourage other folks to share concrete use-cases they have in this area as separate issues, similar to what's seen in #6613, and mention this issue by number so that it can become a collection of links to relevant use-cases that can inform further design.

cemo commented 7 years ago

@mitchellh This issue might be considered for the 0.8 release, since you improved depends_on; it might be a quick win.

ckyoog commented 7 years ago
resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = "${var.max_capacity}"
  min_capacity       = "${var.min_capacity}"
  role_arn           = "${var.global_vars["ecs_as_arn"]}"

  resource_id        = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_cpu_scale_in" {
  adjustment_type         = "${var.adjustment_type}"
  cooldown                = "${var.cooldown}"
  metric_aggregation_type = "${var.metric_aggregation_type}"

  name                    = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in"
  resource_id             = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension      = "ecs:service:DesiredCount"
  service_namespace       = "ecs"

  step_adjustment {
    metric_interval_upper_bound = "${var.scale_in_cpu_upper_bound}"
    scaling_adjustment          = "${var.scale_in_adjustment}"
  }

  depends_on = ["aws_appautoscaling_target.ecs_target"]
}

Hi @apparentlymart,

Here is another real use case of my own.

The resource aws_appautoscaling_policy.ecs_cpu_scale_in (call it the autoscaling policy) depends on the resource aws_appautoscaling_target.ecs_target (call it the autoscaling target).

When I change the value of max_capacity and then run terraform plan, it shows the autoscaling target is forced to new (it is going to be destroyed and re-added). But nothing happens to the autoscaling policy, which is supposed to be destroyed and re-added as well.

Why is it supposed to be? Because in my experience, after terraform apply succeeds (destroying and re-adding the autoscaling target), the autoscaling policy is gone automatically (if you log in to the AWS console, you can see it's gone), so I have to run terraform apply a second time, and that time it adds the autoscaling policy back.

(BTW, both resources are actually defined in a module; maybe that matters, maybe not, I'm not sure.)

apparentlymart commented 7 years ago

Hi @ckyoog! Thanks for sharing that.

What you described there sounds like what's captured in terraform-providers/terraform-provider-aws#240. If you think it's the same thing, it would be cool if you could post the same details in that issue since having a full reproduction case is very useful. I think in your particular case this is a bug that we ought to fix in the AWS provider, though you're right that if the feature I described in my earlier comment were implemented it could in principle be used as a workaround.

In the meantime, you might already be able to work around this by including an additional interpolation in your policy name to force it to get recreated when the target is recreated:

  name = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in-${aws_appautoscaling_target.ecs_target.id}"

Since the name attribute forces new resource, this should cause the policy to get recreated each time the target is recreated.

ckyoog commented 7 years ago

Thank you @apparentlymart for the workaround. Sure, I will post my case to issue terraform-providers/terraform-provider-aws#240.

zopanix commented 7 years ago

Hey, I just got an idea of how this might be solved. The approach is inspired by Google Cloud and I don't know if it will apply to all use cases. Basically, in Google Cloud you have the notion of "used by" and "uses" on resources, for example the link between a boot_disk and an instance. The boot_disk can exist alone as a simple disk, but the instance cannot exist without a boot disk. Therefore, in the data model, you can have a generic system that states used_by.

Example:

resource "google_compute_disk" "bastion_boot" = {
  image = "centos-7"
  size    = "10"
  used_by = ["${google_compute_instance.bastion.name}"]
}

resource "google_compute_instance" "bastion" = {
  boot_disk = {
    source = "${google_compute_disk.bastion_boot.name}"
  }
  uses = ["${google_compute_disk.bastion_boot.name}"]
}

The uses and used_by could be implicitly set in well known cases but could be explicitly set in some user and/or corner cases. And it would become the provider's responsibility to know about the implicit uses and as a workaround, it would be possible to use the explicit form.

It would work much like the implicit and explicit depends_on, except in the reverse direction.

Now, I understand that there are some subtle differences among the problems that have been mentioned (for example, "I don't want to destroy, I want to update a resource"). I don't know how my case would fit into those.

Also, I think it would be best to stick with the cloud provider's semantics; in my case that really reflects what I'm doing and how everything works. This system would be a reverse depends_on, creating a possible destruction cycle that would be triggered before the create cycle. That would be fine in most cases, and if you cannot tolerate a destruction you usually apply a blue-green model anyway, which avoids that pain. In my case, during my maintenance windows, I can be destructive with most of my resources.

Just some related issues:

#16065 #16200

alethenorio commented 6 years ago

I have run into the need for this issue myself.

The use case is the following:

I have a resource for a database instance (In this case an AWS RDS instance) which performs a snapshot of its disk upon destruction. If I destroy this resource and recreate it and destroy it again, AWS returns an error because it will attempt to create a snapshot with the same identifier as before.

This can be mitigated by using something like the "random_id" resource as a suffix/prefix to that identifier. The issue is that if I taint the database resource, I need to remember to manually taint the "random_id" resource as well, otherwise the new instance will have the same "random_id" as before.

Attempting to use a "keepers" pointing to the database resource id does not work because it causes a cyclic dependency.

Any ideas on how one handles that?
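
For reference, the mitigation described above looks roughly like this (a sketch; the attribute values are illustrative and var.db_password is assumed to be defined elsewhere). The caveat is that random_id.snapshot_suffix currently has to be tainted by hand together with the instance:

resource "random_id" "snapshot_suffix" {
  byte_length = 4
}

resource "aws_db_instance" "example" {
  # Illustrative settings only
  identifier        = "example"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "exampleuser"
  password          = var.db_password

  # The suffix avoids reusing a final snapshot identifier from a previous
  # destroy, but it has to be tainted together with the instance by hand.
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-final-${random_id.snapshot_suffix.hex}"
}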

DaveDeCaprio commented 5 years ago

I've run into this same issue with trying to get an EMR cluster to rebuild when the contents of a bootstrap operation change. See https://stackoverflow.com/questions/53887061/in-terraform-how-to-recreate-emr-resource-when-dependency-changes for details.

bukzor commented 5 years ago

My concrete case is similar to the ones already discussed. I want to destroy and recreate a disk resource when a VM startup script changes. I really like the described lifecycle.replace_on_change solution, but I wonder if it would work for me. My VM already has a reference to the disk, and the disk would grow a replace-on-change reference to the VM's startup script. Would that cycle be a problem? I can represent the startup script as a separate template_file resource pretty easily, but cycles caused by replace-on-change should either work well or be an error.

bukzor commented 5 years ago

The other solution that I thought of looks like so:


resource "my_vm" "example" {
  startup_script = "..."
}

resource "my_disk" "example" {}

resource "null_resource" "taint_disk" {
  triggers {
    startup_script = "${my_vm.example.startup_script}"
  }

  # Hypothetical: a "taint" provisioner does not exist in Terraform today
  provisioner "taint" {
    resources = ["my_disk.example"]
  }
}

But I don't know how this would look in terraform plan. You're right that the lifecycle.replace_on_change design leverages some existing patterns nicely.

jhoblitt commented 5 years ago

I frequently run into this problem if I'm using kubernetes provider resources that depend on a module that creates the gke or eks cluster. If a configuration change is made that causes the k8s cluster to be destroyed/recreated, obviously all kubernetes resources are lost.

jsirianni commented 5 years ago

I am running into this problem as well. I have two resources, and one depends on the other. If I delete the dependent resource outside of Terraform, I need BOTH resources to be recreated, but Terraform does not know that; it only offers to create the resource that I manually deleted.

It's a chicken-and-egg issue when an outside force modifies the infrastructure.

apeeters commented 5 years ago

Another example: rotating an EC2 keypair that is configured on an Elastic Beanstalk environment should trigger a rebuild of the environment.

resource "aws_elastic_beanstalk_environment" "test"
  ...
  setting {
    namespace = "aws:autoscaling:launchconfiguration"
    name      = "EC2KeyName"
    value     = "${aws_key_pair.test.key_name}"
  }
}
weaversam8 commented 5 years ago

Here's another example with Amazon Lightsail: if you recreate an aws_lightsail_instance, you will need to recreate the aws_lightsail_static_ip_attachment between the aws_lightsail_instance and the aws_lightsail_static_ip.

resource "aws_lightsail_instance" "instance_1" {
    name = "Instance 1"
    # ...
}

# yes, the below is really all that is needed for the aws_lightsail_static_ip resource
resource "aws_lightsail_static_ip" "instance_1_static_ip" {
    name = "Instance 1 Static IP"
}

resource "aws_lightsail_static_ip_attachment" "instance_1_static_ip_attachment" {
    static_ip_name = "${aws_lightsail_static_ip.instance_1_static_ip.name}"
    instance_name  = "${aws_lightsail_instance.instance_1.name}"
}

In this example, if you run terraform taint aws_lightsail_instance.instance_1, then terraform apply will recreate the aws_lightsail_instance resource, but the static IP will be automatically detached in the process. You'll have to run terraform apply again for Terraform to realize the aws_lightsail_static_ip_attachment has changed.

jenyss commented 5 years ago

Adding another use case related to this request.

I have a custom Provider which defines a "workflow_execution" resource. When created, it triggers an application deployment. I would like to have the "workflow_execution" created:

For the first point to be achieved, the "workflow_execution" resource creation has to depend on a change in another resource, which is currently not supported by Terraform.

ravulachetan commented 5 years ago

Adding another use case related to this request.

I use AVI LB and create a GSLB for all the services that we use. Right now the connection between the AVI GSLB and the web apps is done through UUID only. When any attribute on the web app changes, the UUID gets regenerated, which results in it being out of sync with the AVI GSLB.

We need a way to recreate the GSLB every time a change is made to the web app.

heisian commented 5 years ago

If one needs to recreate an aws_lb_target_group that is currently the target of an aws_lb_listener_rule, the aws_lb_listener_rule needs to first be destroyed before the aws_lb_target_group can be recreated.

SaintSimmo commented 5 years ago

Piling on, this would be extremely useful for redeploying APIs via the AWS provider.

e.g., the aws_api_gateway_deployment resource handles the deployment of an AWS API Gateway instance. However, it must be manually redeployed if any API methods, resources, or integrations change.

A workaround might be setting the stage name of the deployment to the hash of the directory containing the volatile configuration, but the end result would be many stages.

Edit: Naturally, it looks like there have already been a few issues created regarding this.
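
One pattern that can help with this particular case (assuming a newer AWS provider version in which aws_api_gateway_deployment supports a triggers argument, and an API whose definition is supplied as an OpenAPI body; the resource names are illustrative):

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  # Force a new deployment whenever the API definition changes.
  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.example.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}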

MathyV commented 5 years ago

I was running into the same issue with kubernetes as you, @jhoblitt. I managed to find a workaround based on the fact that (it seems) all Kubernetes resources require that the name doesn't change: if you change the name, the resource is recreated.

So I created a random id that is based on the cluster endpoint and I append that to the name of all my kubernetes resources.

// Generate a random id we can use to recreate k8s resources
resource "random_id" "cluster" {
    keepers = {
        // Normally a new cluster will generate a new endpoint
        endpoint = google_container_cluster.cluster.endpoint
    }
    byte_length = 4
}

resource "kubernetes_deployment" "tool" {
    metadata {
        name = "tool-deployment-${random_id.cluster.hex}"
        labels = {
            App = "tool"
        }
    }

    spec {
    ...
    }
}

It's not ideal (especially for naming services) but it works for me. The only issue I still have is with Helm, which I use to install Traefik. If I add the id to those names, creation works fine, but on update of the id I get a cyclic dependency problem. Also, the change in the name of the service account roles makes Helm/Tiller stop working properly, so I'll probably forgo Helm completely and configure Traefik manually.

atecce commented 5 years ago

@radeksimko

would you mind providing more concrete example with real resources that would help us reproduce the unexpected behaviour you described?

resource "kubernetes_config_map" "config" {

  data = {
    FOO = "bar"
  }

  metadata {
    name = "config"
  }
}

resource "kubernetes_deployment" "deployment" {

  depends_on = [ kubernetes_config_map.config ]

  metadata {
    name = "deployment"
  }

  spec {
    env_from {
      config_map_ref {
         name = kubernetes_config_map.config.metadata[0].name
      }
    }
  }
}

I want my k8s deployment to get patched every time I terraform apply a config change (for example, changing the env var FOO to baz). That's my use case.
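
One workaround for this case is to surface a hash of the config map data as a pod template annotation, so the deployment gets an attribute diff of its own whenever the config changes. A sketch of the relevant fragment, which would sit inside the deployment's spec above (sha1 and jsonencode are standard Terraform functions; the annotation key is arbitrary):

  spec {
    template {
      metadata {
        annotations = {
          # Changing FOO (or any other key) in the config map changes this
          # hash, which diffs the deployment and rolls the pods.
          config_checksum = sha1(jsonencode(kubernetes_config_map.config.data))
        }
      }

      # ... container spec as above ...
    }
  }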

thrixton commented 5 years ago

Another example: replacing an aws_key_pair does not update the related aws_instance(s).

hogarthj commented 5 years ago

If one needs to recreate an aws_lb_target_group that is currently the target of an aws_lb_listener_rule, the aws_lb_listener_rule needs to first be destroyed before the aws_lb_target_group can be recreated.

That's similar to what I'm bumping into and trying to work around right now. I'm trying to evaluate a solution, and a "force_recreate"/"taint" option in lifecycle, or similar, would be incredibly useful.

In my case I have a target group that needs to be recreated, but the listener (no rule involved here) only gets an "update in place" change... and then the target group cannot be destroyed because the listener isn't being destroyed.

For reference, for others searching: the AWS provider side of this is being tracked in terraform-providers/terraform-provider-aws#10233.

psanzm commented 4 years ago

I was running into the same issue with the Google provider and the resources google_compute_resource_policy & google_compute_disk_resource_policy_attachment.

When you create a policy for scheduling the snapshots of a GCE disk, you must attach the policy to the disk. That policy isn't editable, so if you make any changes Terraform has to recreate the resource, but it doesn't recreate the attachment resource, even if it's "linked" with Terraform's depends_on directive.

Example of the resources:

resource "google_compute_resource_policy" "snapshot_schedule_wds" {
  name    = "snapshot-weekly-schedule-wds"
  region  = var.subnetwork_region
  project = google_project.mm-sap-prod.name

  snapshot_schedule_policy {
    schedule {
      weekly_schedule {
        day_of_weeks {
          day        = "SATURDAY"
          start_time = "20:00"
        }
      }
    }
    retention_policy {
      max_retention_days    = 366
      on_source_disk_delete = "KEEP_AUTO_SNAPSHOTS"
    }
    snapshot_properties {
      labels = {
        app     = "xxx"
      }
      storage_locations = ["europe-west6"]
      guest_flush       = false
    }
  }
}

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.name
  disk = google_compute_disk.web-dispatch-boot.name
  zone = var.zone
  project = google_project.mm-sap-prod.name

  depends_on = ["google_compute_resource_policy.snapshot_schedule_wds"]
}

Terraform version

Terraform v0.12.13
+ provider.external v1.2.0
+ provider.google v2.20.0
+ provider.google-beta v2.20.0

Any solution for this use case?

pdecat commented 4 years ago

@psanzm in this very specific use case, using the google_compute_resource_policy's id field, instead of name, in the google_compute_disk_resource_policy_attachment's name field allows it to work:

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.id
...

Note: it works because the actual values of name and id are the same, but the id is unknown upon recreation.

sean-nixon commented 4 years ago

To add another example use case, I recently ran into this with Azure PostgreSQL. I wanted to upgrade the version of the PostgreSQL engine on the server, which requires replacement. The dependent resources such as firewall rules and Postgres configurations were not re-created, so I had to run through two applies. This is a common occurrence in Azure, where most IDs are based on the name of the resource, so if it is re-created the ID stays the same and dependent resources don't register the change.

resource "azurerm_postgresql_server" "pgsql_server" {
  name                = "examplepgsql"
  resource_group_name = "my-rg"
  location            = "eastus"

  sku {
    name     = "GP_Gen5_2"
    capacity = "2"
    tier     = "GeneralPurpose"
    family   = "Gen5"
  }

  storage_profile {
    storage_mb            = "51200"
    backup_retention_days = 35
    geo_redundant_backup  = "Enabled"
  }

  administrator_login          = var.admin_username
  administrator_login_password = var.admin_password
  version                      = "11"
  ssl_enforcement              = "Enabled"
}

resource "azurerm_postgresql_firewall_rule" "azure_services_firewall_rule" {
  name                = "AzureServices"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  start_ip_address    = "0.0.0.0"
  end_ip_address      = "0.0.0.0"
}

resource "azurerm_postgresql_configuration" "log_checkpoints_pgsql_config" {
  name                = "log_checkpoints"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  value               = "on"
}
awilkins commented 4 years ago

Another use case:

I wanted to update an SSM parameter with the value of an AMI data block, but only when it changes.

This is for use with an Automation workflow like the example posted in the AWS docs.

My thought was: put in a null_resource that triggers when the AMI ID changes, and make the SSM parameter depend on this, but all a null_resource emits is an ID.

Aha, I thought, I'll do this:

data "aws_ami" "windows" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2012-R2_RTM-English-64Bit-Base-*"]
  }
}

resource "null_resource" "new_windows_ami" {
  triggers = {
    base_ami_date = data.aws_ami.windows.creation_date
    force_update  = 1
  }
}

resource "aws_ssm_parameter" "current_windows_ami" {
  name  = "/ami/windows/2k12/current"
  value = data.aws_ami.windows.image_id
  type  = "String"

  tags = {
    BaseAmiTriggerId = null_resource.new_windows_ami.id
  }

  depends_on = [
    null_resource.new_windows_ami,
  ]
  # We only want the initial value from the data, we're going to replace this
  # parameter with the current "patched" release until there's a new base AMI
  overwrite = true
  lifecycle {
    ignore_changes = [
      value,
    ]
  }
}

... sadly, ignore_changes also blocks the changes themselves. What I was hoping was that the change to the tag would be enough to trigger an update of the whole resource. ignore_changes means that changes to the attribute's inputs are ignored for all purposes, not just for deciding whether they trigger a lifecycle update.

This seems a shame because otherwise you could implement quite sophisticated lifecycle management with the null resource, concocting triggers with interpolations and such and only triggering an update to a dependent resource when the ID changed as a result.

MarkKharitonov commented 4 years ago

I came to this thread from https://github.com/terraform-providers/terraform-provider-azurerm/issues/763. I do not know how this is connected, but that issue was closed in favour of https://github.com/terraform-providers/terraform-provider-azurerm/issues/326, which in turn was closed in favour of this one.

So, if you understand how the connection was made, here is another very real scenario: we modify the probing path on an Azure Traffic Manager and boom, its endpoints are gone. This is very frustrating. Is there an ETA on a fix for this issue?

OJFord commented 4 years ago

@MarkKharitonov This issue is essentially a feature request; what you're describing with Azure sounds like a bug, though (but I haven't used Azure or read through those issues) - so perhaps the link is 'sorry, nothing we can do without [this issue resolved], closing'.

I phrased it as a bug in the OP (and I should perhaps edit that) out of misunderstanding, but it's really a request for a form of dependency control that isn't possible (solely) with Terraform today.

MarkKharitonov commented 4 years ago

I do not understand. I have a traffic manager resource. The change does not recreate the resource - it is reported as an in-place update. Yet it blows away the endpoints. How is that a feature request?

OJFord commented 4 years ago

@MarkKharitonov As I said, "what you're describing with Azure sounds like a bug", but this issue is a feature request, for something that does not exist in terraform core today.

Possibly the Azure resolution was 'nothing we can do without a way of doing [what is described here]' - I have no idea - but this issue itself isn't a bug, and is labelled 'thinking'. There's no guarantee there'll ever be a way of doing this, nevermind an ETA.

(I don't work for Hashicorp, I just opened this issue, there could be firmer internal plans for all I know, just trying to help.)

MarkKharitonov commented 4 years ago

I do not know what to do. There are real issues in the provider that are being closed with the claim that they are caused by this one. But this one is apparently huge in scope, so I do not understand what I am supposed to do. Should I open yet another issue in the Terraform providers, referencing the already-closed ones and this one? How do we attract attention to the real bug without it being closed for nothing, which has already happened twice?

sean-nixon commented 4 years ago

@MarkKharitonov I'm no expert on Terraform or Terraform provider development, so someone please correct me if I'm wrong, but I don't think there's anything that can be done in the provider. The issues in the Azure provider are caused by a limitation of Terraform, not a bug in the AzureRM provider that can be fixed. Based on the comments in this issue, there is a fundamental mismatch between how the Azure API works and how Terraform handles dependencies. Azure does not generate new IDs when a resource is re-created, because IDs are derived from resource names. So if you have a child resource that references a parent resource by ID, the ID doesn't change even when the parent is re-created. From Terraform's perspective, that means no attribute changed on the child resource, since the ID it references is the same, even though in actuality the child resource was also destroyed together with the parent. The feature request here, as I understand it, is to add additional intelligence to Terraform dependencies: use them not just for ordering resource creation, but also to detect that a dependency (e.g. a parent resource) was destroyed/re-created and trigger a destroy/re-create on the dependent resource (e.g. a child resource), irrespective of whether any attributes on the child resource have changed.

steph-moto commented 4 years ago

This issue appears really critical and not a feature request at all. The fundamental promise of Terraform is to apply any missing changes when required. In this case, Terraform not acting on a dependency when a parent resource is recreated is fundamentally an issue.

Could someone clarify whether the problem of authorization rules not being re-created when their associated event hub is re-created has been present for a long time? Is there any previous version of AzureRM or Terraform that would mitigate the issue until this gets resolved?

Because the only approach I can see to work around this issue is to invoke the Terraform deployment twice, which to me is nonsense.

Backscratcher commented 4 years ago

Hey! I have another example of this behaviour. Changes to modules that force recreation of resources inside the module, which are used by a dashboard, won't update the dashboard, and it ends up referencing the configuration from before the apply. Another apply will actually pick up those changes and alter the dashboard_json template. The weird thing is that changes to aws_instance.cron are picked up at the time of the first apply, but changes to the modules are not.

data "template_file" "dashboard_json" {
  template = file("${path.module}/templates/cloudwatch_dashboard/dashboard.tpl")
  vars = {
    rds_instance_id                      = module.database.rds_instance_id
    region                               = var.aws_region
    asg_normal_name                      = module.autoscaling_group.aws_autoscaling_group_name-normal
    cron_instance_id                     = aws_instance.cron.id
    lb_arn_suffix                        = module.load_balancer.aws_lb_arn_suffix
    lb_target_group_arn_suffix           = module.load_balancer.aws_lb_target_group_target_group_arn_suffix
    lb_blackhole_target_group_arn_suffix = module.load_balancer.aws_lb_target_group_target_group_blackhole_arn_suffix
    lb_redash_target_group_arn_suffix    = aws_lb_target_group.redash.arn_suffix
    procstats_cpu                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_cpu.rendered : ""
    procstats_mem                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_mem.rendered : ""
    # force recreation of the dashboard due to weird behaviour when changes to modules above
    # are not picked up by terraform and dashboard is not being updated
    force_recreation = var.force_dashboard_recreation[local.environment] ? "${timestamp()}" : ""
  }
}

resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.project_name}-${local.environment}-dashboard"
  dashboard_body = data.template_file.dashboard_json.rendered
}

I tried using depends_on - maybe the ordering would help with it - but it didn't help, so I ended up using timestamp() to force recreation.

kustodian commented 4 years ago

We have the exact same problem on GCP, which is described in detail in this issue: https://github.com/terraform-providers/terraform-provider-google/issues/6376.

Here is part of the relevant config:

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

I'm not sure whether this is a general TF problem or a Google provider problem, but here it goes. Currently it's not possible to lower the number of google_compute_instance_group resources used by a google_compute_region_backend_service. In the code above, if we lower the number of google_compute_instance_group resources and try to apply the configuration, TF will first try to delete the no-longer-needed instance groups and then update the backend configuration, but that order doesn't work because you cannot delete an instance group that is still used by the backend service; the order should be the other way around.

So to sum it up, when I lower the number of the instance group resources TF does this:

  1. delete surplus google_compute_instance_group -> this fails
  2. update google_compute_region_backend_service

It should do this the other way around:

  1. update google_compute_region_backend_service
  2. delete surplus google_compute_instance_group -> this would now succeed

What I don't understand is why TF doesn't know that it should do the update first, then remove the instance groups. When I run destroy, TF does it correctly: it first destroys the backend service, then the instance groups.

Also, this is very hard to work around, because you need to make a temporary config change, apply, then set the final config you want and apply again.

lorengordon commented 4 years ago

@kustodian Can you use create_before_destroy in google_compute_instance_group?

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link

  lifecycle {
    create_before_destroy = true
  }
}
kustodian commented 4 years ago

@lorengordon I can, but it doesn't help. TF works exactly the same in my example with or without create_before_destroy = true.

To be honest I'm not entirely sure that my issue is the same thing as what the issue reporter is describing.

OJFord commented 4 years ago

@apparentlymart May I suggest locking this issue? I suspect you and the team probably have enough examples and use cases to consider this feature now?

I could 'unsubscribe' of course, it's just that I would like to be notified if/when there's a decision, some progress, or something to help test. Cheers. :slightly_smiling_face:

brandocorp commented 4 years ago

Edit: It turns out this is really a function of kubernetes, and not really a terraform concern.

Just adding my $0.02. This is also an issue with the kubernetes provider and secrets/config maps. A service using an updated config map or secret doesn't detect the change, because the underlying pods of the service need to be restarted or recreated to pick up the changes.

resource "kubernetes_secret" "value" {
  metadata {
    name      = "k8s-secret-value"
    namespace = "private"
  }

  data = {
    secret = var.secret_value
  }
}

resource "kubernetes_deployment" "service" {
  metadata {
    name      =  "internal-service"
    namespace = "private"
  }
  spec {

    template {

      spec {
        container {

          env {
            name = "SECRET_VALUE"

            value_from {
              secret_key_ref {
                name = kubernetes_secret.value.metadata.0.name
                key  = "secret"
              }
            }
          }
        }
      }
    }
  }
}

If the value for the secret key is updated, nothing seems to happen with the deployment.
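
The same checksum-annotation trick mentioned earlier in the thread can help here too; a sketch of the fragment that would go in the deployment's spec.template.metadata (the annotation key is arbitrary):

      annotations = {
        # Any change to the secret's data changes this hash, which gives the
        # deployment a diff of its own and forces the pods to be recreated.
        secret_checksum = sha1(jsonencode(kubernetes_secret.value.data))
      }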

danieldreier commented 4 years ago

I'm going to lock this issue for the time being, because the remaining discussion seems largely to be people supporting each other in workarounds.

I’m happy to see people are helping each other work around this, and I've created a thread for this on the community forum so that people can continue these discussions without creating excess noise for people who just want succinct updates in GitHub.