Reverse dependencies aren't respected when re-creating resources

JulianCBC commented 5 years ago

AWS generally doesn't let people remove resources if they're referenced by another resource by ARN or ID.

For example, a load balancer listener rule might reference a target group. This means that the target group isn't removable until that listener rule no longer references it.

The AWS provider doesn't seem to understand these reverse dependencies. As a result, changes that should be easy to resolve fail with errors, and the operator has to either change things manually in the AWS Management Console before applying or hack the configuration.

Terraform Version

Terraform v0.12.8

provider.acme v1.4.0
provider.aws v2.28.1
provider.tls v2.1.0

Affected Resource(s)

This seems to be an issue across many different resources. I've hit it most often while working with load balancers, so the affected load balancer resources are listed here.

aws_lb_listener
aws_lb_listener_rule
aws_lb_target_group

Terraform Configuration Files

Example configuration (modified from a real configuration; the var.* values need to be filled in before applying):

resource "aws_lb_target_group" "tg" {
  name        = "rdep-tg"
  target_type = "ip"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
}

resource "aws_lb" "lb" {
  name               = "rdep-lb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = var.security_groups
  subnets            = var.subnets
}

resource "aws_lb_listener" "front-end" {
  load_balancer_arn = aws_lb.lb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/html"
      message_body = "FAIL"
      status_code  = "503"
    }
  }
}

resource "aws_lb_listener_rule" "forward" {
  listener_arn = aws_lb_listener.front-end.arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tg.arn
  }

  condition {
    field  = "host-header"
    values = [var.server_name]
  }
}

Expected Behavior

Terraform creates a plan which changes the aws_lb_target_group and aws_lb_listener_rule resources in an order that prevents it from removing the aws_lb_target_group while it's referenced by the aws_lb_listener_rule.

For example:

1. Delete the listener rule
2. Delete the target group
3. Create the target group
4. Create the listener rule

Or alternatively:

1. Edit the listener rule to not reference the old target group
2. Remove the target group
3. Create the target group
4. Edit the listener rule to reference the new target group

(We can't edit the name of a target group, so we can't create the new one and then edit it to have the correct name after everything's been updated.)

Actual Behavior

Terraform attempts to perform these operations in this order:

1. Remove the target group
2. Create the target group
3. Edit the listener rule to reference the new target group

This fails at step 1 as the target group cannot be removed while the listener rule references it.

Steps to Reproduce

1. Apply the initial configuration above
2. Edit the aws_lb_target_group resource so that it will be re-created, for example by changing target_type
3. Apply the new configuration
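
As an illustration of the edit in the second step, here is a minimal sketch based on the target group from the configuration above. The only change is target_type, which goes from "ip" to "instance"; the choice of "instance" is just an example value, and any change that forces replacement should trigger the same plan.

```hcl
resource "aws_lb_target_group" "tg" {
  name        = "rdep-tg"
  # Was "ip" in the initial configuration. Changing target_type cannot be
  # done in place, so Terraform plans to destroy and re-create the group.
  target_type = "instance"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
}
```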

Notes

This particular scenario could be hit in the wild by changing a load-balanced server from an EC2 instance to a container. Doing this would require changing the target group it's part of from an "instance" type to an "ip" type.

The AWS provider or Terraform itself needs to be made smarter so that it understands when referenced data can be left "dangling" and when it can't.

JulianCBC commented 5 years ago

Same issue with security groups.

If you have a security group that is used by other resources and then taint it, terraform apply waits forever for the security group to be deleted.

If you attempt to delete it from the AWS management console, it prompts you to remove it from the network interfaces which are using it first.

JulianCBC commented 5 years ago

Same issue with ACM certificates.

If you have an ACM certificate used by a load balancer, you cannot re-create the certificate while that load balancer uses it, and the AWS provider isn't smart enough to determine a workaround.
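
For the certificate case, a minimal sketch of the create_before_destroy workaround discussed later in this thread might look like the following. The resource name "cert", the reuse of var.server_name as the domain, and DNS validation are assumptions made for illustration only; they are not taken from the reporter's configuration.

```hcl
resource "aws_acm_certificate" "cert" {
  # Hypothetical values: adjust the domain and validation method as needed.
  domain_name       = var.server_name
  validation_method = "DNS"

  lifecycle {
    # Create the replacement certificate first, let dependents (such as a
    # listener) switch to it, and only then destroy the old certificate.
    create_before_destroy = true
  }
}
```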

kumadee commented 3 years ago

Is there any progress on this topic?

justinretzolk commented 2 years ago

Hey @JulianCBC 👋 Thank you for taking the time to file this issue! Given that there have been a number of AWS provider releases since you initially filed it, can you confirm whether you're still experiencing this behavior?

JulianCBC commented 2 years ago

@justinretzolk our Terraform configuration has had "hacks" added to it to work around this issue, so it's not causing us problems anymore. However, those hacks are inelegant and not easy for novice users to discover, so the issue is still relevant.

I just removed one of those hacks and checked this behaviour on version 3.69 of the AWS provider and the issue is still present.

The relevant part of our configuration looks like this:

resource "aws_security_group" "test" {
  name_prefix = "test"
  vpc_id      = aws_vpc.test.id

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_instance" "test" {
  instance_type = "t2.micro"
  ami           = data.aws_ami.crm_ubuntu.id
  key_name      = "test"
  vpc_security_group_ids = [
    aws_security_group.test.id,
  ]
  subnet_id = aws_subnet.test_a.id
}

And I tested this by:

1. Running terraform apply to clear out any pending changes and ensure that the resources are present in AWS
2. Commenting out the create_before_destroy rule
3. Tainting the security group (terraform taint aws_security_group.test)
4. Running terraform apply again

It got stuck destroying the security group, as AWS won't let you do that while the EC2 instance references it.

JouHouFin commented 4 months ago

@JulianCBC Care to share your "hacks" here? A very similar issue also plagues the vcd provider.

JulianCBC commented 4 months ago

@JouHouFin literally just the create_before_destroy lifecycle rule. It's an easy workaround, but it's non-obvious and requires knowing which things could potentially have these sorts of reverse dependencies. Arguably the provider should hold that knowledge, as these "rules" are static properties of the service being used, not something that is context- or configuration-dependent.

Documentation is here: https://developer.hashicorp.com/terraform/language/meta-arguments/lifecycle
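
Applied to the target group from the original report, the workaround could be sketched roughly as follows. This assumes the fixed name is swapped for name_prefix, because with create_before_destroy the old and new target groups exist at the same time and would otherwise collide on the name. The prefix value "rdep-" is illustrative, and as far as I know the provider limits this prefix to six characters.

```hcl
resource "aws_lb_target_group" "tg" {
  # name_prefix lets the provider generate a unique name, so the replacement
  # group can be created while the old one still exists.
  name_prefix = "rdep-"
  target_type = "ip"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = var.vpc_id

  lifecycle {
    # New group is created first, the listener rule is updated to point at
    # it, and the old group is destroyed last.
    create_before_destroy = true
  }
}
```

With this in place, the listener rule should stop referencing the old target group before Terraform tries to delete it, which is the ordering the issue asks the provider to work out by itself.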

JulianCBC commented 4 months ago

Oh, and this is why it was "hacks" in quotation marks: I view these lifecycle rules as a way to work around stuff that is weird or complicated, not fundamental properties of a service - a "proper" configuration shouldn't need them, hence this bug.

JouHouFin commented 4 months ago

Thanks, I'll check if that's something I could use. I already tried that exact lifecycle rule, but maybe my understanding of it was limited, so it may be worth revisiting. 👍
