Lifecycle Meta not Working Properly in aws_instance

eHildy commented 3 years ago

Description

We have a module that creates an EC2 instance and attaches some security groups and other configuration not shown here. Most of the time we can do this in a more or less vanilla way, and we end up with a randomly assigned private IP. In other cases, however, we'd like to supply a list of private IPs to assign to the primary interface. We want the primary interface because adding another interface at, say, index = 1 would require further configuration inside the machine and in our security groups. Setting secondary_private_ips = var.private_ips wasn't useful either, since that seems to operate at the AWS networking level: traffic coming from the machine does not appear to other instances as coming from one of these IPs, and the IPs don't show up in the instance's OS when you run ifconfig or a similar command.

The code under "Terraform Code" below works beautifully. The machine receives the required private IP(s), they show up inside the instance in ifconfig etc., you can SSH in on those IPs, and other servers see its traffic as originating from them. The issue arises when the machine needs to be replaced.

We use a data lookup to find the AMI ID, and when it finds a new one, the plan indicates replacing the instance, as expected. However, the replacement fails because the ENI created by aws_network_interface is still attached to the running instance: Terraform creates the new instance before terminating the old one. I thought that setting create_before_destroy = false would correct this, but it seems to have no effect and the same thing happens.
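For readers who want to reproduce the trigger, the AMI lookup described above might look roughly like the sketch below. The data source name, owner, and name pattern are placeholders and are not taken from the report; only the idea of feeding a looked-up AMI ID into var.ami_id comes from the description.

```hcl
# Hypothetical lookup: "latest", the owner, and the name pattern are
# illustrative placeholders, not values from the original module.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["my-base-image-*"]
  }
}

# The result would be passed to the module as var.ami_id, for example:
#   ami_id = data.aws_ami.latest.id
# When the lookup returns a different ID, the `ami` argument of
# aws_instance.ec2_server changes and the plan replaces the instance.
```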

Terraform CLI and Terraform AWS Provider Version

TF: v0.14.3 Provider: v3.28.0

Affected Resource(s)

aws_instance
aws_network_interface

Terraform Code

resource "aws_network_interface" "with_private_supplied" {
  count = var.private_ips != null ? 1 : 0
  subnet_id = var.subnet_id
  private_ips = var.private_ips
}

resource "aws_network_interface" "without_private_supplied" {
  count = var.private_ips == null ? 1 : 0
  subnet_id = var.subnet_id
}

resource "aws_instance" "ec2_server" {
  ami = var.ami_id
  instance_type = var.instance_type
  key_name = var.key_name
  iam_instance_profile = aws_iam_instance_profile.ec2_iam_profile.name

  lifecycle {
    create_before_destroy = false
  }

  root_block_device {
    volume_size = var.volume_size
  }
  dynamic "network_interface" {
    for_each = aws_network_interface.with_private_supplied
    content {
      network_interface_id = aws_network_interface.with_private_supplied[0].id
      device_index = 0
    }
  }
  dynamic "network_interface" {
    for_each = aws_network_interface.without_private_supplied
    content {
      network_interface_id = aws_network_interface.without_private_supplied[0].id
      device_index = 0
    }
  }
  user_data = var.user_data
}

Debug Output

Plan: 4 to add, 2 to change, 4 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.geoserver_ec2_alb[0].aws_network_interface_sg_attachment.sg_attachments[0]: Destroying... [<redact>]
module.geoserver_ec2_alb[0].aws_lb_target_group_attachment.alb_tg_attachments[0]: Destroying... [<redact>]
module.geoserver_ec2[0].aws_instance.ec2_server: Creating...
module.geoserver_ec2_alb[0].aws_lb_target_group_attachment.alb_tg_attachments[0]: Destruction complete after 1s
module.geoserver_ec2_alb[0].aws_network_interface_sg_attachment.sg_attachments[0]: Destruction complete after 2s

Error: Error launching source instance: InvalidNetworkInterface.InUse: Interface: [<redact>] in use.
        status code: 400, request id: 1588ac3f-2b49-40eb-8eea-ae717e99d419

Expected Behavior

Setting create_before_destroy = false in the lifecycle block should terminate the running instance before the new one is created, so that the ENI is available for the new instance.

Actual Behavior

The new instance is still created before the old one is terminated, despite setting the lifecycle meta.

Steps to Reproduce

Using the code above, run a terraform apply with some AMI ID
Run the same code again with a new AMI ID, prompting a new instance

ripa1993 commented 2 years ago

Any news on this? I'm experiencing the same behaviour with 3.75.0 and Terraform 1.0.9.

ripa1993 commented 2 years ago

I was able to track down the root cause in my situation, which was similar to this one.

With logging at trace level (TF_LOG=TRACE), look for:

forcing create_before_destroy on for "xxx"
"xxx" has CBD descendent "yyy"

Where xxx is your aws_instance and yyy is what is causing the create before destroy.

So basically, in my case it was a downstream resource marked with create_before_destroy = true that made the aws_instance assume the same behavior.
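A minimal sketch of the situation described in this comment, under stated assumptions: the target group attachment and var.target_group_arn are hypothetical stand-ins for whatever downstream resource references the instance. It is not a confirmed diagnosis of the original report. Terraform's documentation notes that create_before_destroy propagates to a resource's dependencies and cannot be overridden to false there.

```hcl
resource "aws_instance" "ec2_server" {
  ami           = var.ami_id
  instance_type = var.instance_type

  lifecycle {
    # Explicitly false, but Terraform forces it to true because a
    # resource that depends on this instance uses create_before_destroy.
    create_before_destroy = false
  }
}

# Hypothetical downstream resource: it references the instance, so the
# instance is one of its dependencies ("xxx" has CBD descendent "yyy").
resource "aws_lb_target_group_attachment" "example" {
  target_group_arn = var.target_group_arn # assumed variable
  target_id        = aws_instance.ec2_server.id

  lifecycle {
    create_before_destroy = true
  }
}
```

If the trace log names a descendant like this, removing create_before_destroy = true from it (where that is acceptable) is one thing to try.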

dylan-shipwell commented 2 years ago

I was unable to resolve this "xxx" has CBD descendent "yyy" relationship in my case. A related rabbit hole: being able to control CBD is required to test whether another Terraform provider behavior, which relies on CBD != true, can be worked around.

some behavior feedback notes

I would like to see
- terraform refuse to create a valid plan if a resource explicitly has create_before_destroy=false and some other resource requires it to have CBD=true
- terraform report a complete list of descendant resources that conflict with CBD=false resources
- a terraform option to never silently promote a resource to CBD=true, producing a plan invalidation instead

some heuristics from my time troubleshooting

I've gone as far as hard-coding all of the variables and outputs of the module that creates the aws_instance resource, and still the TF_LOG=TRACE output claims this specific aws_instance has a "CBD descendent" that is a completely unrelated resource. I've made huge efforts to traverse the Terraform graph to identify any path between the two. The visual graph of first-order relationships of the two conflated nodes has zero interconnectivity other than that they share a provider. If there are literally zero outside inputs and zero inside outputs, how can the expansion of this module possibly have descendants?

My only next move is to completely isolate the resource into a new tf root.

It feels like, if the parent module has any other modules with descendants that set lifecycle create_before_destroy=true, this resource gets contaminated, and worse, lifecycle { create_before_destroy = false } is effectively a no-op.

github-actions[bot] commented 2 months ago

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

hashicorp / terraform-provider-aws