hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Support use of computed values in import blocks IDs #34152

Open Rwwn opened 11 months ago

Rwwn commented 11 months ago

Terraform Version

Terraform v1.6.2
on darwin_arm64

Use Cases

I have an EC2 instance which I would like to optionally restore from a backup AMI via Terraform. This AMI contains snapshots of the instance's EBS volumes (the volumes are also managed in my Terraform configuration as their own resources), so when an instance is built from it, the EBS volumes are built as well. The downside is that Terraform has no way of knowing about the volumes created by the AMI restore, so they won't be tracked in the state. The only way I can see to do it is this:

  1. Run a Terraform apply with a variable such as restore_from_backup set to true, which tells Terraform to use the backup AMI to rebuild the server, and not to build the EBS volumes
  2. Manually import the EBS volumes to the state
  3. Apply Terraform again with restore_from_backup set to false, to make sure the tags etc. on the volumes are correct.

Having to manually import the EBS volumes and run two applies is not ideal. I'd like to be able to do all of this with a single apply.
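For reference, the restore toggle from step 1 looks roughly like this; the AMI name filters and instance type below are illustrative placeholders, not the exact configuration:

variable "restore_from_backup" {
  description = "When true, rebuild the server from the backup AMI instead of the base AMI"
  type        = bool
  default     = false
}

# Illustrative AMI lookups; the real configuration uses data.aws_ami.gocd_ami.
data "aws_ami" "backup" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["gocd-server-backup-*"]
  }
}

data "aws_ami" "base" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["gocd-server-base-*"]
  }
}

resource "aws_instance" "gocd_server" {
  # The backup AMI carries the EBS snapshots, so restoring from it recreates the volumes.
  ami           = var.restore_from_backup ? data.aws_ami.backup.id : data.aws_ami.base.id
  instance_type = "m5.large"
}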

Attempted Solutions

The first thing I tried was using inline ebs_block_device blocks on the aws_instance resource, rather than separate aws_ebs_volume resources (roughly as sketched below). Terraform recognised the restored EBS volumes in this case, but it always wanted to destroy and recreate them, along with the server itself, even when the parameters in the ebs_block_device blocks matched the ones in the state. So that didn't work.
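Roughly what that first attempt looked like (simplified; the device names and size variables are the same ones used further down, the instance type is a placeholder):

resource "aws_instance" "gocd_server" {
  ami           = data.aws_ami.gocd_ami.id
  instance_type = "m5.large"

  # Declaring the volumes inline so Terraform tracks the ones the AMI restore creates.
  ebs_block_device {
    device_name = "/dev/sdh"
    volume_size = var.gocd_server_volume_size
    encrypted   = true
  }

  ebs_block_device {
    device_name = "/dev/sdg"
    volume_size = var.gocd_artifacts_volume_size
    encrypted   = true
  }
}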

My second attempt involved using Terraform's new import blocks. The relevant code looks like this:

locals {
  ebs_volumes = {
    "gocd-server" = {
      device_name = "/dev/sdh"
      size        = var.gocd_server_volume_size
    }
    "gocd-artifacts" = {
      device_name = "/dev/sdg"
      size        = var.gocd_artifacts_volume_size
    }
  }
}

data "aws_ebs_volume" "restored_volumes" {
  for_each = var.restore_from_backup_enabled ? local.ebs_volumes : {}
  depends_on = [aws_instance.gocd_server]
  most_recent = true

  filter {
    name   = "snapshot-id"
    values = [
        for volume in data.aws_ami.gocd_ami.block_device_mappings : 
        volume.ebs.snapshot_id if volume.device_name == each.value.device_name
    ]
  }
}

# import blocks don't support for_each yet so having to hardcode the two volume imports
# for_each will be added in v1.7: https://github.com/hashicorp/terraform/pull/33932
import {
  to = aws_ebs_volume.persistent_volumes["gocd-server"]
  id = data.aws_ebs_volume.restored_volumes["gocd-server"].id
}
import {
  to = aws_ebs_volume.persistent_volumes["gocd-artifacts"]
  id = data.aws_ebs_volume.restored_volumes["gocd-artifacts"].id
}

resource "aws_ebs_volume" "persistent_volumes" {
  for_each = local.ebs_volumes

  availability_zone = data.aws_subnet.ec2_private_subnet.availability_zone
  encrypted         = true
  kms_key_id        = module.kms.kmskey_arn
  size              = each.value.size
  tags = merge(local.tags, {
    Name = each.key
  })
}

So I'm using the snapshot IDs from my AMI data source as inputs to an EBS volume data source, to find the volumes built from those snapshots. I've used a depends_on on the EBS volume data source to delay its read until after the EC2 instance is created, since that is when the EBS volumes associated with the AMI are restored from snapshots. I've then added import blocks that use the EBS volume data source to retrieve the volume IDs to import.

The code I've provided won't work currently for the case where var.restore_from_backup_enabled == false, since the EBS volume data sources won't be created, but I can get around that when for_each for the imports is released. I think the logic should work for the case where var.restore_from_backup_enabled == true though, but when I try it, Terraform complains that the IDs for the imports are not known at plan time:

│ Error: Invalid import id argument
│ 
│   on ebs.tf line 34, in import:
│   34:   id = data.aws_ebs_volume.restored_volumes["gocd-server"].id
│ 
│ The import block "id" argument depends on resource attributes that cannot
│ be determined until apply, so Terraform cannot plan to import this
│ resource.
╵
╷
│ Error: Invalid import id argument
│ 
│   on ebs.tf line 38, in import:
│   38:   id = data.aws_ebs_volume.restored_volumes["gocd-artifacts"].id
│ 
│ The import block "id" argument depends on resource attributes that cannot
│ be determined until apply, so Terraform cannot plan to import this
│ resource.

Proposal

It should be possible for import blocks to use values that are only computed at apply time, the same way resource arguments can. The plan output would show this:

id = (known after apply)

And then the import would be performed at apply time, after the value is computed.

Without this feature, import blocks are of limited usefulness, since you have to know the IDs of what you're importing ahead of running Terraform. This is an unfortunate limitation when Terraform is quite capable of finding these IDs for you from data sources.
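For contrast, data-source-driven import IDs do already work when the data source can be read at plan time, i.e. when the object exists before the run starts. A minimal sketch, assuming a volume tagged Name = gocd-server already exists:

data "aws_ebs_volume" "existing" {
  most_recent = true

  filter {
    name   = "tag:Name"
    values = ["gocd-server"]
  }
}

import {
  to = aws_ebs_volume.persistent_volumes["gocd-server"]
  id = data.aws_ebs_volume.existing.id
}

# This fails only when the data source itself depends on a change made in the
# same run, which is exactly the restore-from-AMI case described above.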

References

No response

jbardin commented 11 months ago

Hi @Rwwn,

Thanks for filing the issue. You cannot import a resource if the import data is unknown, because Terraform needs to read the imported resource first in order to complete the plan. If what you are looking for is a method for handling multiple phases of apply to deal with unknown values, we are tracking the general problem space under #30937.

Your attempt with the data source to look up the import data is the right approach, but from what I can see here it's only blocked by the depends_on = [aws_instance.gocd_server], which is what makes the id unknown (whenever there is any change planned to aws_instance.gocd_server). If the restored volumes already exist to be imported, there should be no reason for the explicit dependency to prevent the data source from being read. Does your configuration actually require the depends_on for some other reason?
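Concretely, the suggestion is that without the depends_on the data source can be read during the plan, so the import id is known (a sketch of the same data source, assuming the volumes already exist before the run):

data "aws_ebs_volume" "restored_volumes" {
  for_each    = var.restore_from_backup_enabled ? local.ebs_volumes : {}
  most_recent = true

  # No depends_on: nothing forces this read to wait for apply, so the id
  # it produces is available to the import block at plan time.
  filter {
    name   = "snapshot-id"
    values = [
      for volume in data.aws_ami.gocd_ami.block_device_mappings :
      volume.ebs.snapshot_id if volume.device_name == each.value.device_name
    ]
  }
}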

Rwwn commented 11 months ago

Hi @jbardin, thanks for the response. What I'm trying to achieve relies on the depends_on being there, because the volumes are only created during the Terraform run - specifically, the creation of the EC2 from the AMI is what creates the EBS volumes I want to import. The AMI has several block devices associated with it, which result in volumes being created from snapshots when an instance is created using the AMI - but these won't be present in the state.

So in short, all I have at the start of the plan is the AMI and the snapshots; the volumes get created during apply, and I need to delay the data source and import blocks from running until the volumes are created, hence the depends_on. Sorry if that wasn't clear, I understand it's a bit convoluted. I might be barking up the wrong tree with this solution, so I'm happy to hear any other workarounds as well.

On reflection, I suppose the problem with my solution is similar to the one with using computed values for for_each and count given in the issue you linked at https://github.com/hashicorp/terraform/issues/30937 - if the volume ID for the import isn't known until apply, what is Terraform supposed to do when planning the aws_ebs_volume resources? It makes the plan conditional on the outcome of the apply, so I see the problem. If this was allowed, you'd have situations where the plan could not predict what Terraform would do to the EBS volumes on the apply, which is probably not desired. I may have to resort to importing these volumes manually after the apply after all, which is a bit of a shame.

The comment here https://github.com/hashicorp/terraform/issues/30937#issuecomment-1154173481 describing lazy applies sounds like it would solve my use case. Whether allowing this is a good idea or not is another question.

jbardin commented 11 months ago

Thanks @Rwwn, that's helpful information. Attempting to import something that is implicitly created during the same run is not something the current import process can handle. This is also close to other issues where providers want to "adopt" an existing resource during creation, or to either import or create depending on whether something already exists. That would give the provider the tools to communicate to the user what the actions on the resource will be during apply. (A provider could technically make this work now within the existing resource lifecycle, but it would not be clear in the plan what is happening, and we tend to discourage breaking the strict lifecycle of resources since it can lead to other side effects.)

Rwwn commented 11 months ago

Just in case anyone else comes across this, I've got a workaround using Terragrunt after-hooks to run terraform output and terraform import commands. It works well, but of course it requires an external tool in the shape of Terragrunt. I couldn't find any way to do it natively in Terraform, short of writing a script that makes multiple Terraform invocations.
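Roughly what that hook looks like in terragrunt.hcl (a sketch; the output name restored_server_volume_id and the resource address are placeholders for whatever your configuration exposes):

# terragrunt.hcl
terraform {
  after_hook "import_restored_volumes" {
    commands = ["apply"]
    execute = [
      "bash", "-c",
      "terraform import 'aws_ebs_volume.persistent_volumes[\"gocd-server\"]' \"$(terraform output -raw restored_server_volume_id)\""
    ]
    run_on_error = false
  }
}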

kclinden commented 8 months ago

@jbardin I ran into a similar issue with the route table's default local route that gets created. Do you know of any way in Terraform 1.7 to manage these routes within the initial plan/apply?

The code below throws the following error when run as part of a clean apply.

│ Error: Invalid import id argument
│ 
│   on vpc.tf line 224, in import:
│  224:   id       = "${aws_route_table.public[each.key].id}_${local.vpc_cidr}"
│ 
│ The import block "id" argument depends on resource attributes that cannot be determined until apply, so Terraform cannot plan to import this resource.

Snippet of code:

################################################################################
# Public Subnets
################################################################################

resource "aws_subnet" "public" {
  #checkov:skip=CKV_AWS_130:Public subnet needs public ips
  for_each = toset(local.azs)

  availability_zone       = each.key
  cidr_block              = local.public_subnet_cidr[index(local.azs, each.key)]
  map_public_ip_on_launch = true
  vpc_id                  = aws_vpc.this.id

  tags = merge(
    { "Name" = "${local.name}-public-${each.key}" },
    local.tags,
    local.public_subnet_tags
  )
}

# Public Route Table
resource "aws_route_table" "public" {
  for_each = toset(local.azs)

  vpc_id = aws_vpc.this.id

  tags = merge(
    { "Name" = "${local.name}-public-rt-${each.key}" },
    local.tags,
  )
}

import {
  for_each = toset(local.azs)
  to       = aws_route.public_to_firewall[each.key]
  id       = "${aws_route_table.public[each.key].id}_${local.vpc_cidr}"
}

resource "aws_route" "public_to_firewall" {
  for_each = toset(local.azs)

  route_table_id         = aws_route_table.public[each.key].id
  destination_cidr_block = local.vpc_cidr
  vpc_endpoint_id        = tomap(local.az_to_endpoint)[each.key]

  timeouts {
    create = "5m"
  }
}

jbardin commented 8 months ago

@kclinden,

No, there's no way to do this with a single operation right now. Import must happen during the planning phase, but the resources you want to import don't exist until after apply is complete.

kclinden commented 8 months ago

> @kclinden,
>
> No, there's no way to do this with a single operation right now. Import must happen during the planning phase, but the resources you want to import don't exist until after apply is complete.

Yeah, in that case I think my only option is to manage the routes inline as part of the aws_route_table resource, which won't throw the same error.
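A sketch of that inline-route version, reusing the names from the snippet above (the route set is then managed on the aws_route_table itself rather than via aws_route plus an import):

resource "aws_route_table" "public" {
  for_each = toset(local.azs)

  vpc_id = aws_vpc.this.id

  # Route managed inline instead of as a separate aws_route resource.
  route {
    cidr_block      = local.vpc_cidr
    vpc_endpoint_id = tomap(local.az_to_endpoint)[each.key]
  }

  tags = merge(
    { "Name" = "${local.name}-public-rt-${each.key}" },
    local.tags,
  )
}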