Un-destroyable partially applied infrastructure

radeksimko commented 5 years ago

Applying the config below results in the following (expected) errors:

$ terraform apply

...

aws_instance.web: Creating...
aws_instance.web: Still creating... [10s elapsed]
aws_instance.web: Still creating... [20s elapsed]
aws_instance.web: Still creating... [30s elapsed]
aws_instance.web: Still creating... [40s elapsed]
aws_instance.web: Creation complete after 44s [id=i-0892a89162ec437a6]

Error: Invalid index

  on main.tf line 35, in resource "aws_instance" "web2":
  35:     Name = "volume has ${aws_instance.web.root_block_device.1.volume_size} G"
    |----------------
    | aws_instance.web.root_block_device is list of object with 1 element

The given key does not identify an element in this collection value.

Error: Invalid index

  on main.tf line 40, in output "volume_size":
  40:   value = "${aws_instance.web.root_block_device.1.volume_size}"
    |----------------
    | aws_instance.web.root_block_device is list of object with 1 element

The given key does not identify an element in this collection value.

which brings me to a state where I cannot destroy this "partially" applied infrastructure.

$ terraform destroy -force

data.aws_ami.ubuntu: Refreshing state...
aws_instance.web: Refreshing state... [id=i-0892a89162ec437a6]

Error: Invalid index

  on main.tf line 35, in resource "aws_instance" "web2":
  35:     Name = "volume has ${aws_instance.web.root_block_device.1.volume_size} G"
    |----------------
    | aws_instance.web.root_block_device is list of object with 1 element

The given key does not identify an element in this collection value.

Error: Invalid index

  on main.tf line 40, in output "volume_size":
  40:   value = "${aws_instance.web.root_block_device.1.volume_size}"
    |----------------
    | aws_instance.web.root_block_device is list of object with 1 element

The given key does not identify an element in this collection value.

I'm forced to edit the config and remove pieces of it that were never actually applied but are interpolated anyway, which prevents Terraform from destroying the other parts.

I assume the problem is more complex than I can imagine, but can't we just interpolate blocks of config for which we have existing state during refresh and destroy - or is there a reason we need to interpolate whole config?

Terraform Version

e8ee3f14a4ffcc5df888b2ec2b4e3bec1162dce7

It's worth noting this is not a regression and this bug exists in 0.11 and most likely all previous versions too.

Terraform Configuration Files

provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"

  tags = {
    Name = "HelloWorld"
  }
}

output "volume_size" {
  value = "${aws_instance.web.root_block_device.1.volume_size}"
}

Steps to Reproduce

terraform init
terraform apply
terraform destroy

This might theoretically also be solved by https://github.com/hashicorp/terraform/issues/18994 but only partially as we'd still need to parse config for refresh anyway.

apparentlymart commented 5 years ago

I suspect the problem here is mainly just that we are running full configuration validation during the terraform destroy step.

We could potentially have a more minimal validation mode that ignores resource and data blocks, since those are not considered during terraform destroy anyway -- refresh and destroy are mostly state-only operations.

We'd still need to validate any provisioner blocks that have when = destroy set, because those are used from configuration during destroy. Given that, I expect we'd still need to put the resource nodes in the graph when we do that walk to get the opportunity to validate the provisioners but have an exception to skip over validating the resource block body, count, etc.

We don't retain local or output values (except the root module, for terraform_remote_state use only) in the state between runs, so we would need to re-evaluate those if either provider blocks or destroy-time provisioners refer to them. That might mean that the second output "volume_size" would still fail at destroy time, unless we can prove during graph construction that the output is not used in any provider or provisioner blocks and then drop it from the validate graph.
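As a rough illustration of the cases described above, the following sketch shows configuration that would still have to be evaluated during destroy: a provider block that reads a local value, and a destroy-time provisioner. The local name `region` and the `echo` command are illustrative assumptions, not part of the original report; `data.aws_ami.ubuntu` refers to the data block from the configuration in the report.

```hcl
# Hypothetical local value. Locals are not stored in state, so this must be
# re-evaluated from configuration on every run, including destroy.
locals {
  region = "us-west-2"
}

# The provider block refers to the local, so the local cannot simply be
# dropped from the graph at destroy time.
provider "aws" {
  region = local.region
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  # Destroy-time provisioner: read from configuration during destroy, so it
  # still needs validating even if the rest of the resource body is skipped.
  provisioner "local-exec" {
    when    = destroy
    command = "echo destroying ${self.id}"
  }
}
```

An output such as volume_size, which neither the provider block nor the provisioner refers to, is the kind of value that could in principle be dropped from the validate graph.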

vkoradb123 commented 5 years ago

Hi, I am also facing the same problem. How can I solve it? Any clues here?

$ terraform plan -destroy
var.aws_access_key_id
  Enter a value:

var.aws_region
  Enter a value:

var.aws_secret_access_key
  Enter a value:

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be persisted to local or remote state storage.

data.aws_availability_zones.available: Refreshing state...
data.aws_region.current: Refreshing state...
aws_vpc.pkb: Refreshing state... [id=vpc-0976f6cc531a4fa77]
aws_internet_gateway.pkb: Refreshing state... [id=igw-0e9de9120f18015bb]
aws_subnet.pkb[0]: Refreshing state... [id=subnet-0324b98509d9bdab5]
aws_subnet.pkb[1]: Refreshing state... [id=subnet-0907688a4cbfe7a1e]

Error: Invalid index

  on vpc.tf line 51, in resource "aws_route_table_association" "pkb":
  51:   subnet_id = "${aws_subnet.pkb.*.id[count.index]}"
    |----------------
    | aws_subnet.pkb is empty tuple
    | count.index is 0

The given key does not identify an element in this collection value.

Error: Invalid index

  on vpc.tf line 51, in resource "aws_route_table_association" "pkb":
  51:   subnet_id = "${aws_subnet.pkb.*.id[count.index]}"
    |----------------
    | aws_subnet.pkb is empty tuple
    | count.index is 1

The given key does not identify an element in this collection value.

jbardin commented 4 years ago

A recent duplicate led me to a clearer reproduction.

The failure here is during the refresh walk, which is partially driven by the configuration. Because data sources need to be fully re-evaluated during refresh, the entire config is loaded. Problems arise when the managed resource instances in the config do not match what is in the state. This leads to the situation where expressions from numerous places could require evaluating non-existent resource instances.

The workaround for refresh-time failures is to run terraform destroy -refresh=false, which skips the refresh entirely. If a refresh is required (i.e. this isn't a destroy and the resources need to be refreshed for a valid plan), then temporarily changing the config to match the state should work.
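For the configuration in the original report, "changing the config to match the state" could look roughly like the sketch below. It assumes the intent was to read the single root block device that the error output says exists; the commented-out alternative assumes a Terraform version that provides the try() function (0.12.20 or later).

```hcl
output "volume_size" {
  # The error output reports root_block_device as a list with 1 element,
  # so index 0 is the only valid index.
  value = aws_instance.web.root_block_device[0].volume_size

  # Alternative (assumption: try() is available in the Terraform version in use):
  # fall back to null instead of failing when the index does not exist.
  # value = try(aws_instance.web.root_block_device[1].volume_size, null)
}
```

Any other expression that uses the same index, such as the Name tag on aws_instance.web2 shown in the error output, would need the same change before the destroy can proceed.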

jnixon-blue commented 3 years ago

I can confirm this is still happening as of Terraform 1.0.3. Is there any chance we can see some action here?

jbardin commented 3 years ago

Hi @jnixon-blue,

The issue here is a case where an invalid root module output configuration cannot be applied, and since destroy is an implied apply operation, it requires fixing the invalid configuration to move forward.

Unfortunately, due to the obscure nature of the issue, and availability of a direct workaround, we have not gotten to the work necessary to fix the destroy evaluation process for root outputs in this case. If you happen to have a similar error that is not caused by an invalid configuration, I suggest you file a new issue with the information requested in the issue template.

Thanks!
