hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Resources disappearing from state after successful apply #27671

Status: Open. uristernik opened this issue 1 year ago.

uristernik commented 1 year ago

Terraform Core Version

1.2.8

AWS Provider Version

4.20.1

Affected Resource(s)

aws_rds_cluster_instance, aws_backup_vault

Expected Behavior

Terraform should report no changes to the infrastructure.

Actual Behavior

Resources disappeared from the state for no apparent reason between two sequential runs.

We had a terraform apply run that finished successfully. A few hours later, another terraform apply started and tried to create resources that already existed (the resources that had disappeared from the state). We store the state remotely in S3. The version history of the state file shows that no new version was written between those two runs. So the resources must have been dropped from the state by the first run itself, even though that run printed no message or error and the apply ended successfully.

This has happened multiple times. Once, we saw the informational message saying the resources had been changed (in our case, deleted) outside of Terraform. In the other occurrences, no such message appeared.
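The report does not include this check. One way to confirm which resources a run dropped is to download two versions of the state file, for example the S3 object versions written before and after the suspect apply, and compare the resource addresses they record. The sketch below assumes the JSON layout Terraform currently writes (format version 4). That layout has a top-level "resources" list whose entries carry "mode", "type", "name", an optional "module", and an "instances" list with an optional "index_key". The function names are hypothetical.

```python
import json


def resource_addresses(state: dict) -> set[str]:
    """Return the resource instance addresses recorded in a Terraform state.

    Assumes state format version 4: each entry in state["resources"] has
    "mode", "type", "name", an optional "module" prefix, and an "instances"
    list whose items carry "index_key" when count or for_each is used.
    """
    addresses = set()
    for res in state.get("resources", []):
        base = f'{res["type"]}.{res["name"]}'
        if res.get("mode") == "data":
            base = "data." + base
        if "module" in res:
            base = f'{res["module"]}.{base}'
        for inst in res.get("instances", []):
            key = inst.get("index_key")
            if key is None:
                addresses.add(base)  # single-instance resource
            elif isinstance(key, int):
                addresses.add(f"{base}[{key}]")  # count index
            else:
                addresses.add(f'{base}["{key}"]')  # for_each key
    return addresses


def missing_resources(before_path: str, after_path: str) -> list[str]:
    """List addresses present in the older state file but absent from the newer one."""
    with open(before_path) as f:
        before = json.load(f)
    with open(after_path) as f:
        after = json.load(f)
    return sorted(resource_addresses(before) - resource_addresses(after))
```

For the live state, terraform state list prints the same kind of address list. Parsing the JSON directly is more convenient for older S3 object versions that are no longer the current state.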

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

```hcl
resource "aws_rds_cluster_instance" "cluster_instances" {
  count                   = var.instance_count
  cluster_identifier      = aws_rds_cluster.main_rds_cluster.id
  engine                  = var.instance_engine
  identifier              = var.name_override != "" ? "${var.name_override}-${count.index}" : "${var.workspace.env_type}-${var.workspace.env_data_center_name}-${count.index}"
  instance_class          = var.instance_class
  availability_zone       = var.availability_zones[count.index]
  apply_immediately       = var.instance_apply_immediately
  db_parameter_group_name = local.db_instance_parameter_group_name

  performance_insights_enabled = var.performance_insights_enabled

  tags = merge(local.common_tags, {
    Name = var.name_override != "" ? "${var.name_override}-${count.index}" : "${var.workspace.env_type}-${var.workspace.env_data_center_name}-${count.index}"
  })

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_backup_vault" "rds-cluster" {
  name = var.name_override != "" ? "rds-${var.name_override}" : "rds-${var.workspace.env_type}-${var.workspace.env_data_center_name}"

  lifecycle {
    prevent_destroy = true
  }
}
```

Steps to Reproduce

This is rare and sporadic. We haven't found a way to reproduce it, but it has already happened a few times.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No response

justinretzolk commented 1 year ago

Hey @uristernik 👋 Thank you for taking the time to raise this. I know this might be difficult given that you reported that this is happening sporadically, but in this case, logs are going to be our best bet in order to be able to assist. Would it be possible to try to capture debug logs during an occurrence of this?

uristernik commented 1 year ago

Hey, thanks for the quick reply. I'm working on it. We need a strategy for enabling debug logs without generating huge amounts of log output (it happens rarely, and we have lots of runs every day).
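One possible strategy is to give each run its own log file and keep only a rolling window of recent files. Terraform's documented TF_LOG and TF_LOG_PATH environment variables make this possible. The missing resources only surface on the following run, so the log worth keeping is the previous run's, and a small window should be enough. This is a sketch under assumptions: the helper names, the run_id parameter, and the terraform-<run_id>.log naming are hypothetical. Terraform also documents TF_LOG_CORE and TF_LOG_PROVIDER if only core or only provider logging is wanted.

```python
import os
from pathlib import Path


def debug_log_env(log_dir: str, run_id: str, level: str = "DEBUG") -> dict[str, str]:
    """Build an environment for one Terraform run that logs to its own file.

    TF_LOG sets the log level and TF_LOG_PATH redirects the log to a file.
    run_id is a hypothetical per-run identifier supplied by the CI system.
    """
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["TF_LOG"] = level
    env["TF_LOG_PATH"] = str(Path(log_dir) / f"terraform-{run_id}.log")
    return env


def prune_logs(log_dir: str, keep: int) -> list[Path]:
    """Delete all but the `keep` most recently modified logs; return the deleted paths."""
    logs = sorted(
        Path(log_dir).glob("terraform-*.log"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    stale = logs[keep:]
    for path in stale:
        path.unlink()
    return stale
```

In a CI job, one would pass the returned dict as the env argument of subprocess.run(["terraform", "apply", ...], env=...) and then call prune_logs(log_dir, keep=20). When a run later tries to recreate existing resources, the preceding run's log is still on disk.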

justinretzolk commented 1 year ago

@uristernik No worries at all, I totally understand that it might be tricky/take a bit of time.

uristernik commented 1 year ago

Update: we have enabled logging and will report back if anything comes up.

sebastianrothbucher commented 3 months ago

A versioned bucket really did save me. This is NOT cool.

dit-darius commented 2 months ago

For me, Terraform lost 2 resources a few applies after I created a workspace and pushed default.tfstate to that workspace's S3 state.

Good thing I still had default.tfstate locally, so I could reinsert the missing ones manually and re-push.

EDIT: I still have the console output I was working in. One terraform apply worked as expected and created all the new/modified resources. The next terraform apply said it was about to create resources it had actually created days earlier (and they are still very much present and unchanged). Running terraform state pull > dev.tfstate and then diff default.tfstate dev.tfstate revealed that those 2 resources are missing from the dev workspace's state. IDK why or how, but they got lost.

I was using an S3 bucket for my dev.tfstate.
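The manual "reinsert and re-push" recovery described above can be sketched as a merge of two state files. The sketch assumes both are format-version-4 JSON with the same lineage, which should hold when the workspace state was created by pushing default.tfstate. The merge copies back any resource blocks, or count/for_each instances, that are missing from the pulled state. It then bumps serial so that terraform state push treats the result as newer than the remote copy. The function names are hypothetical, and restored instances carry the attributes from the backup, which may be stale until the next refresh.

```python
import copy
import json


def _key(res: dict) -> tuple:
    # A resource block is identified by its module path, mode, type and name.
    return (res.get("module", ""), res["mode"], res["type"], res["name"])


def reinsert_missing(backup: dict, current: dict) -> tuple[dict, list]:
    """Return (merged_state, restored) where restored lists (block_key, index_key).

    Assumes format-version-4 states; refuses to merge across lineages.
    Neither input is modified.
    """
    if backup.get("lineage") != current.get("lineage"):
        raise ValueError("states have different lineages; refusing to merge")
    merged = copy.deepcopy(current)
    blocks = {_key(r): r for r in merged.setdefault("resources", [])}
    restored = []
    for res in backup.get("resources", []):
        target = blocks.get(_key(res))
        if target is None:
            # Whole resource block is missing: copy it over with all instances.
            merged["resources"].append(copy.deepcopy(res))
            restored.extend((_key(res), i.get("index_key")) for i in res.get("instances", []))
            continue
        # Block exists: copy back only instances whose index_key is absent.
        have = {i.get("index_key") for i in target.get("instances", [])}
        for inst in res.get("instances", []):
            if inst.get("index_key") not in have:
                target.setdefault("instances", []).append(copy.deepcopy(inst))
                restored.append((_key(res), inst.get("index_key")))
    # A higher serial lets `terraform state push` accept the merged state.
    merged["serial"] = max(current.get("serial", 0), backup.get("serial", 0)) + 1
    return merged, restored


def restore_file(backup_path: str, current_path: str, out_path: str) -> list:
    """File wrapper: read both states, write the merged state, return what was restored."""
    with open(backup_path) as f:
        backup = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    merged, restored = reinsert_missing(backup, current)
    with open(out_path, "w") as f:
        json.dump(merged, f, indent=2)
    return restored
```

A cautious sequence is as follows. Keep copies of both inputs, run the merge against the output of terraform state pull, and inspect the merged file. Then terraform state push it and confirm that terraform plan no longer proposes creating the restored resources.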