uristernik commented 1 year ago

Terraform Core Version

1.2.8

AWS Provider Version

4.20.1

Affected Resource(s)

aws_rds_cluster_instance
aws_backup_vault

Expected Behavior

Terraform should report no changes to the infrastructure

Actual Behavior

Resources disappeared from the state for no apparent reason between two sequential runs.

A terraform apply run finished successfully. A few hours later, another terraform apply started and tried to create resources that already existed (the resources that had disappeared from the state). We use S3 to store the state remotely. Looking at the versions of the state file, we see that it did not change between those two runs. It looks like the first run deleted these resources from the state, although we didn't see any message or error and the apply ended successfully.

This has happened multiple times. On one occasion we saw the info message saying that the resources were changed (in our case, deleted) outside of Terraform; on the other occasions we didn't get that message.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_rds_cluster_instance" "cluster_instances" {
  count                   = var.instance_count
  cluster_identifier      = aws_rds_cluster.main_rds_cluster.id
  engine                  = var.instance_engine
  identifier              = var.name_override != "" ? "${var.name_override}-${count.index}" : "${var.workspace.env_type}-${var.workspace.env_data_center_name}-${count.index}"
  instance_class          = var.instance_class
  availability_zone       = var.availability_zones[count.index]
  apply_immediately       = var.instance_apply_immediately
  db_parameter_group_name = local.db_instance_parameter_group_name

  performance_insights_enabled = var.performance_insights_enabled

  tags = merge(local.common_tags, {
    Name = var.name_override != "" ? "${var.name_override}-${count.index}" : "${var.workspace.env_type}-${var.workspace.env_data_center_name}-${count.index}"
  })

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_backup_vault" "rds-cluster" {
  name = var.name_override != "" ? "rds-${var.name_override}" : "rds-${var.workspace.env_type}-${var.workspace.env_data_center_name}"

  lifecycle {
    prevent_destroy = true
  }
}

Steps to Reproduce

This is quite rare and sporadic. We haven't found a way to reproduce it, but it has already happened a few times.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

No response

github-actions[bot] commented 1 year ago

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

justinretzolk commented 1 year ago

Hey @uristernik 👋 Thank you for taking the time to raise this. I know this might be difficult given that you reported that this is happening sporadically, but in this case, logs are going to be our best bet in order to be able to assist. Would it be possible to try to capture debug logs during an occurrence of this?

uristernik commented 1 year ago

Hey, thanks for the quick reply. I am working on it. We need to figure out a strategy for enabling debug logs without generating huge amounts of log output. (It happens rarely and we have lots of runs in a day.)
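One way to keep the log volume manageable might look like the sketch below: each run writes its debug log to its own file, and files older than a retention window are pruned. TF_LOG and TF_LOG_PATH are Terraform's documented logging environment variables; the helper names, the file naming scheme, the retention window, and the use of -auto-approve are assumptions for illustration, not something we have settled on. Since the problem only shows up on the following run, the retention window would need to be longer than the longest gap between runs.

```python
import os
import subprocess
import time


def debug_env(log_dir, run_id, base_env=None):
    """Build an environment that sends Terraform debug logs to a per-run file.

    log_dir and run_id are hypothetical inputs (e.g. a CI job id).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_LOG"] = "DEBUG"
    env["TF_LOG_PATH"] = os.path.join(log_dir, f"terraform-{run_id}.log")
    return env


def prune_old_logs(log_dir, keep_seconds, now=None):
    """Delete per-run logs older than keep_seconds; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(log_dir)):
        if not (name.startswith("terraform-") and name.endswith(".log")):
            continue  # leave unrelated files alone
        path = os.path.join(log_dir, name)
        if now - os.path.getmtime(path) > keep_seconds:
            os.remove(path)
            removed.append(name)
    return removed


def run_apply(log_dir, run_id):
    """Run terraform apply with debug logging enabled (assumes terraform is on PATH)."""
    os.makedirs(log_dir, exist_ok=True)
    env = debug_env(log_dir, run_id)
    return subprocess.run(["terraform", "apply", "-auto-approve"], env=env).returncode
```

When a run unexpectedly plans to create existing resources, the log of the previous run should still be on disk as long as it falls inside the retention window.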

justinretzolk commented 1 year ago

@uristernik No worries at all, I totally understand that it might be tricky/take a bit of time.

uristernik commented 1 year ago

Update: we have enabled logging. I will report back if anything comes up.

sebastianrothbucher commented 3 months ago

a versioned bucket really did save me - this is NOT cool

dit-darius commented 2 months ago

For me, Terraform lost 2 resources a few applies after I created a workspace and pushed default.tfstate to that workspace's S3 location.

Good thing I still had default.tfstate locally and could reinsert the missing resources manually and re-push.

EDIT: I still have the console output from the session I was working in. One terraform apply worked as expected and created all the new/modified resources, and the next terraform apply said it was about to create resources that it had actually created days earlier (and that are still very much present and unchanged). Running terraform state pull > dev.tfstate and diff default.tfstate dev.tfstate revealed that those 2 resources are missing in the dev workspace. I don't know why or how, but they got lost.

I was using an S3 bucket for my dev.tfstate.
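A plain diff of two state files is noisy, so a small script that compares only resource addresses can make the loss easier to spot. This is a minimal sketch that assumes the version 4 JSON state layout (a top-level "resources" list whose entries carry "mode", "type", "name", an optional "module", and "instances" with an optional "index_key"). That layout is an internal format rather than a stable API, and the function names here are made up for illustration.

```python
import json


def load_state(path):
    """Read a state file, e.g. one saved with `terraform state pull > dev.tfstate`."""
    with open(path) as f:
        return json.load(f)


def resource_addresses(state):
    """Return the set of resource instance addresses found in a parsed state."""
    addresses = set()
    for res in state.get("resources", []):
        base = f"{res['type']}.{res['name']}"
        if res.get("mode") == "data":
            base = "data." + base
        if res.get("module"):
            base = f"{res['module']}.{base}"
        for inst in res.get("instances", []):
            key = inst.get("index_key")
            if key is None:
                addresses.add(base)  # no count / for_each
            elif isinstance(key, str):
                addresses.add(f'{base}["{key}"]')  # for_each key
            else:
                addresses.add(f"{base}[{key}]")  # count index
    return addresses


def missing_resources(old_state, new_state):
    """List addresses present in old_state but absent from new_state."""
    return sorted(resource_addresses(old_state) - resource_addresses(new_state))
```

Calling missing_resources(load_state("default.tfstate"), load_state("dev.tfstate")) would list the addresses that exist only in the older file. The same comparison could be run against two versions of the state object downloaded from a versioned bucket.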

hashicorp / terraform-provider-aws

[Bug]: Resources disappearing from state after successful apply #27671
