Hi @ryanking 👋 Thank you for reporting this and sorry you ran into trouble here. The state upgrade code could definitely be more defensive in this situation by performing a `nil` check itself (or potentially an upstream fix in the Terraform Plugin SDK could prevent the passing of `nil` state to the upgrade function), but could you please provide us some additional details if possible?

- Do you know the history of this particular resource in the Terraform configuration? e.g. when it was created (Terraform AWS Provider version ideally)?
- Was any configuration update part of updating the provider?
- Is the remote resource existing now or was the expectation this plan would recreate it?
- Were you able to work around this scenario? e.g. `terraform state rm` and `terraform import`

Thank you.
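The defensive `nil` check mentioned above could look roughly like the following. This is a hypothetical sketch in the shape of the Terraform Plugin SDK (v1) `StateUpgradeFunc` signature; the function and attribute names are illustrative, not the actual provider code.

```go
package main

import "fmt"

// resourceExampleStateUpgradeV0 is a hypothetical upgrade function in
// the shape of the Terraform Plugin SDK (v1) StateUpgradeFunc
// signature. The guard at the top is the defensive fix discussed
// here: return an empty state when given a nil state, instead of
// panicking on a nil map access. Attribute names are illustrative.
func resourceExampleStateUpgradeV0(rawState map[string]interface{}, meta interface{}) (map[string]interface{}, error) {
	if rawState == nil {
		return map[string]interface{}{}, nil
	}
	// Illustrative upgrade step: rename a deprecated attribute.
	if v, ok := rawState["old_attribute"]; ok {
		rawState["new_attribute"] = v
		delete(rawState, "old_attribute")
	}
	return rawState, nil
}

func main() {
	upgraded, err := resourceExampleStateUpgradeV0(nil, nil)
	fmt.Println(upgraded, err) // map[] <nil>: no panic on nil input
}
```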
> Do you know the history of this particular resource in the Terraform configuration? e.g. when it was created (Terraform AWS Provider version ideally)?

The resource is a few months old. It looks like it was created with version 2.17.0 of the AWS provider.

> Was any configuration update part of updating the provider?

No.

> Is the remote resource existing now or was the expectation this plan would recreate it?

The resource is existing (and has been for a while).

> Were you able to work around this scenario? e.g. `terraform state rm` and `terraform import`

I didn't look into any workarounds other than reverting the provider upgrade.
I just tried `state rm` + `import` on the resource that I thought was the problem and am getting the same error.

I think it is highly likely that I don't know which resource is causing this. Do you have any pointers on figuring out which one it could be?
It'll be an "older" `aws_db_instance` resource (created prior to Terraform AWS Provider version 2.49.0), but I'm not sure if the debug logs give more information prior to the execution of the state upgrade functions. You might be able to find out more easily by executing Terraform with `-parallelism=1`.
Thanks @ryanking, and @bflad for looking into this.
Follow-up: after reverting to `provider.aws` v2.44.0, if I run the plan, it complains:

```
Error: Resource instance managed by newer provider version

The current state of module.xxx.aws_db_instance.db-dev[0] was created
by a newer provider version than is currently selected. Upgrade the
registry.terraform.io/-/aws provider to work with this state.
```

`module.xxx.aws_db_instance.db-dev[0]` is an `aws_db_instance`.
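That downgrade error is Terraform core refusing to use state recorded with a higher schema version than the selected provider supports. A minimal illustrative sketch of that comparison (hypothetical names and wording, not Terraform's actual internals):

```go
package main

import "fmt"

// upgradeNeeded sketches the schema version comparison Terraform
// performs before running resource state upgraders. The function name
// and error wording are illustrative, not Terraform's internals.
func upgradeNeeded(stateVersion, providerSchemaVersion int) (bool, error) {
	switch {
	case stateVersion > providerSchemaVersion:
		// Corresponds to the "Resource instance managed by newer
		// provider version" error: state cannot be downgraded.
		return false, fmt.Errorf(
			"state schema version %d is newer than provider schema version %d",
			stateVersion, providerSchemaVersion)
	case stateVersion < providerSchemaVersion:
		return true, nil // run the resource's state upgrade functions
	default:
		return false, nil // versions match; nothing to do
	}
}

func main() {
	// Illustrative versions: state recorded at schema version 1,
	// reverted provider only supports schema version 0.
	_, err := upgradeNeeded(1, 0)
	fmt.Println(err)
}
```

Because the newer provider had already recorded upgraded schema versions in the state, reverting the provider binary cannot undo them, which may explain why every `aws_db_instance` here reported the error.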
I tried running with `-parallelism=1` to find the resource that was causing this issue. Unfortunately, it seems like it was all (~30) of the `aws_db_instance` resources in this component. I did a `state rm` + `import` for all of them and we no longer see any errors.
@bflad what would help debug this further? I have snapshots of the state before and after the re-import, if that helps.
@ryanking very strange. Are any of these true (recently, prior to the original panic)?

- `terraform state mv` for the whole module or individual resources
- Using `count`/`for_each` with the resource
- Terraform 0.11 potentially involved at all?

Just trying to rule out any odd behaviors that might be found in the state itself, its versioning, or the Terraform Plugin SDK handling (which is the logic responsible for running the resource state upgrades if necessary).
If you have a sanitized copy of the `aws_db_instance` Terraform configuration of one of these and its associated state prior to the panic and prior to any `state rm`/`import` operations, that could be immensely helpful.
Thank you for your information so far!
> `terraform state mv` for the whole module or individual resources

Not that I know of, and I am pretty sure I would know.

> Using `count`/`for_each` with the resource

These databases are all created by a module which uses `count` to optionally create the database, so `count` is always 0 or 1.

> Terraform 0.11 potentially involved at all?

Not any time recently. We use a bunch of 0.12-specific syntax, so I don't think it would be possible to run 0.11 on our code base anymore. However, many of these resources were created pre-0.12.

> If you have a sanitized copy of the `aws_db_instance` Terraform configuration of one of these and its associated state prior to the panic and prior to any `state rm`/`import` operations, that could be immensely helpful.
Here is a sanitized example.

State before: https://gist.github.com/1ddc1d1edfcdef01d95ae48fea973394
State after: https://gist.github.com/ryanking/d225e2fcfc4b39fc925da6148a006479
For configuration, this is created in a module, and the code for that resource looks like:
```hcl
resource "aws_db_instance" "db-dev" {
  count = var.skip_database ? 0 : 1

  identifier                 = "${var.db_fake_data ? "db-${var.username}" : "db-${var.username}-date-${var.db_date}-num-${var.db_override_num}"}"
  storage_type               = "gp2"
  engine                     = "postgres"
  engine_version             = "11.4"
  instance_class             = var.aws_db_instance_type
  port                       = 5432
  publicly_accessible        = false
  availability_zone          = var.aws_db_instance_availability_zone
  security_group_names       = []
  vpc_security_group_ids     = ["${var.db_security_group_id}"]
  db_subnet_group_name       = var.db_subnet_group_name
  parameter_group_name       = var.db_parameter_group_name
  auto_minor_version_upgrade = false
  multi_az                   = false
  backup_retention_period    = 0
  backup_window              = "10:10-10:40"
  maintenance_window         = "sat:16:00-sat:20:00"
  storage_encrypted          = true
  skip_final_snapshot        = true
  snapshot_identifier        = "${var.db_fake_data ? "arn:aws:rds:us-west-2:950587841421:snapshot:fake-data-allocated-1000gb" : "arn:aws:rds:us-west-2:950587841421:snapshot:traject-${var.db_date}"}"
  ca_cert_identifier         = "rds-ca-2019"

  lifecycle {
    create_before_destroy = true
  }

  apply_immediately = var.aws_db_instance_apply_immediately

  tags = {
    "Type"  = "Dev Database"
    "Owner" = var.username
  }
}
```
While we may not know the exact cause for it, the fix for the panic (returning an empty state when given an empty state) has been merged and will be released with version 2.69.0 of the Terraform AWS Provider on Thursday next week. 👍
This has been released in version 2.69.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
Terraform Version
Affected Resource(s)

- `aws_db_instance`
Terraform Configuration Files
I think there is too much config to replicate here and it seems that the problem is purely related to state management.
Debug Output
https://gist.github.com/ca3825d27e669d6e42c4d3e40f9acaff
Panic Output
The crash is in the AWS provider, not Terraform core. The debug output above includes the AWS provider panic.
Expected Behavior
Terraform should not crash.
Actual Behavior
Terraform crashes with a panic in the AWS provider during the plan.
Steps to Reproduce
1. Run `terraform plan`
Important Factoids
This was an attempt to upgrade the provider from 2.44.0 to 2.56.0.