terraform-aws-modules / terraform-aws-rds-aurora

Terraform module to create AWS RDS Aurora resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/rds-aurora/aws
Apache License 2.0

Upgrading Global Database Clusters Yields Inconsistent Plan #425

Closed Β· theherk closed this issue 2 months ago

theherk commented 8 months ago

Description

There is an issue with global database clusters that is documented in the provider but not yet accounted for in the module. It appears only when using global clusters and upgrading to a new engine version. And even then... not always; it is inconsistent.

Given an implementation:

module "this" {
  source  = "terraform-aws-modules/rds-aurora/aws"
  version = "~> 7.1.0"
  name    = "${var.ctx.prefix_regional}-${var.name}"

  allowed_cidr_blocks                   = var.subnet_cidrs
  apply_immediately                     = true
  allow_major_version_upgrade           = var.allow_major_version_upgrade
  auto_minor_version_upgrade            = var.auto_minor_version_upgrade
  backup_retention_period               = var.backup_retention_period
  copy_tags_to_snapshot                 = var.copy_tags_to_snapshot
  create_db_subnet_group                = true
  create_security_group                 = false
  database_name                         = var.database_name
  db_parameter_group_name               = var.db_cluster_db_instance_parameter_group_name != null ? var.db_cluster_db_instance_parameter_group_name : aws_db_parameter_group.this[0].id
  db_cluster_parameter_group_name       = var.db_cluster_parameter_group_name != null ? var.db_cluster_parameter_group_name : aws_rds_cluster_parameter_group.this[0].id
  deletion_protection                   = var.deletion_protection
  enabled_cloudwatch_logs_exports       = var.enabled_cloudwatch_logs_exports
  engine                                = "aurora-postgresql"
  engine_mode                           = var.engine_mode
  engine_version                        = var.engine_version
  global_cluster_identifier             = var.global_cluster_identifier
  instances                             = var.instances
  instance_class                        = var.instance_class
  is_primary_cluster                    = var.is_primary_cluster
  kms_key_id                            = var.kms_key_id
  performance_insights_enabled          = var.performance_insights_enabled
  performance_insights_kms_key_id       = var.performance_insights_enabled ? var.performance_insights_kms_key_id : null
  performance_insights_retention_period = var.performance_insights_enabled ? var.performance_insights_retention_period : null
  port                                  = var.port
  preferred_backup_window               = var.preferred_backup_window
  preferred_maintenance_window          = var.preferred_maintenance_window
  serverlessv2_scaling_configuration    = var.instance_class != "db.serverless" ? {} : var.scaling_configuration
  storage_encrypted                     = true
  subnets                               = var.subnet_ids
  vpc_id                                = var.vpc_id
  vpc_security_group_ids                = [aws_security_group.this.id]
}

If the variable engine_version is changed to upgrade the cluster, we usually get the error shown further down.

Set aside that this isn't the latest module version; I have worked with the latest as well, and will continue to. The issue, I believe, is here: engine_version needs to be ignored in the case of global clusters. However, since dynamic lifecycle blocks are not supported, the change I'm proposing is to have both aws_rds_cluster.this and aws_rds_cluster.this_ignore_engine_version, then add a ternary everywhere the resource is referenced to select the correct instance. A rough sketch of the shape I have in mind follows.

What are your thoughts, @antonbabenko? Maybe there is a simpler workaround I'm overlooking.

Versions

Expected behavior

Given the note in the provider documentation, this error is expected. With the proposed change, I would expect no inconsistent plan and for all upgrades to go through cleanly.

Actual behavior

The error appears exactly as documented in the provider.

β”‚ Error: Provider produced inconsistent final plan
β”‚
β”‚ When expanding the plan for module.ocf_pg_db.module.this.aws_rds_cluster.this[0] to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value for .engine_version: was
β”‚ cty.StringVal("15.5"), but now cty.StringVal("13.6").
β”‚
β”‚ This is a bug in the provider, which should be reported in the provider's own issue tracker.

This happens because, when a global cluster is upgraded, AWS upgrades the member clusters itself. When Terraform then attempts to upgrade a member, depending on the order of operations, the member's actual engine version no longer matches what Terraform recorded in state. Ignoring changes to engine_version on the member cluster would avoid the issue.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

theherk commented 7 months ago

This issue isn't stale; its resolution is simply awaiting review.

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

theherk commented 6 months ago

Same as before. The resolution is awaiting review.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

theherk commented 4 months ago

Still alive and awaiting feedback.

alisson276 commented 3 months ago

I'd say this is worse than that... If you have engine_version set and auto_minor_version_upgrade set to true or even unset (null), then when the cluster gets an update the Terraform configuration becomes inconsistent, because a cluster can't be downgraded.

I'd like a way to have an ignore_changes block only when auto_minor_version_upgrade is not false.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 2 months ago

This issue was automatically closed because it has been stale for 10 days.

github-actions[bot] commented 1 month ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.