hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Switching Dynamodb table/gsi from on-demand to provisioned sets replica autoscaler values incorrectly #23738

Closed coderdude1 closed 4 months ago

coderdude1 commented 2 years ago

Terraform CLI and Terraform AWS Provider Version

Terraform CLI: 1.1.4 linux_amd64
AWS provider: 4.5.0
Atlantis and GitHub Enterprise

Affected Resource(s)

  1. aws_dynamodb_table
  2. aws_appautoscaling_target
  3. aws_appautoscaling_policy

Terraform Configuration Files

I've included the configuration for each step needed to reproduce the issue.

1. Starting Condition

Create the table and GSI via Terraform with PAY_PER_REQUEST billing, streams enabled, and PITR enabled.

resource "aws_dynamodb_table" "rw_error_ddb" {
    name = "table_error_demo"
    hash_key = "demo_id"
    range_key = "name"
    billing_mode = "PAY_PER_REQUEST"
    stream_enabled = true
    stream_view_type = "NEW_AND_OLD_IMAGES"
    point_in_time_recovery {
        enabled = true
    }

    attribute {
        name = "demo_id"
        type = "S"
    }

    attribute {
        name = "name"
        type = "S"
    }

    global_secondary_index {
        name            = "name_index"
        hash_key        = "name"
        projection_type = "ALL"
    }

    replica {
        region_name = "eu-west-1"
    }

    tags = {
        Name            = "rw_error_test"
        app             = "rw_error_test"
        env             = "stage"
        region          = "us-east-1"
        team            = "my-team"
    }
}

Results
  1. Creates a table in us-east-1 with a replica in eu-west-1.
  2. Both tables use on-demand capacity mode.
  3. The primary table (us-east-1) has PITR and DynamoDB Streams enabled.
  4. The replica table (eu-west-1) has PITR and DynamoDB Streams disabled (incorrect).
  5. The GSI is on-demand in both the primary and the replica.
  6. The primary table has tags populated. The replica has no tags.

2. Change table and GSI from on-demand to provisioned, with an autoscaler for the table AND the GSI

    resource "aws_dynamodb_table" "rw_error_ddb" {
        name = "table_error_demo"
        hash_key = "demo_id"
        range_key = "name"
        billing_mode = "PROVISIONED"
        read_capacity    = 50
        write_capacity   = 50
        stream_enabled = true
        stream_view_type = "NEW_AND_OLD_IMAGES"
        point_in_time_recovery {
            enabled = true
        }

        attribute {
            name = "demo_id"
            type = "S"
        }

        attribute {
            name = "name"
            type = "S"
        }

        global_secondary_index {
            name            = "name_index"
            hash_key        = "name"
            projection_type = "ALL"
            read_capacity    = 50
            write_capacity   = 50
        }

        replica {
            region_name = "eu-west-1"
        }

        tags = {
            Name            = "rw_error_test"
            app             = "rw_error_test"
            env             = "stage"
            region          = "us-east-1"
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_gsi_read_target" {
        max_capacity       = 500
        min_capacity       = 50
        resource_id        = "table/table_error_demo/index/name_index"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:index:ReadCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_gsi_read_policy" {
        name               = "DynamoDBReadCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBReadCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_read_target" {
        max_capacity       = 500
        min_capacity       = 50
        resource_id        = "table/${aws_dynamodb_table.rw_error_ddb.id}"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:table:ReadCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_read_policy" {
        name               = "DynamoDBReadCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_read_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_read_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_read_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_read_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBReadCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_write_target" {
        max_capacity       = 500
        min_capacity       = 50
        resource_id        = "table/${aws_dynamodb_table.rw_error_ddb.id}"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:table:WriteCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_write_policy" {
        name               = "DynamoDBWriteCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_write_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_write_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_write_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_write_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBWriteCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_gsi_write_target" {
        max_capacity       = 500
        min_capacity       = 50
        resource_id        = "table/table_error_demo/index/name_index"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:index:WriteCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "error_table_demo_staging_gsi_write_policy" {
        name               = "DynamoDBWriteCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBWriteCapacityUtilization"
            }

            target_value = 70
        }
    }

Results
  1. The primary table (us-east-1) is now provisioned, with the read and write autoscalers set to 50-1000.
  2. The primary table GSI (us-east-1) is now provisioned, with the read and write autoscalers set to 50-1000.
  3. The replica table (eu-west-1) is now provisioned incorrectly (read and write are both 50-40000).
  4. The replica table GSI (eu-west-1) is now provisioned incorrectly (read and write are both 50-40000).
  5. In practice (prod), I've seen upper limits of 100,000 and 200,000 assigned, which exceed our account limit.
  6. We have to fix these upper limits manually before the next Terraform run.
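
Application Auto Scaling targets are regional, and every target in the step 2 config is created through the default (us-east-1) provider, so nothing in that config sets bounds for the eu-west-1 replica. A possible workaround sketch is below. It is untested against this bug. The `replica` provider alias and the `replica_read_*` resource names are hypothetical and not part of the config above. It registers the replica table's read capacity in eu-west-1 with the same bounds as the primary:

```hcl
# Hypothetical provider alias for the replica region (not in the original config).
provider "aws" {
    alias  = "replica"
    region = "eu-west-1"
}

# Register the replica table's read capacity in eu-west-1 with the same
# bounds used for the primary table in step 2.
resource "aws_appautoscaling_target" "replica_read_target" {
    provider           = aws.replica
    max_capacity       = 500
    min_capacity       = 50
    resource_id        = "table/${aws_dynamodb_table.rw_error_ddb.name}"
    role_arn           = data.aws_iam_role.autoscaling_role.arn
    scalable_dimension = "dynamodb:table:ReadCapacityUnits"
    service_namespace  = "dynamodb"
}

# Target-tracking policy for the replica, mirroring the primary's read policy.
resource "aws_appautoscaling_policy" "replica_read_policy" {
    provider           = aws.replica
    name               = "DynamoDBReadCapacityUtilization:${aws_appautoscaling_target.replica_read_target.resource_id}"
    policy_type        = "TargetTrackingScaling"
    resource_id        = aws_appautoscaling_target.replica_read_target.resource_id
    scalable_dimension = aws_appautoscaling_target.replica_read_target.scalable_dimension
    service_namespace  = aws_appautoscaling_target.replica_read_target.service_namespace

    target_tracking_scaling_policy_configuration {
        predefined_metric_specification {
            predefined_metric_type = "DynamoDBReadCapacityUtilization"
        }

        target_value = 70
    }
}
```

The write dimension and the GSI dimensions would follow the same pattern with `provider = aws.replica`. Whether this overrides the 40,000 upper limit that appears on the replica is an assumption I have not verified.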

3. Change the upper/lower limits via terraform (after manually fixing them)

    resource "aws_dynamodb_table" "rw_error_ddb" {
        name = "table_error_demo"
        hash_key = "demo_id"
        range_key = "name"
        billing_mode = "PROVISIONED"
        read_capacity    = 60
        write_capacity   = 100
        stream_enabled = true
        stream_view_type = "NEW_AND_OLD_IMAGES"
        point_in_time_recovery {
            enabled = true
        }

        attribute {
            name = "demo_id"
            type = "S"
        }

        attribute {
            name = "name"
            type = "S"
        }

        global_secondary_index {
            name            = "name_index"
            hash_key        = "name"
            projection_type = "ALL"
            read_capacity    = 60
            write_capacity   = 100
        }

        replica {
            region_name = "eu-west-1"
        }

        tags = {
            Name            = "rw_error_test"
            app             = "rw_error_test"
            env             = "stage"
            region          = "us-east-1"
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_gsi_read_target" {
        max_capacity       = 100
        min_capacity       = 60
        resource_id        = "table/table_error_demo/index/name_index"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:index:ReadCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_gsi_read_policy" {
        name               = "DynamoDBReadCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_gsi_read_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBReadCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_read_target" {
        max_capacity       = 100
        min_capacity       = 60
        resource_id        = "table/${aws_dynamodb_table.rw_error_ddb.id}"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:table:ReadCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_read_policy" {
        name               = "DynamoDBReadCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_read_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_read_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_read_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_read_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBReadCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_write_target" {
        max_capacity       = 100
        min_capacity       = 60
        resource_id        = "table/${aws_dynamodb_table.rw_error_ddb.id}"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:table:WriteCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "table_error_demo_staging_write_policy" {
        name               = "DynamoDBWriteCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_write_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_write_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_write_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_write_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBWriteCapacityUtilization"
            }

            target_value = 70
        }
    }

    resource "aws_appautoscaling_target" "table_error_demo_staging_gsi_write_target" {
        max_capacity       = 100
        min_capacity       = 60
        resource_id        = "table/table_error_demo/index/name_index"
        role_arn           = data.aws_iam_role.autoscaling_role.arn
        scalable_dimension = "dynamodb:index:WriteCapacityUnits"
        service_namespace  = "dynamodb"
    }

    resource "aws_appautoscaling_policy" "error_table_demo_staging_gsi_write_policy" {
        name               = "DynamoDBWriteCapacityUtilization:${aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.resource_id}"
        policy_type        = "TargetTrackingScaling"
        resource_id        = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.resource_id
        scalable_dimension = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.scalable_dimension
        service_namespace  = aws_appautoscaling_target.table_error_demo_staging_gsi_write_target.service_namespace

        target_tracking_scaling_policy_configuration {
            predefined_metric_specification {
                predefined_metric_type = "DynamoDBWriteCapacityUtilization"
            }

            target_value = 70
        }
    }

Results
  1. The primary table and GSI (us-east-1) are correctly sized for read/write at 60-100.
  2. The replica table and GSI (eu-west-1) are incorrectly sized at 50-1000 for read and write.
  3. Note: in other production attempts at this, I've seen the max set to 100,000 and 200,000, but I was not able to reproduce that this time.
  4. I have seen scenarios where we corrected these limits manually and, on the next Terraform run, the lifecycle ignore_changes setting (which I did not try to reproduce in the config above, but which was added to our prod code) was not honored for the GSI.
  5. I have seen scenarios where the primary table has its upper limit set to 100,000 and/or 200,000 (not just the replicas).

Expected Behavior

  1. Replica tables and GSIs should have their autoscaler min/max limits set to the same values as the primary table when the values are changed via Terraform.
  2. PITR should be enabled in replicas when the primary has it enabled.
  3. Streams should be enabled in replicas when the primary has them enabled.
  4. When lifecycle ignore_changes is set for read and write capacity, capacity changes made by the autoscaler should not cause Terraform to detect changes to the table or the GSI.
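
For reference, here is a minimal sketch of the kind of lifecycle block item 4 refers to. The exact block from our prod code is not shown in this issue, so treat the argument list as an assumption. As far as I know, `global_secondary_index` is a set-typed block, so a GSI's capacity can only be ignored by ignoring the whole block:

```hcl
resource "aws_dynamodb_table" "rw_error_ddb" {
    name           = "table_error_demo"
    hash_key       = "demo_id"
    range_key      = "name"
    billing_mode   = "PROVISIONED"
    read_capacity  = 60
    write_capacity = 100

    attribute {
        name = "demo_id"
        type = "S"
    }

    attribute {
        name = "name"
        type = "S"
    }

    global_secondary_index {
        name            = "name_index"
        hash_key        = "name"
        projection_type = "ALL"
        read_capacity   = 60
        write_capacity  = 100
    }

    lifecycle {
        ignore_changes = [
            # Table-level capacity is adjusted by the autoscaler after apply.
            read_capacity,
            write_capacity,
            # Assumption: ignoring the whole block is the only way to ignore
            # GSI capacity, since individual set elements cannot be addressed.
            global_secondary_index,
        ]
    }
}
```

The trade-off is that ignoring `global_secondary_index` also hides intentional changes to the index, such as a new projection type.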

Actual Behavior

  1. Replica tables have no way to set tags, nor do they inherit tags from the primary table.
  2. Replica tables and GSIs have their autoscaler upper limits set to values other than those requested (I've seen 40,000, 100,000, and 200,000).
  3. Replica tables do not have PITR enabled when the primary table has it enabled via Terraform.
  4. Replica tables do not have streams enabled when the primary table has them enabled via Terraform.
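
Provider releases newer than the 4.5.0 used here document `point_in_time_recovery` and `propagate_tags` arguments on the `replica` block. I have not confirmed which release introduced them, and they are not available in the version reported above. A hedged sketch of how items 1 and 3 might be addressed after upgrading:

```hcl
resource "aws_dynamodb_table" "rw_error_ddb" {
    name             = "table_error_demo"
    hash_key         = "demo_id"
    billing_mode     = "PAY_PER_REQUEST"
    stream_enabled   = true
    stream_view_type = "NEW_AND_OLD_IMAGES"

    point_in_time_recovery {
        enabled = true
    }

    attribute {
        name = "demo_id"
        type = "S"
    }

    replica {
        region_name = "eu-west-1"
        # Assumption: both arguments need a provider release newer than 4.5.0.
        point_in_time_recovery = true
        propagate_tags         = true
    }

    tags = {
        app = "rw_error_test"
        env = "stage"
    }
}
```

This sketch does not cover item 4 (streams on the replica); I am not aware of a replica-level argument for that.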

Steps to Reproduce

Apply the Terraform configurations above in order (steps 1 to 3), and note the state of the primary and replica tables after each apply.
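
The configs above reference `data.aws_iam_role.autoscaling_role` and rely on a default provider, neither of which is shown. A minimal sketch of those missing pieces follows. The role name is an assumption: it is the service-linked role that Application Auto Scaling normally uses for DynamoDB, and your account may use a different role.

```hcl
terraform {
    required_providers {
        aws = {
            source  = "hashicorp/aws"
            # Pinned to the version this issue was reported against.
            version = "4.5.0"
        }
    }
}

# Default provider: the region of the primary table.
provider "aws" {
    region = "us-east-1"
}

# Assumption: the standard service-linked role for DynamoDB auto scaling.
data "aws_iam_role" "autoscaling_role" {
    name = "AWSServiceRoleForApplicationAutoScaling_DynamoDBTable"
}
```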

Important Factoids

We can reproduce this easily by creating an on-demand table with a replica (the tags, stream, and PITR settings won't be applied to the replica). When we switch it to provisioned via Terraform, the replicas always end up with incorrect autoscaler limits.

References

NA

github-actions[bot] commented 5 months ago

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

github-actions[bot] commented 3 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.