hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0
9.74k stars 9.1k forks source link

[Bug]: The AWS Redshift Serverless resources created using resource "aws_redshiftdata_statement" are NOT maintaining its state after few days #37301

Open math77j opened 4 months ago

math77j commented 4 months ago

Terraform Core Version

1.7.1

AWS Provider Version

5.39.0

Affected Resource(s)

aws_redshiftdata_statement

Expected Behavior

We are using the terraform resources "aws_redshiftserverless_namespace" & "aws_redshiftserverless_workgroup" to create a Redshift serverless environment and "aws_redshiftdata_statement" to create the database resources within this serverless environment. These database resources include an external schema, and a few materialized views to load stream data from Kafka topics on the AWS MSK cluster.

When we run the Terraform script, the resources are created successfully. But after a few days, even though there was NO change to the script, the Terraform plan shows that these resources needed to be added again. Also, you can see that the Terraform state shows these resources already exist.

The expected behavior is that it should show that the resources don't change state after a few days.

Actual Behavior

The terraform plan shows that all the redshift resources created using "aws_redshiftdata_statement" are new or needed to be added.

Relevant Error/Panic Output Snippet

(base) mjoseph@MathewJ redshift_poc % terraform plan     
aws_cloudwatch_event_rule.rs_mv_refresh_schedule_rule: Refreshing state... [id=rs-poc-mv-refresh-schedule-rule]
aws_iam_role.rs_iam_role: Refreshing state... [id=rs-poc-iam-role]
aws_redshiftserverless_namespace.rs_poc_ns: Refreshing state... [id=rs-poc-ns]
aws_redshiftserverless_workgroup.rs_poc_wg: Refreshing state... [id=rs-poc-wg]
aws_redshiftdata_statement.create_external_schema: Refreshing state... [id=bf4d488f-5f1b-457a-869f-0f74c08fec3b]
aws_redshiftdata_statement.create_mv_subscriber: Refreshing state... [id=841662f3-bc43-4299-8beb-a4a42003b65b]
aws_redshiftdata_statement.create_view_device_metadata: Refreshing state... [id=fb353278-ec32-48c2-89f4-87e5831404c9]
aws_redshiftdata_statement.create_mv_device_metadata_json: Refreshing state... [id=da90b707-7696-41a9-ae08-fa94866b71dd]
aws_cloudwatch_event_target.rs_mv_refresh_schedule_target: Refreshing state... [id=rs-poc-mv-refresh-schedule-rule-rs_mv_refresh_schedule_target]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_redshiftdata_statement.create_external_schema will be created
  + resource "aws_redshiftdata_statement" "create_external_schema" {
      + database       = "rs_poc_db"
      + id             = (known after apply)
      + secret_arn     = "arn:aws:secretsmanager:us-east-X:XXXXXXXX:secret:redshift!rs-poc-ns-rsadmin-Nbii95"
      + sql            = <<-EOT
            DROP SCHEMA IF EXISTS rs_poc_kafka_schema CASCADE;
            CREATE EXTERNAL SCHEMA rs_poc_kafka_schema 
                FROM MSK 
                    IAM_ROLE default 
                    AUTHENTICATION none 
                    CLUSTER_ARN 'arn:aws:kafka:us-east-X:XXXXXX:cluster/dev-data-hub-Kafka/5b087aa2-54a9-43fa-8185-b33fa1d45d30-7';
        EOT
      + workgroup_name = "rs-poc-wg"
    }

  # aws_redshiftdata_statement.create_mv_subscriber will be created
  + resource "aws_redshiftdata_statement" "create_mv_subscriber" {
      + database       = "rs_poc_db"
      + id             = (known after apply)
      + secret_arn     = "arn:aws:secretsmanager:us-east-2:XXXXXXXX:secret:redshift!rs-poc-ns-rsadmin-Nbii95"
      + sql            = <<-EOT
            DROP MATERIALIZED VIEW IF EXISTS mv_subscriber;
                            CREATE MATERIALIZED VIEW mv_subscriber DISTKEY(1) SORTKEY(2) AS
                                SELECT
                                    JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'id', true)::varchar(64) AS id,
                                    NULLIF(JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'timestamp', true), ''):: BIGINT as event_ts,
                                    JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'identifiers', 'Account Number', true)::varchar(64) as account_number,
                                    JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'attributes', 'zipCode', true)::varchar(64) as zip_code
                                FROM
                                    rs_poc_kafka_schema."subscriber"
                                WHERE CAN_JSON_PARSE(kafka_value);
                            REFRESH MATERIALIZED VIEW mv_subscriber;
        EOT
      + workgroup_name = "rs-poc-wg"
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Terraform Configuration Files

1. Enable AWS Provider

provider "aws" { region = "us-east-2" }

2. Create Redshift Serverless Namespace

resource "aws_redshiftserverless_namespace" "rs_poc_ns" { namespace_name = "${var.resource_prefix}-ns" admin_username = var.rs_admin_username manage_admin_password = true db_name = var.rs_db_name iam_roles = [ aws_iam_role.rs_iam_role.arn ] default_iam_role_arn = aws_iam_role.rs_iam_role.arn tags = { Environment = var.environment } }

2.1 Create Redshift Serverless Workgroup

resource "aws_redshiftserverless_workgroup" "rs_poc_wg" { depends_on = [ aws_redshiftserverless_namespace.rs_poc_ns ] namespace_name = "${var.resource_prefix}-ns" workgroup_name = "${var.resource_prefix}-wg" base_capacity = 8 publicly_accessible = false enhanced_vpc_routing = true subnet_ids = var.rs_poc_subnet_ids security_group_ids = var.rs_poc_security_group_ids config_parameter { parameter_key = "enable_case_sensitive_identifier" parameter_value = true } tags = { Environment = var.environment } }

3 Create Redshift external schema using - aws_redshiftdata_statement

resource "aws_redshiftdata_statement" "create_external_schema" { workgroup_name = aws_redshiftserverless_workgroup.rs_poc_wg.workgroup_name database = aws_redshiftserverless_namespace.rs_poc_ns.db_name secret_arn = aws_redshiftserverless_namespace.rs_poc_ns.admin_password_secret_arn sql = <<-EOT DROP SCHEMA IF EXISTS ${var.rs_msk_external_schema_name} CASCADE; CREATE EXTERNAL SCHEMA ${var.rs_msk_external_schema_name} FROM MSK IAM_ROLE default AUTHENTICATION none CLUSTER_ARN '${var.msk_cluster_arn}'; EOT }

4 Create Redshift materialized view using - aws_redshiftdata_statement

resource "aws_redshiftdata_statement" "create_mv_subscriber" { depends_on = [ aws_redshiftdata_statement.create_external_schema ] workgroup_name = aws_redshiftserverless_workgroup.rs_poc_wg.workgroup_name database = aws_redshiftserverless_namespace.rs_poc_ns.db_name secret_arn = aws_redshiftserverless_namespace.rs_poc_ns.admin_password_secret_arn sql = <<EOT DROP MATERIALIZED VIEW IF EXISTS mv_subscriber; CREATE MATERIALIZED VIEW mv_subscriber DISTKEY(1) SORTKEY(2) AS SELECT JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'id', true)::varchar(64) AS id, NULLIF(JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'timestamp', true), ''):: BIGINT as event_ts, JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'identifiers', 'Account Number', true)::varchar(64) as account_number, JSON_EXTRACT_PATH_TEXT(FROM_VARBYTE(kafka_value, 'utf-8'), 'attributes', 'zipCode', true)::varchar(64) as zip_code FROM ${var.rs_msk_external_schema_name}."subscriber" WHERE CAN_JSON_PARSE(kafka_value); REFRESH MATERIALIZED VIEW mv_subscriber; EOT }

Steps to Reproduce

Run terraform plan/apply to create the resources, and after a few days run: terraform plan This will show that the same resources need to be added again.

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 4 months ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

math77j commented 3 months ago

Any update on this request? It is showing the same behavior even on the latest version of Terraform v1.8.5 with the provider aws v5.50.0.

math77j commented 3 months ago

As per an old ticket (https://github.com/hashicorp/terraform-provider-aws/pull/26343), I tried adding:
lifecycle { ignore_changes = all } but, there was no difference. Can someone shed some light on a reliable way of using the "aws_redshiftdata_statement" or any other way of deploying and managing the auto-deployment of Redshift database resources?