terraform-aws-modules / terraform-aws-msk-kafka-cluster

Terraform module to create AWS MSK (Managed Streaming for Kafka) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/msk-kafka-cluster/aws
Apache License 2.0

feat: Allow MSK configuration changes on running clusters #17

Closed GreggSchofield closed 5 months ago

GreggSchofield commented 8 months ago

Description

Motivation and Context

In the latest version of this module (v2.3.0), an operator cannot change the MSK configuration of a cluster that was created with the module. As pointed out by @ascpikmin in issue #16, the aws_msk_configuration resource requires a lifecycle block with create_before_destroy = true. This in turn requires the name attribute of the aws_msk_configuration resource to be unique, because the replacement configuration is created while the old one still exists.

This pull request aims to resolve issue #16.
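For readers who want to see the shape of the change, here is a minimal sketch of the pattern described above. It is not the module's actual source: the variable names are hypothetical, and the hashicorp/aws and hashicorp/random providers are assumed to be configured elsewhere. The random_id resource with byte_length = 8 and the decimal suffix on the configuration name are taken from the plan output further down this thread.

```hcl
# Hypothetical input variables, named here for illustration only.
variable "configuration_name" {
  type = string
}

variable "kafka_version" {
  type    = string
  default = "3.6.0"
}

variable "configuration_server_properties" {
  type    = map(string)
  default = {}
}

# Random suffix appended to the configuration name so that the new
# name differs from the name of a configuration that already exists.
resource "random_id" "this" {
  byte_length = 8
}

resource "aws_msk_configuration" "this" {
  name           = format("%s-%s", var.configuration_name, random_id.this.dec)
  kafka_versions = [var.kafka_version]

  # Render the map as one "key = value" line per property.
  server_properties = join("\n", [
    for k, v in var.configuration_server_properties : format("%s = %s", k, v)
  ])

  # Create the replacement before destroying the old configuration,
  # which is why the two names must not collide.
  lifecycle {
    create_before_destroy = true
  }
}
```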

Breaking Changes

This change preserves backwards compatibility with the current major version.

How Has This Been Tested?

This has been tested by executing a terraform apply using the following module declaration:

module "complete_mks_cluster_test" {
  source = "github.com/GreggSchofield/terraform-aws-msk-kafka-cluster" # Head of fork

  name                   = format("%s-test-cluster", terraform.workspace)
  kafka_version          = "3.6.0"
  number_of_broker_nodes = 3

  broker_node_client_subnets  = data.aws_subnets.subnets.ids
  broker_node_instance_type   = "kafka.m7g.large"
  broker_node_security_groups = [module.complete_security_group.security_group_id]

  broker_node_storage_info = {
    ebs_storage_info = {
      volume_size = 100
    }
  }
  scaling_max_capacity = 200
  scaling_target_value = 80

  encryption_at_rest_kms_key_arn = try(data.aws_kms_key.id_env_kms_key.arn, null)

  encryption_in_transit_client_broker = "TLS"
  encryption_in_transit_in_cluster    = true

  configuration_name              = format("%s-test-configuration", terraform.workspace)
  configuration_server_properties = {}
}

Then set the configuration_server_properties attribute to {"auto.create.topics.enable" = true} to force a new MSK cluster configuration revision to be created:

module "complete_mks_cluster_test" {
  source = "github.com/GreggSchofield/terraform-aws-msk-kafka-cluster" # Head of fork

  name                   = format("%s-test-cluster", terraform.workspace)
  kafka_version          = "3.6.0"
  number_of_broker_nodes = 3

  broker_node_client_subnets  = data.aws_subnets.subnets.ids
  broker_node_instance_type   = "kafka.m7g.large"
  broker_node_security_groups = [module.complete_security_group.security_group_id]

  broker_node_storage_info = {
    ebs_storage_info = {
      volume_size = 100
    }
  }
  scaling_max_capacity = 200
  scaling_target_value = 80

  encryption_at_rest_kms_key_arn = try(data.aws_kms_key.id_env_kms_key.arn, null)

  encryption_in_transit_client_broker = "TLS"
  encryption_in_transit_in_cluster    = true

  configuration_name              = format("%s-test-configuration", terraform.workspace)
  configuration_server_properties = {"auto.create.topics.enable" = true}
}

Given the current composition of this module, in particular the fact that the aws_msk_configuration resource is created within the module scope, executing terraform plan will yield a single in-place update:

Terraform will perform the following actions:

  # module.complete_mks_cluster_test.aws_msk_configuration.this[0] will be updated in-place
  ~ resource "aws_msk_configuration" "this" {
        id                = "arn:aws:kafka:eu-west-1:account-id:configuration/data-streaming-eu-west-1-stable-int-test-configuration-4909326019387697745/cluster-id"
      ~ latest_revision   = 1 -> (known after apply)
        name              = "data-streaming-eu-west-1-stable-int-test-configuration-4909326019387697745"
      + server_properties = "auto.create.topics.enable = true"
        # (2 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Only once this has been applied can the operator execute terraform plan again to yield the desired in-place update for the cluster itself:

Terraform will perform the following actions:

  # module.complete_mks_cluster_test.aws_msk_cluster.this[0] will be updated in-place
  ~ resource "aws_msk_cluster" "this" {
        id                           = "arn:aws:kafka:eu-west-1:account-id:cluster/data-streaming-eu-west-1-stable-int-test-cluster/cluster-id"
        tags                         = {}
        # (12 unchanged attributes hidden)

      ~ configuration_info {
          ~ revision = 1 -> 2
            # (1 unchanged attribute hidden)
        }

        # (5 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
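The second plan changes only configuration_info.revision on the cluster. As a rough illustration of how a cluster can be tied to a configuration and its revision, here is a minimal sketch. It assumes the revision is read from the configuration's latest_revision attribute; that wiring, the resource labels, and the two variables are assumptions for illustration, not the module's confirmed source.

```hcl
# Hypothetical inputs for the broker network placement.
variable "subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}

resource "aws_msk_configuration" "example" {
  name              = "example-configuration"
  kafka_versions    = ["3.6.0"]
  server_properties = "auto.create.topics.enable = true"
}

resource "aws_msk_cluster" "example" {
  cluster_name           = "example-cluster"
  kafka_version          = "3.6.0"
  number_of_broker_nodes = 3

  broker_node_group_info {
    instance_type   = "kafka.m7g.large"
    client_subnets  = var.subnet_ids
    security_groups = var.security_group_ids
  }

  # The cluster pins both the configuration ARN and a specific revision.
  # Assumption: the revision follows the configuration's latest_revision.
  configuration_info {
    arn      = aws_msk_configuration.example.arn
    revision = aws_msk_configuration.example.latest_revision
  }
}
```

With this kind of wiring, a change to server_properties produces a new configuration revision, and the cluster is updated in place once it is pointed at that revision.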

https://github.com/terraform-aws-modules/terraform-aws-msk-kafka-cluster/assets/28576265/1c35263e-ea3e-4081-b5cc-36efa80fb5e2

GreggSchofield commented 8 months ago

Note that upgrading an existing cluster from v2.3.0 to the head of my fork results in a plan like the following:

Terraform will perform the following actions:

  # module.complete_mks_cluster.aws_msk_cluster.this[0] will be updated in-place
  ~ resource "aws_msk_cluster" "this" {
        id                           = "arn:aws:kafka:eu-west-1:account-id:cluster/data-streaming-eu-west-1-stable-int-cluster/cluster-id"
        tags                         = {}
        # (13 unchanged attributes hidden)

      ~ configuration_info {
          ~ arn      = "arn:aws:kafka:eu-west-1:account-id:configuration/data-streaming-eu-west-1-stable-int-configuration/89f1e362-f4e0-4964-ae38-12fae754a66c-8" -> (known after apply)
            # (1 unchanged attribute hidden)
        }

        # (6 unchanged blocks hidden)
    }

  # module.complete_mks_cluster.aws_msk_configuration.this[0] must be replaced
+/- resource "aws_msk_configuration" "this" {
      ~ arn             = "arn:aws:kafka:eu-west-1:account-id:configuration/data-streaming-eu-west-1-stable-int-configuration/configuration-id" -> (known after apply)
      ~ id              = "arn:aws:kafka:eu-west-1:account-id:configuration/data-streaming-eu-west-1-stable-int-configuration/configuration-id" -> (known after apply)
      ~ latest_revision = 1 -> (known after apply)
      ~ name            = "data-streaming-eu-west-1-stable-int-configuration" # forces replacement -> (known after apply) # forces replacement
        # (1 unchanged attribute hidden)
    }

  # module.complete_mks_cluster.random_id.this will be created
  + resource "random_id" "this" {
      + b64_std     = (known after apply)
      + b64_url     = (known after apply)
      + byte_length = 8
      + dec         = (known after apply)
      + hex         = (known after apply)
      + id          = (known after apply)
    }

Plan: 2 to add, 1 to change, 1 to destroy.

Whilst this shouldn't constitute a breaking change in the module API, let me know if you want this documented somewhere.

vl-kp commented 7 months ago

When can this PR be merged?

mvoitko commented 7 months ago

@antonbabenko why does no one review this PR?

mvoitko commented 7 months ago

@GreggSchofield welcome to the club) https://github.com/terraform-aws-modules/terraform-aws-msk-kafka-cluster/pull/12 @bryantbiggs @nawarajshahi I hope you now have evidence that the configuration should change the version.

github-actions[bot] commented 6 months ago

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 10 days

zoonoo commented 5 months ago

This seems to be a reasonable solution to a problem that a lot of us have faced, as described in the PR description. Is there an ongoing discussion somewhere else, or are we waiting for something else?

@GreggSchofield @mvoitko @bryantbiggs

antonbabenko commented 5 months ago

This PR is included in version 2.5.0 :tada:

github-actions[bot] commented 4 months ago

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

bryantbiggs commented 1 day ago

Are there issues when upgrading MSK clusters with this module that result in naming conflicts similar to what this PR was intended to solve? - #40