mongodb / terraform-provider-mongodbatlas

Terraform MongoDB Atlas Provider: Deploy, update, and manage MongoDB Atlas infrastructure as code through HashiCorp Terraform
https://registry.terraform.io/providers/mongodb/mongodbatlas

`replication_specs` do not support deep diff #1544

Closed: yulifelimited closed this issue 11 months ago

yulifelimited commented 12 months ago

Hello!

Changes to replication_specs in mongodbatlas_cluster always show the entire replication_specs block as being re-created, instead of a diff on the individual properties that have changed.

Terraform CLI and Terraform MongoDB Atlas Provider Version

Terraform v1.4.6
on darwin_arm64
+ provider registry.terraform.io/mongodb/mongodbatlas v1.12.0

We use terragrunt, but that won't change the output.

Terraform Configuration File

These are excerpts from our Dedicated Terraform modules. The exact config doesn't matter much; you can reproduce the issue by making any change to replication_specs with any cluster configuration. A hypothetical call site for the module is sketched after variables.tf below.

main.tf

resource "mongodbatlas_cluster" "this" {
  project_id                     = var.mongodb_project_id
  name                           = local.name
  cloud_backup                   = var.mongodb_cloud_backup
  auto_scaling_disk_gb_enabled   = var.mongodb_auto_scaling_disk_gb_enabled
  mongo_db_major_version         = var.mongodb_major_version
  termination_protection_enabled = var.termination_protection_enabled

  provider_name               = "AWS"
  disk_size_gb                = var.mongodb_disk_size_gb
  provider_disk_iops          = var.mongodb_provider_disk_iops
  provider_volume_type        = var.mongodb_provider_volume_type
  provider_instance_size_name = var.mongodb_provider_instance_size_name

  replication_specs {
    num_shards = var.replication_specs.num_shards
    id         = var.replication_specs.id
    zone_name  = var.replication_specs.zone_name

    dynamic "regions_config" {
      for_each = var.replication_specs.regions_config

      content {
        region_name     = regions_config.value.region_name
        electable_nodes = regions_config.value.electable_nodes
        priority        = regions_config.value.priority
        read_only_nodes = regions_config.value.read_only_nodes
        analytics_nodes = regions_config.value.analytics_nodes
      }
    }
  }

  lifecycle {
    ignore_changes = [disk_size_gb]
  }
}

variables.tf

// ... skipping irrelevant variables

variable "replication_specs" {
  description = "Replication Spec for MongoDB Cluster. See the terraform documentation on [Multi-Region clusters](https://registry.terraform.io/providers/mongodb/mongodbatlas/latest/docs/resources/cluster#multi-region-cluster) for more information."

  type = object({
    num_shards = number
    id         = optional(string)
    zone_name  = optional(string)

    regions_config = list(object({
      region_name     = string
      electable_nodes = optional(number)
      priority        = optional(number)
      read_only_nodes = optional(number)
      analytics_nodes = optional(number)
    }))
  })
}
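
For completeness, a hypothetical call site for this module might look like the sketch below (the source path and all values are placeholders, not our real configuration):

module "mongodb_cluster" {
  # Hypothetical module path and inputs, shown only to illustrate how the
  # replication_specs object variable is passed in.
  source = "../modules/mongodb-cluster"

  # ... other inputs (project ID, instance size, etc.) omitted

  replication_specs = {
    num_shards = 1

    regions_config = [
      {
        region_name     = "EU_WEST_2"
        electable_nodes = 3
        analytics_nodes = 1
        priority        = 7
      }
    ]
  }
}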

Steps to Reproduce

Terraform a MongoDB cluster using replication_specs. Make a change to any of the nested replication_specs values, and the entire replication_specs block will show as being re-created in the Terraform plan. This is further emphasised the more complex your replication_specs configuration is (such as a multi-region cluster with read-only and/or analytics nodes).

The changes below remove an analytics_node from the cluster:

variables

replication_specs = {
    num_shards = 1

    regions_config = [
      {
        region_name     = local.region_vars.atlas.region
        electable_nodes = 3
-        analytics_nodes = 1
+        analytics_nodes = 0
        priority        = 7
      }
    ]
  }

Expected Behavior

A deep diff highlights the exact change to analytics_nodes, making review quick and easy (region_name should always be displayed to identify the affected region).

plan

Terraform will perform the following actions:

  # mongodbatlas_cluster.this will be updated in-place
  ~ resource "mongodbatlas_cluster" "this" {
        # (32 unchanged attributes hidden)

      ~ replication_specs {
          # (3 unchanged attributes hidden)

          ~ regions_config {
              ~ analytics_nodes = 1 -> 0
                region_name     = "EU_WEST_2"

              # (4 unchanged attributes hidden)
            }
        }

        # (2 unchanged blocks hidden)
    }

Actual Behavior

The entire replication_specs block shows as being re-created.

plan

Terraform will perform the following actions:

  # mongodbatlas_cluster.this will be updated in-place
  ~ resource "mongodbatlas_cluster" "this" {
        # (32 unchanged attributes hidden)

      - replication_specs {
          - id         = "5d5bc620f2a30b3ed5d04076" -> null
          - num_shards = 1 -> null
          - zone_name  = "ZoneName managed by Terraform" -> null

          - regions_config {
              - analytics_nodes = 1 -> null
              - electable_nodes = 3 -> null
              - priority        = 7 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "EU_WEST_2" -> null
            }
        }
      + replication_specs {
          + id         = (known after apply)
          + num_shards = 1
          + zone_name  = "ZoneName managed by Terraform"

          + regions_config {
              + analytics_nodes = 0
              + electable_nodes = 3
              + priority        = 7
              + read_only_nodes = 0
              + region_name     = "EU_WEST_2"
            }
        }

        # (2 unchanged blocks hidden)
    }
github-actions[bot] commented 12 months ago

Thanks for opening this issue! Please make sure you've followed our guidelines when opening the issue.

The ticket INTMDB-1216 was created for internal tracking.

marcosuma commented 12 months ago

Hi @yulifelimited, thanks for reporting this to us. Is this happening only when you try to change the analytics nodes? Could you provide more exhaustive information on which changes within replication_specs trigger this behaviour?

Also, any debug output would be appreciated in this case.

We want to understand whether the issue is in the Terraform script or whether this is actually the expected behaviour, since replication specs might not be editable but only re-creatable depending on which parameter you are specifying.

yulifelimited commented 12 months ago

Any change in the replication_specs shows as a re-creation. We recently updated 10+ clusters to the new replication_specs attributes (instead of using deprecated attributes such as replication_factor).

Examples

Change to zone_name

  # mongodbatlas_cluster.this will be updated in-place
  ~ resource "mongodbatlas_cluster" "this" {
        # (33 unchanged attributes hidden)

      - replication_specs {
          - id         = "5d14ad26553855c8b92557ad" -> null
          - num_shards = 1 -> null
          - zone_name  = "ZoneName managed by Terraform" -> null

          - regions_config {
              - analytics_nodes = 1 -> null
              - electable_nodes = 3 -> null
              - priority        = 7 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "EU_WEST_2" -> null
            }
        }
      + replication_specs {
          + id         = (known after apply)
          + num_shards = 1
          + zone_name  = "test"

          + regions_config {
              + analytics_nodes = 1
              + electable_nodes = 3
              + priority        = 7
              + read_only_nodes = 0
              + region_name     = "EU_WEST_2"
            }
        }

        # (2 unchanged blocks hidden)
    }

Change to electable_nodes

  # mongodbatlas_cluster.this will be updated in-place
  ~ resource "mongodbatlas_cluster" "this" {
        # (33 unchanged attributes hidden)

      - replication_specs {
          - id         = "5d14ad26553855c8b92557ad" -> null
          - num_shards = 1 -> null
          - zone_name  = "ZoneName managed by Terraform" -> null

          - regions_config {
              - analytics_nodes = 1 -> null
              - electable_nodes = 3 -> null
              - priority        = 7 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "EU_WEST_2" -> null
            }
        }
      + replication_specs {
          + id         = (known after apply)
          + num_shards = 1
          + zone_name  = "ZoneName managed by Terraform"

          + regions_config {
              + analytics_nodes = 1
              + electable_nodes = 5
              + priority        = 7
              + read_only_nodes = 0
              + region_name     = "EU_WEST_2"
            }
        }

        # (2 unchanged blocks hidden)
    }

There is an id marked as (known after apply), but it doesn't actually change once you apply. This is the diff of the state before and after (IAM identity details such as the role ARN and user_id are redacted).

❯ autogrunt state pull > after.json
❯ diff before.json after.json
6c6
<   "serial": 12,
---
>   "serial": 13,
326c326
<                 "zone_name": "ZoneName managed by Terraform"
---
>                 "zone_name": "test"
334c334
<                 "next_snapshot": "2023-10-19T11:04:19Z",
---
>                 "next_snapshot": "2023-10-20T23:04:51Z",

There is a Gist with a bare terraform module to deploy a M10 cluster to get debug logs: https://gist.github.com/yulifelimited/4c86f3a8b588feebc468cca750457a33#file-debug-log

I only tried to change the replication_specs zone_name.

Thanks

marcosuma commented 11 months ago

After some investigation @yulifelimited, I think this is related to the fact that replication_specs is defined as schema.TypeSet. You can see further context at https://github.com/hashicorp/terraform/issues/21901 and https://github.com/hashicorp/terraform/issues/10520

I am going to look at our https://github.com/mongodb/terraform-provider-mongodbatlas/blob/master/CHANGELOG.md because I am spotting some similar previous cases.

My answer right now is that I don't think we plan to change it. The only way we could change it is by moving to a TypeList, but I don't think we want that. This TypeSet behaviour is controlled by Terraform itself.

Question: what does this practically imply for you? I reproduced the scenario (by the way, thanks for the detailed description) and in the end the update works as expected.

yulifelimited commented 11 months ago

@marcosuma thanks for looking into it in detail. The problem is the noise this generates when reviewing plans, which makes it easy for mistakes to be missed. To give a different perspective, imagine the same changes on a multi-region cluster deployed across 4 AWS regions (which was our use case when opening this ticket). We end up with this kind of diff in the plan:

Scenario: adding AF_SOUTH_1 region with a read only node.

  # mongodbatlas_cluster.this will be updated in-place
  ~ resource "mongodbatlas_cluster" "this" {
        # (32 unchanged attributes hidden)

      - replication_specs {
          - id         = "62fcaab0e1540a4e43fdadf9" -> null
          - num_shards = 1 -> null
          - zone_name  = "ZoneName managed by Terraform" -> null

          - regions_config {
              - analytics_nodes = 0 -> null
              - electable_nodes = 1 -> null
              - priority        = 5 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "US_EAST_1" -> null
            }
          - regions_config {
              - analytics_nodes = 0 -> null
              - electable_nodes = 1 -> null
              - priority        = 6 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "AP_NORTHEAST_1" -> null
            }
          - regions_config {
              - analytics_nodes = 1 -> null
              - electable_nodes = 1 -> null
              - priority        = 7 -> null
              - read_only_nodes = 0 -> null
              - region_name     = "EU_WEST_2" -> null
            }
        }
      + replication_specs {
          + id         = (known after apply)
          + num_shards = 1
          + zone_name  = "ZoneName managed by Terraform"

          + regions_config {
              + analytics_nodes = 0
              + electable_nodes = 1
              + priority        = 5
              + read_only_nodes = 0
              + region_name     = "US_EAST_1"
            }
          + regions_config {
              + analytics_nodes = 0
              + electable_nodes = 1
              + priority        = 6
              + read_only_nodes = 0
              + region_name     = "AP_NORTHEAST_1"
            }
          + regions_config {
              + analytics_nodes = 0
              + electable_nodes = 1
              + priority        = 7
              + read_only_nodes = 0
              + region_name     = "EU_WEST_2"
            }
          + regions_config {
              + analytics_nodes = 0
              + electable_nodes = (known after apply)
              + priority        = 4
              + read_only_nodes = 1
              + region_name     = "AF_SOUTH_1"
            }
        }

        # (2 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

What you may not see above is that an analytics node in EU_WEST_2 was going to be removed, because the Data Engineering team had added it in the console without telling us (this happened to us!). That is easy to miss in the plan because of the way it is displayed; with a nested diff, this problem would have been spotted in seconds.

The more regions, the more noise, to the point where the plan may not fit on a single monitor depending on how many nodes are in your cluster.

This is a serious problem when you consider the risks associated with unexpected changes to a MongoDB cluster: it can end up causing service-affecting changes that take a long time to recover from, since we would first have to wait for mongodb.com to finish applying the first set of changes before being able to re-configure the cluster again.

marcosuma commented 11 months ago

@yulifelimited let me discuss with the team whether we have other options, but as I mentioned it doesn't really depend on us.

Would it be worth it for you to raise an issue with Terraform? Maybe they can further address this situation.

As I mentioned, there is not really much we can do here. If we were to switch (back) to schema.TypeList, we would end up in a worse situation because order matters in that case, so you would find yourself with undesired changes just because the Atlas Admin API returned the objects in a different order.
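
To illustrate with a hypothetical multi-region snippet (not taken from your configuration):

  replication_specs {
    num_shards = 1

    # With schema.TypeSet, nested blocks are identified by a hash of their
    # contents, so the order in which they are declared (or returned by the
    # Atlas Admin API) does not matter and produces no diff.
    regions_config {
      region_name     = "EU_WEST_2"
      electable_nodes = 2
      priority        = 7
    }

    regions_config {
      region_name     = "US_EAST_1"
      electable_nodes = 1
      priority        = 6
    }

    # With schema.TypeList, the blocks would be compared by index, so merely
    # swapping the two regions_config blocks above would show up as a change.
  }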

marcosuma commented 11 months ago

@yulifelimited I created an internal issue to track this work: https://jira.mongodb.org/browse/INTMDB-1234

We decided we should migrate from TypeSet to TypeList, similar to what we did for mongodbatlas_advanced_cluster. The consequence of this is that:

Closing this issue for now as this is not a bug but a feature request, and we'll track it internally.

Lastly, I want to thank you for reporting this to us and helping us improve our code base. Let us know if you have any other questions.

yulifelimited commented 11 months ago

Thanks Marco! It's appreciated that you were able to sort this out.

If mongodbatlas_advanced_cluster is already using this type internally, should I be able to migrate to that resource and benefit from the detailed diff?

marcosuma commented 11 months ago

@yulifelimited yes, you can indeed migrate to the _advanced_ resource. It is in fact the best solution for customers; it's just that we haven't yet started the deprecation process, hence we don't explicitly ask customers to do so. But if you can migrate, please do so.
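
For reference, a rough mongodbatlas_advanced_cluster equivalent of the single-region cluster discussed in this issue could look like the sketch below. The attribute names follow the advanced cluster resource, but the values are illustrative only, and an existing cluster should be imported into the new resource rather than recreated (check the resource documentation for the exact import ID format).

resource "mongodbatlas_advanced_cluster" "this" {
  project_id             = var.mongodb_project_id
  name                   = "example-cluster" # illustrative name
  cluster_type           = "REPLICASET"
  backup_enabled         = true
  mongo_db_major_version = "6.0"             # illustrative version

  replication_specs {
    region_configs {
      provider_name = "AWS"
      region_name   = "EU_WEST_2"
      priority      = 7

      # Node counts are split per type instead of the electable_nodes /
      # analytics_nodes attributes used by mongodbatlas_cluster.
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }

      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
    }
  }
}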