mongodb / terraform-provider-mongodbatlas

Terraform MongoDB Atlas Provider: Deploy, update, and manage MongoDB Atlas infrastructure as code through HashiCorp Terraform
https://registry.terraform.io/providers/mongodb/mongodbatlas
Mozilla Public License 2.0

Regression: Data Source for mongodbatlas_cluster makes terraform hang indefinitely using version 1.0 #521

Closed devon65 closed 3 years ago

devon65 commented 3 years ago

Terraform CLI and Terraform MongoDB Atlas Provider Version

Terraform v1.0.4
on darwin_amd64
+ provider registry.terraform.io/mongodb/mongodbatlas v1.0.0

Terraform Configuration File

terraform {
  required_version = ">= 0.14.4"

  required_providers {
    mongodbatlas = {
      source = "mongodb/mongodbatlas"
      version = "= 1.0"
    }
  }
}

provider "mongodbatlas" {}

data "mongodbatlas_cluster" "cluster" {
  project_id   = var.mongodb_project_id
  name         = var.cluster_name
}

output "cluster_name" {
  value = data.mongodbatlas_cluster.cluster.name
}

variable "mongodb_project_id" {}
variable "cluster_name" {}

Steps to Reproduce

  1. Create a MongoDB cluster
  2. Create an API key with access to your cluster
  3. Pass in your cluster's project id, name and API key info
  4. terraform init
  5. terraform plan or terraform apply

Expected Behavior

Terraform should output my cluster's name in a timely manner (within 60 seconds at most).

Actual Behavior

Terraform hangs indefinitely, as far as I can tell. I've let it run for over 5 minutes and it still outputs nothing.

Crash Output

I have created a crash.log for this issue, but I couldn't guarantee that I had removed all sensitive data, so I only included an excerpt from after hitting Ctrl-C. I've reproduced this issue on three different computers (two Windows machines and a Mac), so it shouldn't be too difficult to get logs by reproducing it. Attachment: crash-abridged.log

Additional Context

I have tested this in versions 0.9.0 and 0.9.1, and the mongodbatlas_cluster data source works as expected in both versions, but not in 1.0.

nikhil-mongo commented 3 years ago

@devon65 Thank you for sharing the details and log. We will go through it and get back to you.

themantissa commented 3 years ago

Internal ticket INTMDB-247

themantissa commented 3 years ago

@devon65 can you provide some more details on what type of cluster and network connections you are creating? We believe this is due to a fix we made for issue #422. The cluster data source was returning without a connection string because it returned before the string was available. We added a timeout, but if you do not have a PrivateLink connection it should return more quickly. Hence it would be good to know what you are creating and waiting on.

devon65 commented 3 years ago

When running the Terraform config posted in the description, the cluster has already been created by a separate Terraform plan. I'm just trying to retrieve data from the existing cluster. Here's the config I used to create the cluster (the cluster creation has already succeeded):

resource "mongodbatlas_cluster" "mongo_cluster" {
  project_id   = var.mongodb_project_id
  name         = var.cluster_name
  cluster_type = var.cluster_type
  provider_region_name = var.cluster_az_region

  auto_scaling_disk_gb_enabled = var.cluster_autoscale_disk_space_enabled
  mongo_db_major_version       = var.cluster_mongodb_major_version

  //Provider Settings "block"
  provider_name               = "AZURE"
  provider_disk_type_name     = var.cluster_disk_type
  provider_instance_size_name = var.cluster_instance_size
}

variable "mongodb_project_id" {
    type = string
}

variable "cluster_name" {
    type = string
}

variable "cluster_type" {
    type = string
    default = "REPLICASET"
}

variable "cluster_az_region" {
    type = string
    default = "US_WEST_2"
}

variable "cluster_autoscale_disk_space_enabled" {
    type = bool
    default = false
}

variable "cluster_mongodb_major_version" {
    type = string
    default = "5.0"
}

variable "cluster_disk_type" {
    type = string
    description = "Determines initial memory size of cluster"
    default = "P2"
}

variable "cluster_instance_size" {
    type = string
    description = "Tier size of the cluster instance"
    default = "M10"
}

devon65 commented 3 years ago

Another thing I noticed (a separate bug that I still need to create an issue for) is that the container_id return value for the cluster data source is declared, but never set in the data_source_mongodbatlas_cluster.go file. At least, that's what it looks like. I'm new to Go, so I could be wrong, but in the resource_mongodbatlas_cluster.go file, the container_id field is declared and set later.

If your wait command is waiting for all fields to get set in the cluster data source, it will wait indefinitely for the container_id field to be set. If it's just waiting for the connection string to return, then I'm completely wrong and you can ignore this comment 👌

nicolas-nannoni commented 3 years ago

I can confirm I see the same behaviour since 1.0.0 with an identical setup. A cluster and a PrivateLink endpoint were created in one Terraform module weeks ago. Another Terraform module reads that cluster through the data source in order to create a simple database user, and that read hangs for about 3 minutes before returning the expected response.

In the trace logs, I see my cluster's data being returned quickly, and it contains all the connection strings I want (including the PrivateLink ones). I then see these lines:

2021-08-24T13:37:42.063-0700 [INFO]  provider.terraform-provider-mongodbatlas_v1.0.0: 2021/08/24 13:37:42 [DEBUG] MongoDB Atlas API Response Details:
[cluster config]
2021-08-24T13:37:42.064-0700 [INFO]  provider.terraform-provider-mongodbatlas_v1.0.0: 2021/08/24 13:37:42 [DEBUG] Waiting for state to become: [PRIVATE_ENDPOINTS_EXISTS NORMAL]: timestamp=2021-08-24T13:37:42.064-0700

Then the following lines keep being printed every 5 seconds until the plan is finally produced:

2021-08-24T13:38:14.965-0700 [TRACE] dag/walk: vertex "module.atlas.provider[\"registry.terraform.io/mongodb/mongodbatlas\"] (close)" is waiting for "module.atlas.data.mongodbatlas_cluster.cluster (expand)"
2021-08-24T13:38:16.245-0700 [TRACE] dag/walk: vertex "module.atlas.aws_ssm_parameter.root_password (expand)" is waiting for "module.atlas.local.uri_with_creds (expand)"
2021-08-24T13:38:16.524-0700 [TRACE] dag/walk: vertex "module.atlas.local.uri (expand)" is waiting for "module.atlas.data.mongodbatlas_cluster.cluster (expand)"
2021-08-24T13:38:16.524-0700 [TRACE] dag/walk: vertex "module.atlas (close)" is waiting for "module.atlas.aws_ssm_parameter.root_password (expand)"
2021-08-24T13:38:16.524-0700 [TRACE] dag/walk: vertex "module.atlas.local.uri_with_creds (expand)" is waiting for "module.atlas.local.uri (expand)"
2021-08-24T13:38:16.525-0700 [TRACE] dag/walk: vertex "meta.count-boundary (EachMode fixup)" is waiting for "module.atlas (close)"
2021-08-24T13:38:16.525-0700 [TRACE] dag/walk: vertex "root" is waiting for "provider[\"registry.terraform.io/hashicorp/aws\"] (close)"
2021-08-24T13:38:16.525-0700 [TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)" is waiting for "module.atlas.aws_ssm_parameter.root_password (expand)"

(...)

2021-08-24T13:40:42.067-0700 [INFO]  provider.terraform-provider-mongodbatlas_v1.0.0: 2021/08/24 13:40:42 [DEBUG] MongoDB Atlas API Request Details:
2021-08-24T13:40:43.596-0700 [INFO]  provider.terraform-provider-mongodbatlas_v1.0.0: 2021/08/24 13:40:43 [DEBUG] MongoDB Atlas API Response Details:
[cluster config (identical to the one retrieved 3 minutes earlier)]

The same modules were working fine and were fast before 1.0.0.

themantissa commented 3 years ago

@nicolas-nannoni thank you for the additional context and @devon65 as well. It feels like a regression as noted, along with the improvement. I'll have the team pursue further.

devon65 commented 3 years ago

@nicolas-nannoni You mentioned that you tested it with a cluster that has PrivateLink connection strings. My cluster doesn't have any PrivateLink connection strings. By coincidence, I've been messing around with the PrivateLink stuff today, and it turns out that the data source will return (after 3 minutes) when there are PrivateLinks present in the cluster, but it will hang indefinitely if the cluster has no PrivateLinks present.

I took a look at the fix that helped to wait for the connection strings, and it looks like that code change is waiting for the PrivateLinks connection strings, which is an empty list whether the cluster has PrivateLinks or not.

At least, that's how I understood the code from a quick glance. Once again, I'm new to Go, so feel free to correct me if I'm wrong.
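Editor's sketch: the hypothesis above can be illustrated with a toy state classifier. This is not the provider's code. `endpointState`, the "PENDING" state, and the example connection string are hypothetical. The point is only that a state derived from the length of a list cannot tell "not populated yet" apart from "no PrivateLink configured at all".

```go
package main

import "fmt"

// endpointState is a hypothetical classifier that derives a wait state
// from the private endpoint connection strings returned for a cluster.
func endpointState(privateEndpoints []string) string {
	if len(privateEndpoints) == 0 {
		// An empty list is ambiguous: the strings may not be available
		// yet, or the cluster may have no PrivateLink at all.
		return "PENDING"
	}
	return "PRIVATE_ENDPOINTS_EXISTS"
}

func main() {
	stillProvisioning := []string{} // strings not available yet
	noPrivateLink := []string{}     // cluster never had PrivateLink

	// Both cases yield the same state, so a waiter keyed on this state
	// would keep polling a cluster that will never have endpoints.
	fmt.Println(endpointState(stillProvisioning) == endpointState(noPrivateLink)) // true

	// Made-up connection string, for illustration only.
	fmt.Println(endpointState([]string{"mongodb+srv://example-pl-0.example.invalid"})) // PRIVATE_ENDPOINTS_EXISTS
}
```

Under this assumption, a cluster without PrivateLink never reaches a target state, so the read ends only when some outer timeout or the user interrupts it.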

devon65 commented 3 years ago

If the data source waits 3 minutes each time, it could be good to have an optional "wait_for_connection_strings" flag.

themantissa commented 3 years ago

We have a pre-release of 1.0.1 ready - we'll release the GA version tomorrow. If you have time to try it out before then it's here: https://github.com/mongodb/terraform-provider-mongodbatlas/releases/tag/v1.0.1-pre.1

themantissa commented 3 years ago

Fixed in recent release 1.0.1