hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Terraform crash on plan/apply/destroy operation on google-beta container cluster #4756

Closed chrissng closed 4 years ago

chrissng commented 4 years ago

Terraform Version

Terraform v0.12.11
+ provider.google v2.18.1
+ provider.google-beta v2.18.1
+ provider.kubernetes v1.7.1
+ provider.null v2.1.2
+ provider.random v2.2.1

Affected Resource(s)

I am using the terraform-google-kubernetes-engine tf module:

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "~> 5.0"

  project_id  = var.project_id
  name        = var.cluster_name
  description = var.cluster_description

  region   = var.region
  regional = var.regional
  zones    = var.zones

  network            = var.network_name
  subnetwork         = var.subnetwork
  ip_range_pods      = var.ip_range_pods
  ip_range_services  = var.ip_range_services
  network_project_id = var.project_id

  ip_masq_link_local = "false"

  http_load_balancing        = false
  horizontal_pod_autoscaling = true
  kubernetes_dashboard       = false
  network_policy             = false

  kubernetes_version     = var.kubernetes_version
  maintenance_start_time = var.maintenance_start_time

  monitoring_service = var.monitoring_service
  logging_service    = var.logging_service

  enable_private_endpoint = false
  enable_private_nodes    = true
  master_ipv4_cidr_block  = var.master_ipv4_cidr_block

  master_authorized_networks_config = [
    {
      cidr_blocks = var.master_access_cidrs
    },
  ]

  # We do not need to create the default service account
  create_service_account = false
  service_account        = local.gke_service_account

  remove_default_node_pool = true

  node_pools = [
    {
      name               = var.default_node_pool_name
      machine_type       = var.default_node_pool_machine_type
      min_count          = var.default_node_pool_min_count
      max_count          = var.default_node_pool_max_count
      initial_node_count = var.default_node_pool_min_count
      auto_repair        = true
      auto_upgrade       = true
      disk_size_gb       = var.default_node_pool_disk_size_gb
      disk_type          = var.default_node_pool_disk_type
      image_type         = "COS"
    },
  ]

  node_pools_labels = {
    all                             = local.all_node_pools_labels
    "${var.default_node_pool_name}" = var.default_node_pool_labels
  }

  node_pools_metadata = {
    all                             = local.all_node_pools_metadata
    "${var.default_node_pool_name}" = var.default_node_pool_metadata
  }

  node_pools_taints = {
    all                             = []
    "${var.default_node_pool_name}" = var.default_node_pool_taints
  }

  node_pools_tags = {
    all                             = local.all_node_pools_tags
    "${var.default_node_pool_name}" = var.default_node_pool_tags
  }

  node_pools_oauth_scopes = {
    all                             = ["https://www.googleapis.com/auth/cloud-platform"]
    "${var.default_node_pool_name}" = ["https://www.googleapis.com/auth/cloud-platform"]
  }

  identity_namespace = "${var.project_id}.svc.id.goog"

  node_metadata = "UNSPECIFIED"
}

Debug Output

N/A

Panic Output

https://gist.github.com/chrissng/c01959daad9050de7339ab44bca03663#file-crash-log


Error: rpc error: code = Unavailable desc = transport is closing

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Terraform crashed! This is always indicative of a bug within Terraform.
A crash log has been placed at "crash.log" relative to your current
working directory. It would be immensely helpful if you could please
report the crash with Terraform[1] so that we can fix this.

When reporting bugs, please include your terraform version. That
information is available on the first line of crash.log. You can also
get it by running 'terraform --version' on the command line.

[1]: https://github.com/hashicorp/terraform/issues

!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!
panic: runtime error: invalid memory address or nil pointer dereference
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b88522]
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: 
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: goroutine 90 [running]:
2019-10-29T10:57:34.153+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/terraform-providers/terraform-provider-google-beta/google-beta.resourceContainerClusterRead(0xc0000eeb60, 0x205eb80, 0xc0000cf600, 0xc0000eeb60, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-google-beta/google-beta/resource_container_cluster.go:1198 +0xfe2
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/helper/schema.(*Resource).RefreshWithoutUpgrade(0xc000205480, 0xc0004cc1e0, 0x205eb80, 0xc0000cf600, 0xc000abae10, 0x0, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/helper/schema/resource.go:455 +0x119
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/internal/helper/plugin.(*GRPCProviderServer).ReadResource(0xc00000f0c0, 0x2849bc0, 0xc000a0e7b0, 0xc0000adf40, 0xc00000f0c0, 0xc000a0e7b0, 0xc000710a80)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/internal/helper/plugin/grpc_provider.go:525 +0x3d8
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: github.com/hashicorp/terraform-plugin-sdk/internal/tfplugin5._Provider_ReadResource_Handler(0x23d3420, 0xc00000f0c0, 0x2849bc0, 0xc000a0e7b0, 0xc000503380, 0x0, 0x2849bc0, 0xc000a0e7b0, 0xc000a22a00, 0x24c7)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/github.com/hashicorp/terraform-plugin-sdk@v1.0.0/internal/tfplugin5/tfplugin5.pb.go:3153 +0x217
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600, 0xc0006243c0, 0x367dd90, 0x0, 0x0, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:995 +0x460
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).handleStream(0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600, 0x0)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:1275 +0xd97
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc000476470, 0xc0000be160, 0x287c1c0, 0xc00038e480, 0xc0005c5600)
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:710 +0xbb
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4: created by google.golang.org/grpc.(*Server).serveStreams.func1
2019-10-29T10:57:34.154+0800 [DEBUG] plugin.terraform-provider-google-beta_v2.18.1_x4:  /opt/teamcity-agent/work/5d79fe75d4460a2f/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:708 +0xa1
2019-10-29T10:57:34.155+0800 [DEBUG] plugin: plugin process exited: path=/home/ubuntu/span-gcp-internal/environments/staging/workloads/staging/workload_gke/.terragrunt-cache/abrU2smPxLnsOYT7awftOrehozI/6rEP-yvQYmtwsPrcDCguWsdm3kU/modules/workload_gke/.terraform/plugins/linux_amd64/terraform-provider-google-beta_v2.18.1_x4 pid=118582 error="exit status 2"
2019/10/29 10:57:34 [ERROR] module.workload_gke.module.gke: eval: *terraform.EvalRefresh, err: rpc error: code = Unavailable desc = transport is closing
2019/10/29 10:57:34 [ERROR] module.workload_gke.module.gke: eval: *terraform.EvalSequence, err: rpc error: code = Unavailable desc = transport is closing
2019/10/29 10:57:34 [TRACE] [walkRefresh] Exiting eval tree: module.workload_gke.module.gke.google_container_cluster.primary
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": visit complete
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": dynamic subgraph encountered errors
2019/10/29 10:57:34 [TRACE] vertex "module.workload_gke.module.gke.google_container_cluster.primary": visit complete
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "module.workload_gke.module.gke.output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "module.workload_gke.output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "output.endpoint" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "provider.google-beta (close)" errored, so skipping
2019/10/29 10:57:34 [TRACE] dag/walk: upstream of "root" errored, so skipping
2019-10-29T10:57:34.198+0800 [DEBUG] plugin: plugin exited
2019-10-29T10:57:34.201+0800 [DEBUG] plugin: plugin process exited: path=/usr/local/bin/terraform pid=118518
2019-10-29T10:57:34.201+0800 [DEBUG] plugin: plugin exited

[terragrunt] 2019/10/29 10:57:34 Hit multiple errors:
exit status 1

Expected Behavior

The plan should succeed

Actual Behavior

The terraform google-beta provider crashed

Steps to Reproduce

  1. terraform plan

Important Factoids

Terraform is run with Terragrunt. I have multiple GKE clusters set up using the same Terraform module; however, only this particular cluster crashes.

Issue persists without using Terragrunt (using terraform directly).

bluemalkin commented 4 years ago

I also get the error message Error: rpc error: code = Unavailable desc = transport is closing using v2.18. I cannot tell which resource is causing it though.

tysen commented 4 years ago

Have you cut an issue with the terraform core repo and submitted the crash.log file?

chrissng commented 4 years ago

@tysen I believe this is not necessary as the TF devs would redirect us back to the provider.

I've bumped the google-beta provider to the latest version, but the issue persists. The stack trace suggests that the crash is a nil pointer dereference on this line: https://github.com/terraform-providers/terraform-provider-google-beta/blob/release-2.18.1/google-beta/resource_container_cluster.go#L1198

chrissng commented 4 years ago

I took a look at the REST API responses.

On a working cluster, the response contains an empty shieldedNodes object, and Terraform operations work: https://gist.githubusercontent.com/chrissng/c01959daad9050de7339ab44bca03663/raw/63275eeabc54cf53d56517915c7fe14a078ff0e6/working_cluster_response.json

On the cluster that crashes (this one), the response does not contain a shieldedNodes object: https://gist.github.com/chrissng/c01959daad9050de7339ab44bca03663/raw/63275eeabc54cf53d56517915c7fe14a078ff0e6/crash_cluster_response.json

Neither of these clusters has shielded nodes configured, so it is unclear why the REST API returns different results.
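
To illustrate why a missing shieldedNodes object can crash a reader while an empty one does not, here is a minimal Go sketch. It is not the provider's code: the `cluster` and `shieldedNodes` types and the `parseCluster` and `shieldedNodesEnabled` helpers are hypothetical stand-ins, assuming the response is decoded into a struct with an optional pointer field. An absent JSON object leaves that pointer nil, so any unguarded field access on it panics with the same kind of nil pointer dereference seen in the trace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shieldedNodes and cluster are hypothetical stand-ins for the API response
// types; they are not the provider's or the API client's real definitions.
type shieldedNodes struct {
	Enabled bool `json:"enabled,omitempty"`
}

type cluster struct {
	Name          string         `json:"name"`
	ShieldedNodes *shieldedNodes `json:"shieldedNodes,omitempty"`
}

// parseCluster decodes a JSON response body into a cluster.
func parseCluster(body string) (*cluster, error) {
	var c cluster
	if err := json.Unmarshal([]byte(body), &c); err != nil {
		return nil, err
	}
	return &c, nil
}

// shieldedNodesEnabled treats a missing shieldedNodes object the same as an
// empty one. Without the nil check, c.ShieldedNodes.Enabled would panic when
// the object is absent from the response.
func shieldedNodesEnabled(c *cluster) bool {
	if c == nil || c.ShieldedNodes == nil {
		return false
	}
	return c.ShieldedNodes.Enabled
}

func main() {
	bodies := []string{
		`{"name": "working", "shieldedNodes": {}}`, // empty object: pointer is non-nil
		`{"name": "crashing"}`,                     // object absent: pointer is nil
	}
	for _, body := range bodies {
		c, err := parseCluster(body)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%s: shieldedNodes present=%t enabled=%t\n",
			c.Name, c.ShieldedNodes != nil, shieldedNodesEnabled(c))
	}
}
```

With the guard in place, both response shapes yield the same "not enabled" result, which is the behavior one would expect for clusters that never configured shielded nodes.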

chrissng commented 4 years ago

This PR should fix the issue: https://github.com/GoogleCloudPlatform/magic-modules/pull/2555

pdecat commented 4 years ago

Same issue here, my work-around was to force update the cluster with gcloud to get the empty shieldedNodes object in the API response:

# gcloud beta container clusters update my-cluster --zone my-zone --no-enable-shielded-nodes

joemiller commented 4 years ago

Pinning the google-beta provider to 2.17.0 also seems to be a temp fix, fwiw

bluemalkin commented 4 years ago

> Pinning the google-beta provider to 2.17.0 also seems to be a temp fix, fwiw

That's what I have also done.

tysen commented 4 years ago

https://github.com/GoogleCloudPlatform/magic-modules/pull/2555 is merged. Fix should be in next release.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!