terraform-google-modules / terraform-google-kubernetes-engine

Configures opinionated GKE clusters
https://registry.terraform.io/modules/terraform-google-modules/kubernetes-engine/google
Apache License 2.0

Tests using terraform test framework #2108

Closed wyardley closed 1 month ago

wyardley commented 1 month ago

TL;DR

Is there any movement in the overall ecosystem for these modules toward using the Terraform test framework for some of the tests? Converting the existing integration tests might be hard, but plan-only or mock-provider tests could be a fast way to get some feedback (and could probably also run in publicly visible CI checks in either GHA or Cloud Build, since they wouldn't need real GCP creds at all).

This would make it pretty fast / cheap / easy to test many different combinations of input variables, and shift some of the basic tests "left" a bit.

I'm not sure if it would make sense to template the tests too (probably, though there might need to be separate tests or separate tfvars files, depending on how varied the scenarios are), or how exactly they'd run in CI, but I'd be happy to try writing a few if there's an example, or if you all are able to set up the skeleton framework.

Feel free to close if this is not planned or not possible.

Terraform Resources

No response

Detailed design

The following (very simple) test should pass if you save it as cluster.tftest.hcl in terraform-google-kubernetes-engine/modules/beta-private-cluster and run terraform init and terraform test there. Hopefully this shows how it can be used to do at least basic tests of the module's logic by varying the input variables.

Note that with a plan-only test, some of the computed attributes won't be testable, since their values aren't known until apply. However, for testing some of the logic, I think it will still prove useful.

I can provide an example with a mock provider as well, if desired.

provider "google-beta" {
  project = "foo-testproject"
}

override_data {
  target = data.google_compute_zones.available
  values = {
    id = "projects/foo-testproject/regions/us-central1",
    names = [
      "us-central1-a",
      "us-central1-b",
      "us-central1-c",
      "us-central1-f",
    ],
    project = "foo-testproject"
    region  = "us-central1",
    status  = "null",
  }
}

override_data {
  target = data.google_container_engine_versions.region
  values = {
    latest_master_version = "1.30.4-gke.1348000"
  }
}

override_data {
  target = data.google_container_engine_versions.zone
  values = {
    latest_master_version = "1.30.4-gke.1348000"
  }
}

# Default variables -- override as needed within run blocks for individual
# tests.
variables {
  project_id                 = "foo-testproject"
  name                       = "testcluster"
  regional                   = true
  region                     = "us-central1"
  network                    = "default"
  subnetwork                 = "default"
  ip_range_pods              = "pod-range"
  ip_range_services          = "service-range"
  enable_private_endpoint    = true
  enable_private_nodes       = true
  master_ipv4_cidr_block     = "172.16.0.0/28"
  deletion_protection        = false
  master_authorized_networks = []
  enable_confidential_nodes  = true
}

run "basic" {
  command = plan
  plan_options {
    refresh = false
  }

  assert {
    condition     = google_container_cluster.primary.name == "testcluster"
    error_message = "Cluster name does not match expectation."
  }

  assert {
    condition     = google_container_node_pool.pools["default-node-pool"].initial_node_count == 1
    error_message = "Wrong default pool initial node count."
  }

  assert {
    condition     = google_container_cluster.primary.addons_config[0].dns_cache_config[0].enabled == false
    error_message = "Cluster dns_cache_config enabled value doesn't match expectations."
  }

  assert {
    condition     = output.dns_cache_enabled == false
    error_message = "Expected dns cache enabled output doesn't match expectations."
  }
}

run "dns_cache" {
  command = plan
  plan_options {
    refresh = false
  }

  # Simple example of testing a variable override
  variables {
    dns_cache = true
  }

  assert {
    condition     = google_container_cluster.primary.addons_config[0].dns_cache_config[0].enabled == true
    error_message = "Cluster dns_cache_config enabled value doesn't match expectations."
  }

  assert {
    condition     = output.dns_cache_enabled == true
    error_message = "Expected dns cache enabled output doesn't match expectations."
  }
}
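To make the plan-only caveat above concrete, here is a minimal sketch that is deliberately separate from this module. It assumes a hypothetical throwaway module containing only Terraform's built-in terraform_data resource (the names "cluster" and "name" are made up for illustration), so it should run without any GCP credentials. The point is only that an attribute derived from an input is known at plan time, while a computed one such as id is not known until apply.

```hcl
# main.tf -- hypothetical stand-in module, not part of this repository
variable "name" {
  type    = string
  default = "testcluster"
}

# Built-in resource; creates no real infrastructure.
resource "terraform_data" "cluster" {
  input = var.name
}
```

```hcl
# computed.tftest.hcl -- hypothetical test file placed next to main.tf
run "plan_only" {
  command = plan

  # "input" comes straight from a variable, so it is known at plan time.
  assert {
    condition     = terraform_data.cluster.input == "testcluster"
    error_message = "Input does not match the name variable."
  }

  # An assertion on terraform_data.cluster.id would not be usable here:
  # the id is computed and still unknown during a plan.
}

run "after_apply" {
  command = apply

  # Once applied, the computed id has a value and can be asserted on.
  assert {
    condition     = length(terraform_data.cluster.id) > 0
    error_message = "Expected a non-empty id after apply."
  }
}
```

With a real provider, the apply run is the part that would need credentials, which is why the plan-only tests above stick to attributes that are derived from inputs.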

Additional information

No response

wyardley commented 1 month ago

Working example for the same use case as above (it will work if dropped into modules/beta-private-cluster), this time using a fully mocked provider with plan / apply rather than a real provider in plan-only mode.

The two approaches can be combined to a certain extent.

mock_provider "google" {
  mock_data "google_container_engine_versions" {
    defaults = {
      latest_master_version = "1.30.4-gke.1348000",
      project               = "foo-testproject"
    }
  }

  # This needs to be mocked in as well, since the full service account name is
  # calculated.
  mock_resource "google_service_account" {
    defaults = {
      member  = "serviceAccount:testcluster-3g4f@foo-testproject.iam.gserviceaccount.com",
      project = "foo-testproject",
    }
  }
}

mock_provider "google-beta" {
  mock_data "google_compute_zones" {
    defaults = {
      id = "projects/foo-testproject/regions/us-central1",
      names = [
        "us-central1-a",
        "us-central1-b",
        "us-central1-c",
        "us-central1-f",
      ],
      project = "foo-testproject"
      region  = "us-central1",
      status  = "null",
    }
  }

  # Because the ID is used in the module, but is computed, we have to mock this.
  mock_resource "google_container_node_pool" {
    defaults = {
      id      = "projects/foo-testproject/locations/us-central1/clusters/testcluster/nodePools/default-pool",
      project = "foo-testproject",
    }
  }

}

# Default variables -- override as needed within run blocks for individual
# tests.
variables {
  project_id                 = "foo-testproject"
  name                       = "testcluster"
  regional                   = true
  region                     = "us-central1"
  network                    = "default"
  subnetwork                 = "default"
  ip_range_pods              = "pod-range"
  ip_range_services          = "service-range"
  enable_private_endpoint    = true
  enable_private_nodes       = true
  master_ipv4_cidr_block     = "172.16.0.0/28"
  deletion_protection        = false
  master_authorized_networks = []
  enable_confidential_nodes  = true
}

run "basic" {
  assert {
    condition     = google_container_cluster.primary.name == "testcluster"
    error_message = "Cluster name does not match expectation."
  }

  assert {
    condition     = google_container_node_pool.pools["default-node-pool"].initial_node_count == 1
    error_message = "Wrong default pool initial node count."
  }

  assert {
    condition     = google_container_cluster.primary.addons_config[0].dns_cache_config[0].enabled == false
    error_message = "Cluster dns_cache_config enabled value doesn't match expectations."
  }

  assert {
    condition     = output.dns_cache_enabled == false
    error_message = "Expected dns cache enabled output doesn't match expectations."
  }
}

run "dns_cache" {
  # Simple example of testing a variable override
  variables {
    dns_cache = true
  }

  assert {
    condition     = google_container_cluster.primary.addons_config[0].dns_cache_config[0].enabled == true
    error_message = "Cluster dns_cache_config enabled value doesn't match expectations."
  }

  assert {
    condition     = output.dns_cache_enabled == true
    error_message = "Expected dns cache enabled output doesn't match expectations."
  }
}

apeabody commented 1 month ago

Hi @wyardley! We haven't used it in this repo, but support for Plan Assertions was recently added to the CFT blueprint-test. Keep in mind that configurations that successfully plan may still be rejected by the API, but feel free to try it out: https://github.com/GoogleCloudPlatform/cloud-foundation-toolkit/blob/master/infra/blueprint-test/README.md#512-plan-assertions

wyardley commented 1 month ago

I think the main reason for opening this was to see if, at a higher level, there will be any movement towards using Terraform's built-in test framework now that it exists, as well as to suggest that unit testing could be an interesting way to start using it without fully cutting over to it for things like integration tests.

I can see some pros and cons, and of course, there's obviously been some time invested in writing tooling, not only the one you mentioned, but also things like https://pypi.org/project/tftest/, which GCP also maintains. But, other than the challenges with mocking when using that approach vs. plan-only, the above does add a lot of simplicity and speed, and (for better or for worse) is implemented in an HCL-like language.

wyardley commented 1 month ago

> Keep in mind that configurations that successfully plan may still be rejected by the API

And yes, definitely aware. But I think where unit tests can add a lot of value is that with modules like this, a lot of the bugs come from the complexity / messiness of HCL itself, and the interaction of various variables. So while it doesn't replace the need for integration testing, unit testing (especially via a framework that's fast and easy to run locally) could speed up iteration and shift finding certain types of issues further left.
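As a rough illustration of the kind of variable interaction I mean, here is a self-contained sketch. It does not use this module: the stand-in module below is hypothetical and only mimics a regional / zonal switch (the "zones" variable and the "location" output are assumptions for the example), so it needs no provider at all.

```hcl
# main.tf -- hypothetical stand-in module, not part of this repository
variable "regional" {
  type    = bool
  default = true
}

variable "region" {
  type    = string
  default = "us-central1"
}

variable "zones" {
  type    = list(string)
  default = []
}

locals {
  # Regional clusters use the region; zonal clusters use the first zone.
  location = var.regional ? var.region : var.zones[0]
}

output "location" {
  value = local.location
}
```

```hcl
# location.tftest.hcl -- hypothetical test file placed next to main.tf
run "regional_uses_region" {
  command = plan

  assert {
    condition     = output.location == "us-central1"
    error_message = "Regional clusters should use the region as location."
  }
}

run "zonal_uses_first_zone" {
  command = plan

  # Override two variables whose combination drives the conditional.
  variables {
    regional = false
    zones    = ["us-central1-b", "us-central1-c"]
  }

  assert {
    condition     = output.location == "us-central1-b"
    error_message = "Zonal clusters should use the first zone as location."
  }
}
```

Each run block covers one combination of inputs, so a regression in the conditional shows up locally in seconds rather than in an integration run.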

apeabody commented 1 month ago

> I think the main reason for opening this was to see if, at a higher level, there will be any movement towards using Terraform's built-in test framework now that it exists, as well as to suggest that unit testing could be an interesting way to start using it without fully cutting over to it for things like integration tests.

> I can see some pros and cons, and of course, there's obviously been some time invested in writing tooling, not only the one you mentioned, but also things like https://pypi.org/project/tftest/, which GCP also maintains. But, other than the challenges with mocking when using that approach vs. plan-only, the above does add a lot of simplicity and speed, and (for better or for worse) is implemented in an HCL-like language.

Got it. For discussion of future testing methods for terraform-google-modules, I would recommend moving this to https://github.com/GoogleCloudPlatform/cloud-foundation-toolkit/issues