hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Misleading documentation on provider configuration #30910

Open th0masb opened 2 years ago

th0masb commented 2 years ago

The documentation on configuring providers here seems to be incorrect.

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration).

However, the example below creates a resource and then uses its attributes (ones not known before it was created) to configure the Kubernetes provider. I looked at the dependency graph output by terraform graph, and it seems that providers are part of the graph and can depend on resources?

Terraform Version

Terraform v1.1.8
on linux_amd64

Your version of Terraform is out of date! The latest version
is 1.1.9. You can update by downloading from https://www.terraform.io/downloads.html

Terraform Configuration Files

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.11.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.7.1"
    }
  }
}

variable "project" {
  type = string
}

resource "random_id" "env_id" {
  byte_length = 2
}

locals {
  resource_name  = "test-gke-${random_id.env_id.hex}"
  cluster_region = "europe-west4"
}

resource "google_compute_network" "this" {
  project                 = var.project
  name                    = local.resource_name
  auto_create_subnetworks = false
  routing_mode            = "GLOBAL"
}

resource "google_compute_subnetwork" "cluster_subnet" {
  project       = var.project
  name          = "${local.resource_name}-subnet"
  region        = local.cluster_region
  network       = google_compute_network.this.name
  ip_cidr_range = "10.2.155.0/24"
  secondary_ip_range {
    ip_cidr_range = "10.244.0.0/16"
    range_name    = "gke-pods"
  }
  secondary_ip_range {
    ip_cidr_range = "10.245.0.0/16"
    range_name    = "gke-services"
  }
}

module "gke_cluster" {
  source                   = "terraform-google-modules/kubernetes-engine/google"
  version                  = "19.0.0"
  project_id               = var.project
  name                     = local.resource_name
  region                   = local.cluster_region
  network                  = google_compute_network.this.name
  subnetwork               = google_compute_subnetwork.cluster_subnet.name
  ip_range_pods            = google_compute_subnetwork.cluster_subnet.secondary_ip_range[0].range_name
  ip_range_services        = google_compute_subnetwork.cluster_subnet.secondary_ip_range[1].range_name
  remove_default_node_pool = true

  node_pools = [
    {
      name         = "default"
      machine_type = "e2-standard-2"
      autoscaling  = true
      min_count    = 0
      max_count    = 10
      disk_size_gb = 100
      disk_type    = "pd-standard"
      image_type   = "COS_CONTAINERD"
      auto_repair  = true
      auto_upgrade = true
    }
  ]
}

data "google_client_config" "this" {}

provider "kubernetes" {
  token                  = data.google_client_config.this.access_token
  host                   = "https://${module.gke_cluster.endpoint}"
  cluster_ca_certificate = base64decode(module.gke_cluster.ca_certificate)
}

resource "kubernetes_config_map" "this" {
  metadata {
    name = local.resource_name
  }
  data = {
    "something.txt" = "data"
  }
}

Am I missing something?

jbardin commented 2 years ago

Hi @th0masb,

I think the key phrase in the quoted docs is "you can safely reference". As you have shown, the references will work in the configuration; however, computed resource attributes may not be known during planning, so the provider may not be fully configured at that time. Depending on the provider, this may work just fine, may never work, or may fall somewhere in between and fail only under certain conditions. Users often attempt these monolithic configurations with the kubernetes provider, and you can probably find numerous issues in their repository about the various failure modes.

While we definitely don't recommend configurations like this, they are allowed for backwards compatibility. Perhaps the documentation could be improved here to prevent some confusion, without adding too much unnecessary detail.
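
For illustration, one way to avoid the monolithic layout is to create the cluster in one configuration and manage the Kubernetes resources in a second configuration that looks the cluster up through data sources. The following is a minimal sketch of that second configuration, assuming the cluster already exists; the variable names are hypothetical and not part of the original report.

```hcl
# Hypothetical inputs identifying a cluster created by a separate configuration.
variable "project" {
  type = string
}

variable "cluster_name" {
  type = string
}

variable "cluster_location" {
  type = string
}

# Supplies an access token for the current Google credentials.
data "google_client_config" "this" {}

# Looks up the existing cluster instead of creating it in this configuration.
data "google_container_cluster" "this" {
  project  = var.project
  name     = var.cluster_name
  location = var.cluster_location
}

# Every argument comes from data sources whose own arguments are input
# variables, so nothing here depends on a resource managed in this configuration.
provider "kubernetes" {
  host  = "https://${data.google_container_cluster.this.endpoint}"
  token = data.google_client_config.this.access_token
  cluster_ca_certificate = base64decode(
    data.google_container_cluster.this.master_auth[0].cluster_ca_certificate
  )
}
```

The trade-off is two applies instead of one, in exchange for the provider not depending on attributes of resources that may not exist yet at plan time.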

jesseschalken commented 2 years ago

@jbardin

Can you provide an example of precisely how the above configuration might fail?

How can someone know if they are in the "works just fine" category?

I don't think a merely abstract admonishment from either you or the docs is enough to outweigh the enormous simplification that comes with being able to manage both GKE and cluster resources in the same Terraform config if it works just fine according to our testing.

This SO answer, for example, suggests doing it this way and has been working for enough people to get approval and 9 upvotes with no cautionary comments. Your comment and the docs remain in stark contrast with people's experience as far as I can find.

jbardin commented 2 years ago

#29182 has a list of related issues, and specifically here is a comment indicating how the kubernetes provider can fail. Terraform cannot predict whether a particular provider will fail when it receives unknown values during the plan, so the best we can do is to advise against it until there is a more complete solution for handling multi-layered configurations of this sort.

th0masb commented 2 years ago

Thanks @jbardin, that was a helpful thread to read, and it was good to get some clarity on our use case of the K8s provider in particular. I think it would be good to add some more detail in the docs, though, perhaps by changing the quoted paragraph above to something like the following. What do you think?

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This is because, in general, for a provider to produce a plan of changes, the information used to configure it must be independent of the state of resources managed by the module. There are limited exceptions to this, such as when a complete plan can be produced by inspecting only the Terraform state file, which allows a provider to be configured lazily. However, it is not recommended to rely on provider implementation behaviour. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration).
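
To make the "input variables" case concrete, here is a minimal sketch of a provider block whose arguments all come from input variables. The variable names are hypothetical, and the sketch assumes the cluster was created outside this configuration.

```hcl
# Hypothetical: hostname or IP of an existing cluster's API server.
variable "cluster_endpoint" {
  type = string
}

# Hypothetical: base64-encoded CA certificate of that cluster.
variable "cluster_ca_certificate" {
  type = string
}

# Hypothetical: bearer token, marked sensitive so it is redacted in CLI output.
variable "cluster_token" {
  type      = string
  sensitive = true
}

# All values are supplied by the caller, so they are known before planning starts.
provider "kubernetes" {
  host                   = "https://${var.cluster_endpoint}"
  cluster_ca_certificate = base64decode(var.cluster_ca_certificate)
  token                  = var.cluster_token
}
```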

jesseschalken commented 2 years ago

@jbardin Thanks, that explains why it worked in our testing, but we do also need kubernetes_manifest resources and indeed a terraform plan with the GKE cluster missing fails to plan a kubernetes_manifest due to missing config in the Kubernetes provider.

I think the documentation should definitely be precise about this, as @th0masb posted above. Specifically, it shouldn't write off such a broadly used configuration as "unsafe", but should instead say that whether resources can be planned before their provider is configured, and whether resource attributes are available before the resource is created, is provider-dependent. It is not "unsafe"; the safety depends on the resource and provider in question.
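
For reference, a minimal sketch of the kind of resource that hit this failure; the manifest contents are hypothetical. As described above, planning a resource like this failed while the GKE cluster did not yet exist, because the Kubernetes provider had no usable configuration at plan time.

```hcl
# Hypothetical manifest; any object kind is declared the same way.
resource "kubernetes_manifest" "example" {
  manifest = {
    apiVersion = "v1"
    kind       = "ConfigMap"
    metadata = {
      name      = "example-config"
      namespace = "default"
    }
    data = {
      "something.txt" = "data"
    }
  }
}
```

By contrast, the kubernetes_config_map resource in the original configuration planned without the cluster in our testing, which is the provider-dependent difference in question.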

alexsomesan commented 2 years ago

@jesseschalken The Kubernetes provider documentation for the kubernetes_manifest resource makes this requirement as clear as it could; just have a look here: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/manifest#before-you-use-this-resource

jesseschalken commented 2 years ago

@alexsomesan That says that kubernetes_manifest needs cluster access during plan time, but where are the docs saying that the other resources don't?

I can't find anything saying what resources are safe to use with providers configured from other resources, despite this being a widely used configuration. There is only broad caution against it here, even though it works fine for many resources.