particuleio / terraform-kubernetes-addons

Terraform module to deploy curated Kubernetes middlewares on multiple cloud providers.
https://registry.terraform.io/modules/particuleio/addons/kubernetes/latest
Apache License 2.0

[bug] \"for_each\" value depends on resource attributes that cannot be determined until apply #574

Closed mestuddtc closed 2 years ago

mestuddtc commented 2 years ago

Describe the bug

Cannot deploy addons to the EKS cluster: Terraform says it could not create a plan.

What is the current behavior?

Terraform plan could not be created
STDOUT: Releasing state lock. This may take a few moments...

STDERR:
Error: Invalid for_each argument

  on .terraform/modules/eks-addons/modules/aws/kong-crds.tf line 25, in resource "kubectl_manifest" "kong_crds":
  25:   for_each  = local.kong.enabled && local.kong.manage_crds ? { for v in local.kong_crds_apply : lower(join("/", compact([v.data.apiVersion, v.data.kind, lookup(v.data.metadata, "namespace", ""), v.data.metadata.name]))) => v.content } : {}
    ├────────────────
    │ local.kong.enabled is true
    │ local.kong.manage_crds is true
    │ local.kong_crds_apply will be known only after apply

The \"for_each\" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

Error: Invalid for_each argument

  on .terraform/modules/eks-addons/modules/aws/kube-prometheus-crd.tf line 29, in data "http" "prometheus-operator_crds":
  29:   for_each = (local.victoria-metrics-k8s-stack.enabled && local.victoria-metrics-k8s-stack.install_prometheus_operator_crds) || (local.kube-prometheus-stack.enabled && local.kube-prometheus-stack.manage_crds) ? toset(local.prometheus-operator_crds) : []
    ├────────────────
    │ local.kube-prometheus-stack.enabled is true
    │ local.kube-prometheus-stack.manage_crds is true
    │ local.prometheus-operator_crds is tuple with 8 elements
    │ local.victoria-metrics-k8s-stack.enabled is false
    │ local.victoria-metrics-k8s-stack.install_prometheus_operator_crds is true

The \"for_each\" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the for_each depends on.

How to reproduce? Please include a code sample if relevant.

Terraform file:

provider "aws" {
  region = "us-east-1"
  profile = "ops-release"
}

terraform {
  required_version = ">= 0.13"
  required_providers {
    aws        = "~> 3.0"
    helm       = "~> 2.0"
    kubernetes = "~> 2.0"
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "~> 1.0"
    }
  }

  backend "s3" {
    encrypt        = "true"
  }
}

data "terraform_remote_state" "eks" {
  backend = "s3"

  config = {
    bucket  = "tf-state.ops.tradingcentral.com"
    key     = "ops-eks"
    region  = "us-east-1"
    profile = "ops-release"
  }
}

provider "kubectl" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.eks.outputs.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.terraform_remote_state.eks.outputs.eks.cluster_id
}

locals {
  env_name                  = "ops"
  cluster_name              = "ops-eks"
  subnet_cidrs              = ["172.30.64.0/18", "172.30.128.0/18"]
}

# try splitting out some things to do first to try to solve the problems
# EKS has starting everything
module "eks-addons-first" {
  source = "particuleio/addons/kubernetes//modules/aws"
  version = "2.29.0"

  cluster-name = data.terraform_remote_state.eks.outputs.eks.cluster_id

  eks = {
    cluster_oidc_issuer_url = data.terraform_remote_state.eks.outputs.eks.cluster_oidc_issuer_url
  }

  # Handle events causing unavailability of EC2 instances
  aws-node-termination-handler = {
    enabled = true
    # FIXME: figure out how to use SQS queue-processor mode
  }

  # Network policy engine. Note this chart recommends using tigera-operator instead
  tigera-operator = {
    enabled = true
  }
}

module "eks-addons" {
  source = "particuleio/addons/kubernetes//modules/aws"

  depends_on = [module.eks-addons-first]

  cluster-name = data.terraform_remote_state.eks.outputs.eks.cluster_id

  eks = {
    cluster_oidc_issuer_url = data.terraform_remote_state.eks.outputs.eks.cluster_oidc_issuer_url
  }

  # disable things already created by eks-addons-first
  priority-class = {
    create = false
  }
  priority-class-ds = {
    create = false
  }

  # Use EBS for persistent volumes
  aws-ebs-csi-driver = {
    enabled = true
  }

  # Scale worker nodes based on workload
  cluster-autoscaler = {
    enabled = true
    version = "v1.22.1"
  }

  # Synchronise exposed services and ingresses with DNS providers (route53)
  external-dns = {
  external-dns = {
    external-dns = {
      enabled      = true
      extra_values = "policy: 'sync'"
    },
  }

  # Kong ingress controller
  kong = {
    enabled = true
  }

  # Prometheus & related services for monitoring and alerting
  # Note that grafana is removed, to be added via grafana operator separately
  kube-prometheus-stack = {
    enabled = true
    # the only way to get this deploying was to enable the thanos sidecar
    thanos_sidecar_enabled = true
    extra_values = <<VALUES
grafana:
  enabled: false
VALUES
  }

  # Use prometheus metrics for autoscaling
  prometheus-adapter = {
    enabled = true
  }

  # Set up monitoring of endpoints over HTTP, HTTPS, DNS, TCP and ICMP
  prometheus-blackbox-exporter = {
    enabled = true
  }

  metrics-server = {
    enabled       = true
    allowed_cidrs = local.subnet_cidrs
  }

  npd = {
    enabled = true
  }
}

What's the expected behavior?

Terraform plans and deploys everything as configured.

Environment details

Any other relevant info

This is being run from the community.general.terraform Ansible module. It did work yesterday, then stopped working an hour later and has failed ever since. I am 99% certain no changes were made in between, as I was working on things further down the playbook. The backend configuration is passed in via Ansible, and the whole Terraform directory and backend state were deleted each time the EKS cluster was destroyed to start over.

ArchiFleKs commented 2 years ago

This is weird, I have never had any issue with the CRDs. This module is tested with Terragrunt; I do not have any example with plain Terraform. Could you try setting manage_crds to false in kong and kube-prometheus-stack to see if the plan passes at least?
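Something like this in your eks-addons module block (just a sketch of the relevant keys; manage_crds is the flag referenced by both failing for_each expressions, everything else stays as you have it):

  kong = {
    enabled     = true
    manage_crds = false
  }

  kube-prometheus-stack = {
    enabled                = true
    manage_crds            = false
    thanos_sidecar_enabled = true
    # keep your existing extra_values here
  }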

mestuddtc commented 2 years ago

Definitely weird, especially because it was working with the CRDs earlier. manage_crds = false allows the deployment to succeed. It appears the CRDs are still installed? That was not what I expected.

ArchiFleKs commented 2 years ago

@mestuddtc CRDs are still installed by the charts; it is just that Helm does not manage the CRD lifecycle. That means when you helm upgrade to a new major version that needs new CRDs (as with the kube-prometheus-stack CRDs), the CRDs are not upgraded by default.

Having them managed via TF allows upgrading them without having to do it manually. The repo I use for testing is this one.
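Roughly, manage_crds has Terraform fetch each upstream CRD manifest and apply it directly, so bumping the pinned upstream version upgrades the CRDs on the next apply. A simplified sketch of that pattern (the names and URL below are illustrative, not the module's exact code):

data "http" "prometheus_operator_crd" {
  # one CRD manifest pinned to an upstream release (illustrative URL)
  url = "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.58.0/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml"
}

resource "kubectl_manifest" "prometheus_operator_crd" {
  # use .response_body instead of .body with v3+ of the http provider
  yaml_body         = data.http.prometheus_operator_crd.body
  # server-side apply avoids the client-side annotation size limit on very large CRDs
  server_side_apply = true
}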

ArchiFleKs commented 2 years ago

Also, can you try an apply and not just a plan to see if you get the same error? Or is it just with plan?

mestuddtc commented 2 years ago

apply has the same errors as plan. Interestingly, if I first apply with manage_crds = false and then set manage_crds = true, the second apply works. So it seems to only be an issue on the initial run?
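For the record, the sequence that works is roughly this, in the eks-addons module block (sketch of only the relevant keys):

  # first apply, starting from an empty cluster: CRD management off
  kong = {
    enabled     = true
    manage_crds = false
  }

  # second apply, once the charts are deployed: flip it back on and re-run
  kong = {
    enabled     = true
    manage_crds = true
  }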

ArchiFleKs commented 2 years ago

@mestuddtc is this issue still happening?

mestuddtc commented 2 years ago

I gave up on trying to use this project. But yes, I get the exact same error with 4.0.0.

ArchiFleKs commented 2 years ago

Closing as this does not seem relevant anymore and depends a lot on the Terragrunt/Terraform version used. Please reopen if needed.