hashicorp / terraform-provider-kubernetes

Terraform Kubernetes provider
https://www.terraform.io/docs/providers/kubernetes/
Mozilla Public License 2.0

kubernetes_manifest: Terraform often fails with "http2: server sent GOAWAY and closed the connection" #1931

Open papanito opened 1 year ago

papanito commented 1 year ago

Terraform Version, Provider Version and Kubernetes Version

Terraform v1.3.2
on windows_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/helm v2.7.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.11.0
+ provider registry.terraform.io/rancher/rancher2 v1.22.2

Affected Resource(s)

kubernetes_manifest

Terraform Configuration Files

provider.tf:

terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "~>1.22.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~>2.11.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "~>1.14.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~>2.7.1"
    }
  }

  backend "azurerm" {
    ....
  }
}

provider "rancher2" {
  api_url    = var.RANCHER_NOP_API_URL
  access_key = var.RANCHER_NOP_TOKEN
  secret_key = var.RANCHER_NOP_SECRET
}

provider "kubernetes" {
  host  = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
}

provider "kubectl" {
  load_config_file = "false"
  host             = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token            = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
}

provider "helm" {
  kubernetes {
    host  = "${var.RANCHER_NOP_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
    token = "${var.RANCHER_NOP_TOKEN}:${var.RANCHER_NOP_SECRET}"
  }
}

module/gatekeeper/gatekeeper.tf:

resource "kubernetes_manifest" "opa_config" {
  manifest = {
    apiVersion = "config.gatekeeper.sh/v1alpha1"
    kind       = "Config"
    metadata = {
      name      = "config"
      namespace = "cattle-gatekeeper-system"
      labels = {
        team = "skywalkers"
      }
    }
    spec = {
      match = [{
        excludedNamespaces = ["kube-*", "cattle-*"]
        processes          = ["*"]
      }]
    }
  }
}

Debug Output

Panic Output

N/A

Steps to Reproduce

  1. terraform plan

Expected Behavior

Plan succeeds without error

Actual Behavior

Plan fails with an error like this:

│   with module.gatekeeper.kubernetes_manifest.opa_config,
│   on .terraform\modules\gatekeeper\gatekeeper\main.tf line 1934, in resource "kubernetes_manifest" "opa_config":
│ 1934: resource "kubernetes_manifest" "opa_config" {
│
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).UpgradeResourceState: rpc
│ error: code = Unknown desc = failed to determine resource type ID: cannot get OpenAPI foundry:
│ failed get OpenAPI spec: http2: server sent GOAWAY and closed the connection; LastStreamID=199,
│ ErrCode=NO_ERROR, debug=""

Important Factoids

N/A

References

Community Note

alexsomesan commented 1 year ago

This smells like authentication issues, but it's also the first time I've heard of that type of reply from the API server (GOAWAY) 😄

Need to look into potential causes for that error message.

papanito commented 1 year ago

Yeah, not very friendly; at least a "please" would be nice 😄 It's pretty random, and after it occurs, a subsequent terraform plan often succeeds.

santimar commented 1 year ago

Any update on this? I am facing this issue as well, but I keep getting the same error over and over again. Temporary workarounds are to destroy and recreate the certificate, or to run plan/apply with -refresh=false, but these are just temporary hacks.
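For anyone else hitting this, the temporary workarounds above look roughly like this as CLI commands (the resource address is just an example from my config):

```sh
# Skip the state refresh step, which is what triggers the OpenAPI fetch
# that dies with GOAWAY
terraform plan -refresh=false
terraform apply -refresh=false

# Or force-recreate the affected resource (address is illustrative)
terraform apply -replace='module.services.kubernetes_manifest.selfsigned-star-certificate'
```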

These are my versions:

Terraform v1.4.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/kubernetes v2.19.0

and resources

resource "kubernetes_manifest" "selfsigned-ca-issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
    metadata   = {
      name = "selfsigned-ca-issuer"
    }
    spec = {
      selfSigned = {}
    }
  }
}

resource "kubernetes_manifest" "selfsigned-star-certificate" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "Certificate"
    metadata   = {
      name      = "selfsigned-star-certificate"
      namespace = "default"
    }
    spec = {
      commonName = "*.${var.base_hostname}"
      dnsNames   = [
        "*.${var.base_hostname}"
      ]
      secretName = "selfsigned-star-certificate"
      privateKey = {
        algorithm = "RSA"
        size      = 4096
      }
      issuerRef = {
        name  = kubernetes_manifest.selfsigned-ca-issuer.manifest.metadata.name
        kind  = "ClusterIssuer"
        group = "cert-manager.io"
      }
    }
  }
}

data "kubernetes_secret_v1" "star-certificate" {
  metadata {
    name      = kubernetes_manifest.selfsigned-star-certificate.manifest.spec.secretName
    namespace = kubernetes_manifest.selfsigned-star-certificate.manifest.metadata.namespace
  }
}

After terraform plan I keep getting:

module.services.kubernetes_manifest.selfsigned-ca-issuer: Refreshing state...
module.services.kubernetes_manifest.selfsigned-star-certificate: Refreshing state...

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: Plugin error
│ 
│   with module.services.kubernetes_manifest.selfsigned-star-certificate,
│   on services/certificates.tf line 14, in resource "kubernetes_manifest" "selfsigned-star-certificate":
│   14: resource "kubernetes_manifest" "selfsigned-star-certificate" {
│ 
│ The plugin returned an unexpected error from plugin.(*GRPCProvider).PlanResourceChange: rpc error: code = Unknown desc = failed to determine resource type ID: failed to look up GVK [cert-manager.io/v1, Kind=Certificate] among
│ available CRDs: unexpected error when reading response body. Please retry. Original error: http2: server sent GOAWAY and closed the connection; LastStreamID=199, ErrCode=NO_ERROR, debug=""

santimar commented 1 year ago

> This smells like authentication issues, but it's also the first time I've heard of that type of reply from the API server (GOAWAY) 😄
>
> Need to look into potential causes for that error message.

@alexsomesan After some investigation, it seems to be a feature of the API server that can be used when you have a load balancer and multiple control-plane nodes.

As you can see here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

One of the parameters is --goaway-chance (float):

To prevent HTTP/2 clients from getting stuck on a single apiserver, randomly close a connection (GOAWAY). The client's other in-flight requests won't be affected, and the client will reconnect, likely landing on a different apiserver after going through the load balancer again. This argument sets the fraction of requests that will be sent a GOAWAY. Clusters with single apiservers, or which don't use a load balancer, should NOT enable this. Min is 0 (off), Max is .02 (1/50 requests); .001 (1/1000) is a recommended starting point.

I only get this error on kubernetes_manifest resources though, so maybe it needs deeper investigation

aaj-synth commented 1 year ago

^ We're getting the same error but for other resources! Has there been a fix for this?

santimar commented 1 year ago

@aaj-synth I was able to fix this error by running multiple API servers behind a load balancer, but setting --goaway-chance 0 on the API server should also work. I know it's not the fix you are looking for, but it works for now.
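If your control plane runs as static pods (e.g. a kubeadm cluster), the flag would go in the kube-apiserver manifest; a hypothetical fragment, where the file path and surrounding fields are assumptions about a typical kubeadm layout:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (typical kubeadm path; assumption)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ... other existing flags ...
    # 0 disables random GOAWAYs entirely; per the docs this flag should
    # only be non-zero on clusters with multiple apiservers behind a
    # load balancer
    - --goaway-chance=0
```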

github-actions[bot] commented 2 weeks ago

Marking this issue as stale due to inactivity. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. This helps our maintainers find and focus on the active issues. Maintainers may also remove the stale label at their discretion. Thank you!