terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes Service (EKS) resources 🇺🇦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Error: default/inflate failed to fetch resource from kubernetes: client rate limiter Wait returned an error: context deadline exceeded #3208

Open FromOopsToOps opened 1 week ago

FromOopsToOps commented 1 week ago

Description

I'm trying to use this module to install Karpenter on my new cluster, while still using managed nodes (single node group). I'm getting this error:

```
18:57:09.709 STDERR terraform: ╷
18:57:09.709 STDERR terraform: │ Error: default/inflate failed to fetch resource from kubernetes: client rate limiter Wait returned an error: context deadline exceeded
18:57:09.709 STDERR terraform: │
18:57:09.709 STDERR terraform: │   with kubectl_manifest.karpenter_example_deployment,
18:57:09.709 STDERR terraform: │   on main.tf line 237, in resource "kubectl_manifest" "karpenter_example_deployment":
18:57:09.709 STDERR terraform: │  237: resource "kubectl_manifest" "karpenter_example_deployment" {
18:57:09.709 STDERR terraform: │
18:57:09.709 STDERR terraform: ╵
```

It was stuck on `kubectl_manifest.karpenter_example_deployment` for almost 10 minutes before failing.
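
For context, the resource it hangs on is the example "inflate" workload that ships with this module's karpenter example: a zero-replica Deployment that exists only so Karpenter has something to scale. Paraphrased below (the exact image tag and resource requests may differ between module versions):

```hcl
# Paraphrased from examples/karpenter in this repo; not my own code.
resource "kubectl_manifest" "karpenter_example_deployment" {
  yaml_body = <<-YAML
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: inflate
    spec:
      replicas: 0
      selector:
        matchLabels:
          app: inflate
      template:
        metadata:
          labels:
            app: inflate
        spec:
          terminationGracePeriodSeconds: 0
          containers:
            - name: inflate
              image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
              resources:
                requests:
                  cpu: 1
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}
```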

Versions

Reproduction Code [Required]

Created `karpenter-values.yaml` with the following content:

```yaml
controller:
  replicaCount: 1
serviceAccount:
  create: true
provisioners:
# General node pool, to handle every workload that isn't REDACTED.
- name: general-pool
  capacity: 3
  requirements:
    - key: "karpenter.sh/provisioner-name"
      operator: In
      values: ["general-pool"]
  provider:
    instanceTypes: ["t3a.small", "t3a.medium", "t3a.large", "t3a.xlarge", "t3a.2xlarge"]
    maxCount: 3
    spot: true
  labels:
    pool: "general"

# This is the REDACTED node pool, specifically for the REDACTED workload.
- name: REDACTED-application-pool
  capacity: 3
  requirements:
    - key: "karpenter.sh/provisioner-name"
      operator: In
      values: ["REDACTED-application-pool"]
  provider:
    instanceTypes: ["t3a.2xlarge"]
    minCount: 1
    maxCount: 3
    spot: true
  labels:
    pool: "REDACTED"
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "application-name"
              operator: In
              values:
                - "REDACTED"
```

terragrunt.hcl:

```hcl
terraform {
  source = "tfr:///terraform-aws-modules/eks/aws//examples/karpenter?version=20.29.0"
}

dependency "eks" {
  config_path = "../../us-east-2/config-files/eks"
}

inputs = {
  create_node_iam_role = false
  node_iam_role_arn    = "arn:aws:iam::REDACTED:role/experimental-ng-1-eks-node-group-20241113190238182300000003"
  cluster_name         = dependency.eks.outputs.cluster_name
  create_access_entry  = false
  namespace            = "karpenter"

  helm_release = {
    name      = "karpenter"
    chart     = "karpenter/karpenter"
    version   = "v1.0.8"
    namespace = "karpenter"

    # Load the values from the `karpenter-values.yaml` file in the same folder
    values = [
      file("karpenter-values.yaml")
    ]
  }

  tags = {
    Environment = "experimental"
    Terraform   = "true"
  }
}
```
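
For completeness, the karpenter example wires up the kubectl provider with exec-based auth against the new cluster, roughly like the sketch below (paraphrased from the example, nothing custom on my side). I call it out because the "failed to fetch resource ... client rate limiter Wait" error appears to come from this provider reading the manifest back, so an unreachable endpoint or an expired token would surface in exactly that spot.

```hcl
# Roughly how the example configures the kubectl provider (paraphrased).
provider "kubectl" {
  apply_retry_count      = 5
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  load_config_file       = false

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # The token is fetched at apply time; an expired AWS session or a private
    # cluster endpoint the runner cannot reach makes every read time out.
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}
```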

Steps to reproduce the behavior:

1. Create the files above
2. Run `terragrunt apply`

Expected behavior

The apply runs to completion and creates all of the resources.

Actual behavior

It fails before completing; these are the only resources left outstanding:

```
19:04:44.668 STDOUT terraform: Terraform used the selected providers to generate the following execution
19:04:44.668 STDOUT terraform: plan. Resource actions are indicated with the following symbols:
19:04:44.668 STDOUT terraform:   ~ update in-place
19:04:44.668 STDOUT terraform: -/+ destroy and then create replacement
19:04:44.668 STDOUT terraform: Terraform will perform the following actions:
19:04:44.668 STDOUT terraform:   # helm_release.karpenter will be updated in-place
19:04:44.668 STDOUT terraform:   ~ resource "helm_release" "karpenter" {
19:04:44.668 STDOUT terraform:         id                  = "karpenter"
19:04:44.668 STDOUT terraform:         name                = "karpenter"
19:04:44.668 STDOUT terraform:       ~ repository_password = (sensitive value)
19:04:44.668 STDOUT terraform:         # (29 unchanged attributes hidden)
19:04:44.668 STDOUT terraform:     }
19:04:44.669 STDOUT terraform:   # kubectl_manifest.karpenter_example_deployment is tainted, so must be replaced
19:04:44.669 STDOUT terraform: -/+ resource "kubectl_manifest" "karpenter_example_deployment" {
19:04:44.669 STDOUT terraform:       ~ id                      = "/apis/apps/v1/namespaces/default/deployments/inflate" -> (known after apply)
19:04:44.669 STDOUT terraform:       ~ live_manifest_incluster = (sensitive value)
19:04:44.669 STDOUT terraform:       ~ live_uid                = "fc4693a4-7ab9-4afb-9bad-d20c1943900a" -> (known after apply)
19:04:44.669 STDOUT terraform:         name                    = "inflate"
19:04:44.669 STDOUT terraform:       + namespace               = (known after apply)
19:04:44.669 STDOUT terraform:       ~ uid                     = "fc4693a4-7ab9-4afb-9bad-d20c1943900a" -> (known after apply)
19:04:44.669 STDOUT terraform:       ~ yaml_incluster          = (sensitive value)
19:04:44.669 STDOUT terraform:         # (11 unchanged attributes hidden)
19:04:44.669 STDOUT terraform:     }
19:04:44.670 STDOUT terraform: Plan: 1 to add, 1 to change, 1 to destroy.
```
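
Side note on the plan above: the manifest is tainted from the earlier failed create, so each retry destroys and recreates it. Untainting it first should let a retry reuse the existing object instead (a standard Terraform command, passed straight through by Terragrunt):

```sh
# Optional: clear the taint so the next apply does not force a replace
terragrunt untaint kubectl_manifest.karpenter_example_deployment
```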

Additional context

I'm designing a new cluster for the company that will use Karpenter. It's my first time using Terragrunt, and up until now everything has worked flawlessly except this. I got the managed node group going, but I still need Karpenter to handle the workload.

RHeynsZa commented 6 days ago

Kinda stuck in the same position, but I'm just running the Kubernetes example in this repo.

I run everything except `karpenter_example_deployment`; I leave that commented out.

Everything finishes with no issues. When I then add `karpenter_example_deployment` to my deployment, it just keeps on deploying for 10 minutes with no end in sight.

Same error as noted by @FromOopsToOps:

`default/inflate failed to fetch resource from kubernetes: client rate limiter Wait returned an error: context canceled`

I know it says canceled, but I'm sure I would hit the deadline if I let it run long enough.
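
A quick way to tell whether the Deployment ever actually lands in the cluster, independent of the provider's read path, is to query it directly with the same credentials Terraform is using:

```sh
# Run from the same machine/role that terraform uses to reach the cluster
kubectl -n default get deployment inflate
kubectl -n default describe deployment inflate
```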