terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused #911

Closed vrathore18 closed 2 years ago

vrathore18 commented 4 years ago

I started getting this issue:

Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp 127.0.0.1:80: connect: connection refused

  on .terraform/modules/eks/terraform-aws-eks-11.1.0/aws_auth.tf line 62, in resource "kubernetes_config_map" "aws_auth":
  62: resource "kubernetes_config_map" "aws_auth" {

All my code was working fine, but after I upgraded my Terraform and provider versions I started getting the above issue.

Versions on which everything was working: providers: aws 2.49, kubernetes 1.10.0, helm 0.10.4, eks 4.0.2

Others: terraform 0.11.13, kubectl 1.11.7, aws-iam-authenticator 0.4.0-alpha.1

My versions now: terraform 0.12.26, kubectl 1.16.8, aws-iam-authenticator 0.5.0

eks.yaml

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "12.1.0"

  cluster_name    = var.name
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id
  cluster_version = var.cluster_version
  manage_aws_auth = true

  kubeconfig_aws_authenticator_additional_args = ["-r", "arn:aws:iam::${var.target_account_id}:role/terraform"]

  worker_groups = [
    {
      instance_type        = var.eks_instance_type
      asg_desired_capacity = var.eks_asg_desired_capacity
      asg_max_size         = var.eks_asg_max_size
      key_name             = var.key_name
    }
  ]

  map_accounts = [var.target_account_id]

  map_roles = [
    {
      rolearn = format("arn:aws:iam::%s:role/admin", var.target_account_id)
      username = format("%s-admin", var.name)
      groups    = ["system:masters"]
    }
  ]

  # don't write a local kubeconfig, as we generate one ourselves below
  write_kubeconfig = false
}

resource "local_file" "kubeconfig" {
  content  = module.eks.kubeconfig
  filename = "./.kube_config.yaml"
}

In the above code, write_kubeconfig = false and I create a local kubeconfig file, which I use in the helm and kubernetes providers.

provider.yaml

provider "aws" {
  region  = var.region
  version = "~> 2.65.0"

  assume_role {
    role_arn = "arn:aws:iam::${var.target_account_id}:role/terraform"
  }
}

provider "kubernetes" {
  config_path = "./.kube_config.yaml"
  version     = "~> 1.11.3"
}

provider "helm" {
  version = "~> 1.2.2"

  kubernetes {
    config_path = "./.kube_config.yaml"
  }
}

On terraform apply, the script is not able to create module.eks.kubernetes_config_map.aws_auth[0].

I tried some of the suggestions mentioned in https://github.com/terraform-aws-modules/terraform-aws-eks/issues/817 but they didn't work for me.

dpiddockcmp commented 4 years ago

If you have manage_aws_auth = true then you need to configure the kubernetes provider as per the documentation in the README.

arielvinas commented 4 years ago

I think this is a problem with the k8s provider itself... I have it configured correctly and it randomly fails to connect:

https://github.com/hashicorp/terraform/issues/4149

I found a comment in the provider Golang code that explains the problem: https://github.com/terraform-providers/terraform-provider-kubernetes/blob/master/kubernetes/provider.go#L244

jurgenweber commented 4 years ago

I get this as well, but when I try to disable the cluster after creation (i.e. destroy it), the plan fails.

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

xsqian commented 4 years ago

I encountered a similar issue:

Error: Post "https://0ED4D7D93F983B4B6F3664DA6B0262D0.gr7.us-east-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps": dial tcp: lookup 0ED4D7D93F983B4B6F3664DA6B0262D0.gr7.us-east-2.eks.amazonaws.com on 192.168.86.1:53: no such host

Any help would be appreciated!

deanshelton913 commented 4 years ago

@dpiddockcmp It would help if you were just a tad bit more specific.

What precisely in the README's example configuration solves this problem? Is the version number, which is called out explicitly in the README, the important part? Meaning we can't take the latest version for some reason? The concat functions in use? I tried a copy-paste of that README and I get:

No provider "kubernetes" plugins meet the constraint "1.10,>= 1.11.1".

deanshelton913 commented 4 years ago

As a workaround... I was able to use the AWS CLI to write my config, after my first deployment was only partially successful...

aws eks update-kubeconfig --name myApp --region $AWS_DEFAULT_REGION --alias myApp

Putting this ^ before the apply step (obviously this only works after a successful creation of the cluster, when only the update to the aws-auth ConfigMap failed) worked for me... But if we ever burn the infra to the ground, we need to do this multi-step process again.

kunickiaj commented 4 years ago

I get this as well, but when I tried to disable the cluster after creation (so destroy it) the plan fails.

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Ran into this as well... Given the relative instability around the lifecycle of EKS using this module, I'm probably going to consider separating it from other infra in the VPC.

dpiddockcmp commented 4 years ago

You need to copy the two data sources and the kubernetes provider block from the usage example. Assuming your module definition was called eks:

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.12"
}

kunickiaj commented 4 years ago

@dpiddockcmp Yep, I get that; the problem is encountered if you set the create_eks flag to false to destroy the cluster and then set it back to true.

I think I hit some other funkiness where the state file even showed the correct host and CA cert but the provider was using the local host and missing CA entirely.

Will see if I can get a more specific set of repro steps.

cidemaxio commented 4 years ago

I had the same problem. You have to delete the cluster manually because Terraform just says that it already exists instead of deleting and recreating it. Once you delete the cluster you get this error.

I resolved the problem by running terraform state rm module.saasoptics.module.eks.kubernetes_config_map.aws_auth[0]

kunickiaj commented 4 years ago

Yeah, looked into this a bit. It's because Terraform tries to refresh the config map resource before deleting it -- however, the cluster has already been destroyed.

This module essentially needs to ensure that destruction of the config map happens before cluster destruction if that's possible. Otherwise the manual removal of the configmap from the state seems like the best solution here.

An alternative workaround to cleanly remove the cluster (you must not have gotten yourself into the state where you have the localhost error for this to work):

  1. Use target mode to destroy only the EKS cluster: terraform destroy -target module.eks
  2. Subsequently, set the create_eks flag to false after the first step
  3. Run an apply to clean up the old cluster configuration. terraform apply

onprema commented 3 years ago

This fixed it for me:

terraform state rm module.eks.kubernetes_config_map.aws_auth

thanks, @cidesaasoptics

schollii commented 3 years ago

I got the same error after the following sequence:

I was able to complete the creation by setting manage_aws_auth=false, and later deleting the map with kubectl. Then I was able to set the flag back to true.

fliphess commented 3 years ago

Same issue here. In my pipeline the kubeconfig is not present during apply, as it's a new CI run in a fresh CI container. This results in the provider connecting to localhost.

EDIT: The workaround mentioned above worked for me too: first apply the pipeline with manage_aws_auth = false. After that you can safely remove the cluster without errors and start over.
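
For reference, a minimal sketch of that workaround against the pre-v18 module inputs used earlier in this thread (module name, version, and variables assumed):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "12.1.0"

  cluster_name    = var.name
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id
  cluster_version = var.cluster_version

  # Setting this to false stops the module from touching the aws-auth
  # ConfigMap, so applies/destroys no longer need a reachable Kubernetes API.
  # Flip it back to true once the cluster and provider config are healthy.
  manage_aws_auth = false
}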

andrewalexander commented 3 years ago

If you are deleting and know the config map is gone, creating a listener and giving the Terraform client a 204 response also seemed to work to make a stuck terraform destroy proceed happily.

I don't see any real difference in the net effect between this and the terraform state rm module.eks.kubernetes_config_map.aws_auth that also worked (as did running the apply with manage_aws_auth = false when creating the resources in the first place).

nc -l 80
DELETE /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
Host: localhost
User-Agent: HashiCorp/1.0 Terraform/0.14.2
Content-Length: 43
Accept: application/json, */*
Content-Type: application/json
Accept-Encoding: gzip

{"kind":"DeleteOptions","apiVersion":"v1"}
HTTP/1.1 204 OK
module.eks-cluster.module.eks.kubernetes_config_map.aws_auth[0]: Destruction complete after 2m30s

Destroy complete! Resources: 1 destroyed.

luizmiguelsl commented 3 years ago

I had the same problem. you have to delete the cluster manually because the terraform just says that it already exists instead of deleting then recreating. Once you delete the cluster you get this error.

I resolved the problem by running terraform state rm module.saasoptics.module.eks.kubernetes_config_map.aws_auth[0]

That's it. In my case I first ran terraform state list to see what was stored in my state and got module.your-module-name.kubernetes_config_map.aws_auth[0]. After this I just ran terraform state rm module.your-module-name.kubernetes_config_map.aws_auth[0] and now I'm able to run plans and applies again. Thanks.

leiarenee commented 3 years ago

I guess the root cause of the problem is the short validity of the cluster token (only 15 minutes) combined with EKS cluster creation times that sometimes exceed 15 minutes, whichever way you want to put it. If an internal data resource such as aws_eks_cluster_auth is used to store the cluster token, you end up with the following problem.

The token is invalidated and has to be refreshed, but within the same Terraform run this is not possible, because data sources are refreshed at the very beginning of the process.

My final workaround to guarantee "Unauthorized"-error-free updates:

I'm using Terragrunt to overcome many of Terraform's weaknesses, such as dynamic dependencies and counts that need to be calculated before Terraform is applied.

I changed the Kubernetes provider authentication configuration as follows, using a kubeconfig file instead of the internal token mechanism.

data "aws_eks_cluster" "cluster" {
  count = var.cluster_id != "" ? 1 : 0
  name  = var.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  count = var.cluster_id != "" ? 1 : 0
  name  = var.cluster_id
}

provider "kubernetes" {
  config_path = "${path.module}/.kubeconfig"
  config_context =  element(concat(data.aws_eks_cluster.cluster[*].arn, list("")), 0)

  // host                   = element(concat(data.aws_eks_cluster.cluster[*].endpoint, list("")), 0)
  // cluster_ca_certificate = base64decode(element(concat(data.aws_eks_cluster.cluster[*].certificate_authority.0.data, list("")), 0))
  // token                  = element(concat(data.aws_eks_cluster_auth.cluster[*].token, list("")), 0)
  // load_config_file       = false
}

Notice that data resource aws_eks_cluster_auth is no longer used. Instead a local kubeconfig file is used for authorization.

Before every request that is sent to the Kubernetes API, I run the following command in the before_hook section of the terragrunt.hcl file:

aws --profile $profile eks update-kubeconfig --kubeconfig .kubeconfig --name $cluster > /dev/null 2>&1 || true

This solution worked well for my deployments, which live outside of the terraform-aws-eks module as separate Terragrunt folders that depend on the cluster-creation Terragrunt folder. However, it did not work for this module's internal config map update, since before creation there is no cluster and the provider is refreshed against that empty state.

POSSIBLE SOLUTION: disable the config map in this module: manage_aws_auth = false

Then take the resource "kubernetes_config_map" "aws_auth" from aws_auth.tf in this module and apply it in a separate Terragrunt folder, creating a dependency block on the terraform-aws-eks Terragrunt folder. Create outputs from the original module to be consumed in your new aws_auth Terragrunt folder.

New TF file to be executed after cluster creation:

resource "kubernetes_config_map" "aws_auth" {
  #count      = var.create_eks ? 1 : 0

  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
    labels = merge(
      {
        "app.kubernetes.io/managed-by" = "Terraform"
        "terraform.io/module" = "terraform-aws-modules.eks.aws"
      },
      var.aws_auth_config.additional_labels
    )
  }

  data = var.aws_auth_config.data
}

variable "aws_auth_config" {
  description = "aws_auth_config data"
  type        = any
}

This should be added as extra_outputs.tf in your original Terragrunt environment, which creates the cluster:

output "aws_auth_config" {
  value = {
    additional_labels = var.aws_auth_additional_labels
    data = {
      mapRoles = yamlencode(
        distinct(concat(
          local.configmap_roles,
          var.map_roles,
        ))
      )
      mapUsers    = yamlencode(var.map_users)
      mapAccounts = yamlencode(var.map_accounts)
    }
  }
}

terragrunt.hcl for applying aws_auth separately:

locals {
  profile="<my_aws_profile>"
  cluster_name="<my-cluster-name>"
}

terraform {
  source = "${get_parent_terragrunt_dir()}/modules/terraform-aws-eks-auth"
  before_hook "refresh_kube_token" {
    commands     = ["apply", "plan","destroy","apply-all","plan-all","destroy-all","init", "init-all"]
    execute      = ["aws", "--profile", local.profile, "eks", "update-kubeconfig", "--kubeconfig", ".kubeconfig", "--name", local.cluster_name]
   }
}

# Inputs to be passed to the TF module
inputs = {
  aws_auth_config = dependency.cluster.outputs.aws_auth_config
}

dependency "cluster" {
  config_path = "../cluster"  
}

Output:

[terragrunt] 2021/01/18 13:16:08 Executing hook: refresh_eks_token
[terragrunt] 2021/01/18 13:16:08 Running command: aws --profile my-profile eks update-kubeconfig --kubeconfig .kubeconfig --name my-cluster
Updated context arn:aws:eks:eu-central-1:767*****7216:cluster/my-cluster in /Users/***/dev/repo/live/infrastructure/.terragrunt-cache/767****216/dynamic/eu-central-1/shared/k8s/auth/zlnrmfl5SmuqAD673E7b9AUFDew/NYFQi6hkBT1xdxho53XJtCNnpYs/.kubeconfig

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # kubernetes_config_map.aws_auth will be created
  + resource "kubernetes_config_map" "aws_auth" {
      + data = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::760******216:role/my-cluster20210118081306354500000009"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "arn:aws:iam::292******551:role/MyTestRole"
                  "username": "MyTestRole"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + id   = (known after apply)

      + metadata {
          + generation       = (known after apply)
          + labels           = {
              + "app.kubernetes.io/managed-by" = "Terraform"
              + "terraform.io/module"          = "terraform-aws-modules.eks.aws"
            }
          + name             = "aws-auth"
          + namespace        = "kube-system"
          + resource_version = (known after apply)
          + self_link        = (known after apply)
          + uid              = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

kubernetes_config_map.aws_auth: Creating...
kubernetes_config_map.aws_auth: Creation complete after 1s [id=kube-system/aws-auth]

Conclusion: I hope this token-refreshing problem is solved natively in the kubernetes provider in the future, but until then you can use this hack to overcome the problem. Terraform is all about using such dirty hacks and finding workarounds, right? :-)

Happy Coding

spkane commented 3 years ago

This also happens when the EKS cluster is deleted out from under Terraform, since it is trying to talk to the K8s API endpoint, which no longer exists. I have seen this in some dev workflows. The command from above, terraform state rm module.eks.kubernetes_config_map.aws_auth, will generally allow terraform commands to run correctly again.

Vlaaaaaaad commented 3 years ago

I can confirm successful creation and destroys and everything using v14.0.0 of this module, with Terraform 0.14.4, AWS provider v3.26.0, and terraform-provider-kubernetes v2.0.1.

In the new v2 of the Kubernetes provider, there is a dedicated example on how to use it with EKS, which I just copy/pasted πŸ™‚
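
For readers following along, a sketch of the exec-based configuration that example describes (assuming the module block is named eks, the AWS CLI is installed locally, and noting the api_version may differ by provider/CLI version):

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Fetch a fresh token at plan/apply time instead of storing one in state.
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1" # v1beta1 on newer versions
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name]
  }
}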

bohdanyurov-gl commented 3 years ago

Unfortunately I've just hit the same issue with module version v14.0.0 and terraform 0.14.5. Still trying to find a fix.

Deleting kubernetes_config_map from the state doesn't work.

matthewmrichter commented 3 years ago

I figured out how to pin the version of the Kube provider in Terraform 14:

  1. Remove the provider "registry.terraform.io/hashicorp/kubernetes" { ... } block in .terraform.lock.hcl if it's there
  2. Add the following to your top-level terraform { ... } block:

     required_providers {
       kubernetes = {
         source  = "registry.terraform.io/hashicorp/kubernetes"
         version = "~> 1.0"
       }
     }

  3. Re-init your Terraform.

terraform-aws-modules seems to behave much better after.

acevedomiguel commented 3 years ago

It happened to me: after I destroyed the cluster successfully, only the configmap resource was still there. Then when I try to run terraform destroy again:

terraform destroy

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # module.eks.kubernetes_config_map.aws_auth[0] will be destroyed
  - resource "kubernetes_config_map" "aws_auth" {
      - binary_data = {} -> null
      - data        = {
          - "mapAccounts" = jsonencode([])
          - "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws-cn:iam::XXXXXXX:role/eks-cluster-XXXXX"
                  "username": "system:node:{{EC2PrivateDNSName}}"
            EOT
          - "mapUsers"    = jsonencode([])
        } -> null
      - id          = "kube-system/aws-auth" -> null

      - metadata {
          - annotations      = {} -> null
          - generation       = 0 -> null
          - labels           = {
              - "app.kubernetes.io/managed-by" = "Terraform"
              - "terraform.io/module"          = "terraform-aws-modules.eks.aws"
            } -> null
          - name             = "aws-auth" -> null
          - namespace        = "kube-system" -> null
          - resource_version = "xxx" -> null
          - uid              = "xxxxxxx" -> null
        }
    }

Plan: 0 to add, 0 to change, 1 to destroy.

The cluster doesn't exist anymore, so the destroy will never succeed.

I manually removed it from the state (https://www.terraform.io/docs/cli/commands/state/rm.html):

terraform state rm module.eks.kubernetes_config_map.aws_auth

schollii commented 3 years ago

Same thing here and manual remove worked. I wonder if a depends_on is missing.

adv4000 commented 3 years ago

Same here, fixed by manually removing the state entry: terraform state rm module.eks.module.eks-cluster.kubernetes_config_map.aws_auth[0]

annyip commented 3 years ago

So does anyone know the root cause of this? I've seen this issue on both POST and DELETE for the configmap.

acim commented 3 years ago

https://github.com/terraform-aws-modules/terraform-aws-eks/blob/e5d26e1dcc41f859eb8d2be16460fd3b5b016412/docs/faq.md#configmap-aws-auth-already-exists

Error: Get http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth: dial tcp 127.0.0.1:80: connect: connection refused

Usually this means that the kubernetes provider has not been configured, there is no default ~/.kube/config and so the kubernetes provider is attempting to talk to localhost.

It looks to me that the problem comes when you have multiple clusters defined in ~/.kube/config. It seems this module ignores the current-context and then fails to read the configuration properly. I have used this module for quite a long time and upgraded it a lot of times, but I never had a kubernetes provider block, and it should continue working without it. It should just read ~/.kube/config and respect the current context, or something like that. I also renamed the context name in my kube config; this may be another reason, but still, if this module reads the current context it should have the correct data. This may be a general Terraform problem, though, maybe not this module.

schollii commented 3 years ago

@acim Interesting, as long as you configure the kubernetes provider to use the context that corresponds to the Terraform config files, not just the current context of kubectl (which could lead to modifying Kubernetes resources in the wrong cluster). E.g.:

data "aws_eks_cluster" "cluster" {
  name  = module.eks.eks_cluster_id
}

provider "kubernetes" {
  config_path = "path/to/.kube/config"
  config_context =  data.aws_eks_cluster.cluster.arn 
}

acim commented 3 years ago

@acim Interesting, as long as you configure the kubernetes provider to use the context that corresponds to the Terraform config files, not just the current context of kubectl (which could lead to modifying Kubernetes resources in the wrong cluster). E.g.:

data "aws_eks_cluster" "cluster" {
  name  = module.eks.eks_cluster_id
}

provider "kubernetes" {
  config_path = "path/to/.kube/config"
  config_context =  data.aws_eks_cluster.cluster.arn 
}

This makes sense, thank you :)

lpkirby commented 3 years ago

I figured out how to pin the version of the Kube provider in Terraform 14:

1. Remove the `provider "registry.terraform.io/hashicorp/kubernetes" { ... }`  block in `.terraform.lock.hcl` if it's there

2. Add the following to your top-level `terraform { ... }` block:
  required_providers {
    kubernetes = {
      source  = "registry.terraform.io/hashicorp/kubernetes"
      version = "~> 1.0"
    }
  }
3. Re-init your Terraform.

terraform-aws-modules seems to behave much better after.

Thank you @matthewmrichter. This solved my problems too.

icco commented 3 years ago

My fix for this: I had deleted the provider configuration from my module, so k8s was trying to talk to localhost instead of the cluster.

Adding

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

fixed it for me.

ptc-mrucci commented 3 years ago

While troubleshooting the issue I observed the following:

I'm still not sure there's a way to reliably prevent it in multi-cluster kube config files.

daroga0002 commented 3 years ago

The best is not to rely on kubeconfig but rather to use some configuration similar to this:

data "aws_eks_cluster" "cluster" {
  count = var.eks_create ? 1 : 0
  name  = module.eks_cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  count = var.eks_create ? 1 : 0
  name  = module.eks_cluster.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster[0].endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster[0].certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster[0].token
}

where the token and cluster_ca_certificate come directly from the data sources driven by the module's outputs.

ptc-mrucci commented 3 years ago

Thanks, I will consider changing to that, although it's surprising that the simple config_path and config_context options, as in the example from the official docs, are not reliably usable.

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "my-context"
}

daroga0002 commented 3 years ago

Thanks, I will consider changing to that, although it's surprising that the simple config_path and config_context options, as in the example from the official docs, are not reliably usable.

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "my-context"
}

The issue with this approach is that you must set your cluster/context before running Terraform, so most probably your scenario is:

  1. you created EKS < everything was working
  2. you start playing with minikube or other kubernetes cluster
  3. you tried change something via terraform in EKS

So between steps 2 and 3 you changed the Kubernetes context/cluster, and the Terraform provider relying on kubeconfig is trying to connect to the cluster set there in the line:

current-context: arn:aws:eks:us-east-1:REDACTED:cluster/eks-cluster

which doesn't reflect the EKS cluster you are trying to modify.

ptc-mrucci commented 3 years ago

I don't think the problem is a mismatch between the current-context set in kubeconfig and the "config_context" in terraform. What would be the point of specifying config_context otherwise?

I also just tried setting different contexts in kubeconfig and terraform in a multi-cluster configuration and could not reproduce the issue.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

matthewmrichter commented 3 years ago

/remove_stale

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue was automatically closed because of stale in 10 days

Clee681 commented 2 years ago

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused

Any ideas why I'm getting this error when trying to destroy the cluster?

acemasterjb commented 2 years ago

Okay, so I'm using Terratest and I get this issue sometimes. I've now fixed it for the second time, so I'll share my solution here since I have not seen a solution for people using Terratest.

So for those unfamiliar, Terratest is a testing framework that automates testing of IaC, focusing on Terraform and container orchestration services.

A common testing practice is to add a unique identifier to the Terratest scripts to make every EKS cluster created unique. This ensures that if multiple tests are run, they run in isolation from each other, creating new EKS clusters with unique names every time a test is run. They call this namespacing in the Terratest docs.

Anyway, if something goes wrong with the cleanup, then you probably haven't been able to destroy the EKS cluster and tried to do a terraform destroy that led you here.

What you need to do is go to the EKS dashboard in the AWS console, copy the name of the cluster, and paste it into your Terraform HCL/template files in the cluster_name property of your eks module or variables.tf file.

TL;DR: Once the EKS cluster name on the AWS dashboard is the same as the cluster_name in your EKS *.tf file, you can run terraform destroy again.
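
In other words, something like this hypothetical snippet (the cluster name shown is made up; use whatever the console shows for the orphaned cluster):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "12.1.0" # whichever version the test was pinned to

  # Must match the orphaned cluster's name exactly as shown in the EKS console
  cluster_name = "my-app-eks-x7k2q"
  # ... remaining inputs unchanged ...
}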

julian-berks-2020 commented 2 years ago

I also found that eks-k8s-role-mapping doesn't work. It immediately fails with: Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused

My fix is to wait for it to fail (not ideal), then create a ~/.kube/config using kubergrunt eks configure --eks-cluster-arn ""

Then adding the following (per a suggestion above) seems to solve it:

provider "kubernetes" {
  config_path = "~/.kube/config"
}

But of course this isn't possible until the cluster is built, so it's not really ideal. Still, at least my instances finally connect to the cluster.

ptc-mrucci commented 2 years ago

If it helps anybody, the root cause of my issue was a difference between the context name (alias) and the cluster name.

In particular:

Shouldn't there be a clear failure when the provider references a non-existent context or cluster? This would mirror kubectl behaviour:

$ kubectl --context INEXISTENT_CONTEXT get svc 
Error in configuration: context was not found for specified context: INEXISTENT_CONTEXT
$ kubectl --cluster INEXISTENT_CLUSTER get svc       
error: no server found for cluster "INEXISTENT_CLUSTER"
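
A sketch of what avoiding that mismatch looks like with kubeconfig-based auth; the context name here is hypothetical and must match the alias (or ARN) actually present in the kubeconfig:

provider "kubernetes" {
  config_path = "~/.kube/config"

  # Must be the context *name* from the kubeconfig, e.g. the --alias passed to
  # "aws eks update-kubeconfig" (or the cluster ARN if no alias was set) --
  # not the bare EKS cluster name.
  config_context = "my-cluster-alias"
}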

icicimov commented 2 years ago

This:

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

is still not working even with

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.75.2"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.11.0"
    }
  }
}
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.24.1"
}

It errors with:

β•·
β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
β”‚ 
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 437, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  437: resource "kubernetes_config_map_v1_data" "aws_auth" {
β”‚ 
β•΅

I can see the issue was just left to expire due to inactivity; wasn't this a bug worth attention?

UPDATE: Some observations

I don't have any issues upon the initial terraform plan + terraform apply, nor when deleting the cluster, as long as I don't change anything in the EKS module (managed node group), like subnet_ids for example. My module call:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.24.1"

  cluster_name                    = var.k8s_cluster
  cluster_version                 = var.cluster_version

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
  cluster_endpoint_public_access_cidrs = var.cluster_endpoint_public_access_cidrs

  cluster_enabled_log_types       = var.cluster_enabled_log_types

  vpc_id                          = module.vpc.vpc_id
  subnet_ids                      = flatten([for i in range(var.vpc["priv_subnet_sets"]) : module.private-subnets[i].subnet_ids])

  manage_aws_auth_configmap = true
  aws_auth_roles            = concat(local.admin_user_map_roles, local.developer_user_map_roles)
}

and if I change the subnet_ids value once the cluster has been created, for example like:

subnet_ids                      = module.private-subnets[1].subnet_ids

in order to trigger a change I get the above error. Until then everything is fine.

For the record, using the kubernetes provider in the token variant like above or the exec variant like:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

makes no difference; I get the same error in both cases.

EnriqueHormilla commented 2 years ago

Same bug here: when I tried to change the subnets for the EKS cluster, terraform plan wanted to replace it, but finally threw this error.

β•·
β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
β”‚ 
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 437, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  437: resource "kubernetes_config_map_v1_data" "aws_auth" {}
β”‚ 
β•΅

I'm testing with the latest version currently available, "18.26.6", and can reproduce the error using the example in the repo: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/v18.26.6/examples/complete/main.tf. @bryantbiggs is this a bug in my Terraform config or a bug in the module?

bryantbiggs commented 2 years ago

Changing subnet IDs on the cluster is a destructive operation. This is not controlled by the module but by the AWS EKS API.

a0s commented 2 years ago

I don't know why this issue is closed, because the error is still here.

I got Error: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused in the process of cluster creation (from scratch). I have this in my config:

createAwsAuthConfigmap: true,
manageAwsAuthConfigmap: true,
awsAuthRoles: [],
awsAuthUsers: [],

I'm creating the cluster under an assumed role's AWS provider.

I'm very curious why it tries to connect to localhost.

bryantbiggs commented 2 years ago

@a0s It's not a module issue; it's a mix of provider + user configuration issues:

https://github.com/hashicorp/terraform-provider-kubernetes/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+localhost

icicimov commented 2 years ago

Changing subnet IDs on the cluster is a destructive operation. This is not controlled by the module but by the AWS EKS API.

@bryantbiggs Thanks for still looking at this and trying to help. Does this mean we need to raise this issue with the AWS EKS team? Because from Terraform's and this module's perspective, even in the case of a destructive action like destroying and re-creating the cluster, we as users expect terraform plan to run successfully and tell us all about it in the plan so we can decide whether to apply the changes or not. Instead we are seeing this cryptic error message about trying to connect to localhost.