terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes Service (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0

dial tcp 127.0.0.1:80: connect: connection refused #2007

Closed: ArchiFleKs closed this issue 1 year ago

ArchiFleKs commented 2 years ago

Description

I know there are numerous issues (#817) related to this problem, but since v18.20.1 reintroduced management of the aws-auth configmap, I thought we could discuss it in a new one because the old ones are closed.

The behavior is still very weird. I updated my module to use the configmap management feature and the first run went fine (I was using the aws_eks_cluster_auth datasource). When I run the module with no change, I get no error in either plan or apply.

I then tried to update my cluster from v1.21 to v1.22, and plan and apply began to fail with the following well-known error:

null_resource.node_groups_asg_tags["m5a-xlarge-b-priv"]: Refreshing state... [id=7353592322772826167]                                                                                                                                    
β•·                                                                                                                                                                                                                                        
β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused                                                                                                    
β”‚                                                                                                                                                                                                                                        
β”‚   with kubernetes_config_map_v1_data.aws_auth[0],                                                                                                                                                                                      
β”‚   on main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":                                                                                                                                                         
β”‚  428: resource "kubernetes_config_map_v1_data" "aws_auth" {                                                                                                                                                                            
β”‚                                                                                                                                                                                                                                        
β•΅                                                           

I then moved to the exec plugin as recommended in the documentation and removed the old datasource from state. I still got the same error.

Something I don't get: when setting export KUBE_CONFIG_PATH=$PWD/kubeconfig as suggested in #817, things work as expected.
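For reference, the in-HCL equivalent of that environment variable appears to be the provider's config_path argument; a minimal sketch, assuming a kubeconfig file has already been generated next to the configuration (the path is illustrative):

provider "kubernetes" {
  # Point the provider at a pre-generated kubeconfig instead of relying on KUBE_CONFIG_PATH
  config_path = "${path.module}/kubeconfig"
}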

I'm sad to see things are still unusable (not related to this module but on the Kubernetes provider side). The load_config_file option was removed from the Kubernetes provider a while ago, and I don't see why this variable needs to be set or how it could be set beforehand.

Anyway, if someone has managed to use the re-added configmap management feature, I'd be glad to know how to work around this and help debug the issue.

PS: I'm using Terragrunt; not sure if that is related, but it might be.

Versions

Terraform v1.1.7
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.9.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/tls v3.3.0

Reproduce

Here is my provider block

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.id]
  }
}

data "aws_eks_cluster" "cluster" {
  name = aws_eks_cluster.this[0].id
}
PLeS207 commented 2 years ago

I have the same issue, but when I work with the state as another AWS user, I get an error like:

Error: Unauthorized  

with module.eks.module.eks.kubernetes_config_map.aws_auth[0],   
on .terraform/modules/eks.eks/main.tf line 411, in resource "kubernetes_config_map" "aws_auth":
411: resource "kubernetes_config_map" "aws_auth" {
FeLvi-zzz commented 2 years ago

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name?

I guess aws_eks_cluster.this[0].id would be known after apply because you're going to bump up the EKS cluster version. That's why the data source is indeterminate, and the kubernetes provider will fall back to the default 127.0.0.1:80.
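A minimal sketch of that suggestion, assuming the cluster name is known up front (the literal name below is a placeholder):

data "aws_eks_cluster" "cluster" {
  # A hard-coded name keeps the data source, and the provider config derived from it, known at plan time
  name = "my-cluster"
}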

bryantbiggs commented 2 years ago

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name?

I guess aws_eks_cluster.this[0].id would be known after apply because you're going to bump up the EKS cluster version. That's why the data source is indeterminate, and the kubernetes provider will fall back to the default 127.0.0.1:80.

not quite true - if the data source fails to find a result, it's a failure, not indeterminate.

@ArchiFleKs you shouldn't need the data source at all; does this still present the same issue?

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
sergiofteixeira commented 2 years ago

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name? I guess aws_eks_cluster.this[0].id would be known after apply because you're going to bump up the EKS cluster version. That's why the data source is indeterminate, and the kubernetes provider will fall back to the default 127.0.0.1:80.

not quite true - if the data source fails to find a result, it's a failure, not indeterminate.

@ArchiFleKs you shouldn't need the data source at all; does this still present the same issue?

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

You can't run these in TF Cloud though, because of the local exec.

bryantbiggs commented 2 years ago

Would you try replacing aws_eks_cluster.this[0].id with the hard-coded cluster name? I guess aws_eks_cluster.this[0].id would be known after apply because you're going to bump up the EKS cluster version. That's why the data source is indeterminate, and the kubernetes provider will fall back to the default 127.0.0.1:80.

not quite true - if the data source fails to find a result, it's a failure, not indeterminate. @ArchiFleKs you shouldn't need the data source at all; does this still present the same issue?

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

You can't run these in TF Cloud though, because of the local exec.

This is merely pointing to what the Kubernetes provider documentation specifies. The module doesn't have any influence over this aspect.

ArchiFleKs commented 2 years ago

I can confirm that this snippet works as expected without the datasource:

provider "kubernetes" {
  host                   = aws_eks_cluster.this[0].endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this[0].certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.this[0].id]
  }
}
bryantbiggs commented 2 years ago

I know Hashi has been hiring and has made some hires recently to start offering more support for the Kubernetes and Helm providers, so hopefully some of these quirks get resolved soon! For now, we can just keep sharing what others have found to work for their setups πŸ€·πŸ½β€β™‚οΈ

evenme commented 2 years ago

Unfortunately, it doesn't seem to work with Terraform Cloud (it gets the error Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused), so I locked the module to v18.19, which still works.
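For anyone wanting to do the same, a minimal sketch of pinning the module below the configmap-management change (the version constraint is illustrative):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.19.0"
  # ... rest of the configuration unchanged
}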

evenme commented 2 years ago

Apparently using the kubectl provider instead of the kubernetes provider (even completely removing the latter) made it work with Terraform Cloud πŸ€·β€β™€οΈ:

provider "kubectl" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

but unfortunately this got the previously working aws-auth configmap deleted, and it was not able to create a new one: Error: The configmap "aws-auth" does not exist... :|
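One possible cause of that "does not exist" error, as an assumption worth checking against the module docs for your version: manage_aws_auth_configmap only patches an existing configmap, so when nothing has created it yet (e.g. only self-managed node groups), create_aws_auth_configmap also needs to be set, roughly like:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.20"

  create_aws_auth_configmap = true
  manage_aws_auth_configmap = true
  # ...
}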

MadsRC commented 2 years ago

I just ran into this while debugging an issue during redeployment of a cluster. I'm not sure exactly how it happened, but we ended up in a state where the cluster had been destroyed, which caused Terraform to not be able to connect to the cluster (duh...) using the provider, and so it defaulted to 127.0.0.1 when trying to touch the config map...

As mentioned, I'm not sure exactly how it ended up in that state, but it got so bad that I'd get this dial tcp 127.0.0.1:80: connect: connection refused error on terraform plan even with all references to the config map removed. It turns out there was still a reference to the config map in the state file, so removing it using terraform state rm module.eks.this.kubernetes_config_map_v1_data.aws_auth allowed me to redeploy...

Maybe not applicable to most of you, but hopefully it's useful for someone in the future :D
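A quick way to hunt down leftover entries like that before removing them (the exact resource address varies by module version, so the one below is only an example):

terraform state list | grep aws_auth
terraform state rm 'module.eks.kubernetes_config_map_v1_data.aws_auth[0]'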

bryantbiggs commented 2 years ago

hey all - let me know if it's still worthwhile to leave this issue open. I don't think there is anything further we can do in this module to help alleviate the issues shown - there seems to be some variability in terms of what works or does not work for folks. I might be biased, but I think the best place to source some improvements/resolution would be upstream with the other providers (Kubernetes, Helm, kubectl, etc.)

kaykhancheckpoint commented 2 years ago

I'm also experiencing this; in the meantime, are there any workarounds?

I'm experiencing the same problem with the latest version. Initial creation of the cluster worked fine, but trying to update any resources after creation I get the same error.

β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
β”‚
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 431, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  431: resource "kubernetes_config_map_v1_data" "aws_auth" {
β”‚

Same as the example below, except I had multiple profiles on my machine and had to specify the profile: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/eks_managed_node_group/main.tf#L5-L15

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--profile", "terraformtest"]
  }
}
DimamoN commented 2 years ago

Faced the same, then checked the state using terraform state list and found k8s-related entries there. Then I removed them using

terraform state rm module.eks.kubernetes_config_map.aws_auth[0]

And that helped to resolve the issue.

kaykhancheckpoint commented 2 years ago

The previous suggestions didn't work for me (maybe I misunderstood something).

  1. export KUBE_CONFIG_PATH=$PWD/kubeconfig

This kubeconfig does not appear to exist in my current path...

  2. Deleting the datasource

The latest version of this example and module does not use a datasource and instead just uses module.eks.cluster_id, but I still get this error.


I ended up deleting the aws_auth from the state, which allowed me to continue and resolve the connection refused problem.

terraform state rm 'module.eks.kubernetes_config_map_v1_data.aws_auth[0]'

I don't know what the implications of rm'ing this state are; is it safe to keep removing this state whenever we encounter this error?
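For reference, the kubeconfig that KUBE_CONFIG_PATH points at is not created by the module; assuming the AWS CLI is configured for the right account, it can be generated and exported with something like:

aws eks update-kubeconfig --name <cluster-name> --kubeconfig ./kubeconfig
export KUBE_CONFIG_PATH=$PWD/kubeconfig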

FernandoMiguel commented 2 years ago

a brand new cluster and tf state, eks 1.22

terraform {
  required_version = ">= 1.1.8"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.9"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.13.1"
    }
  }
}

provider "aws" {
  alias  = "without_default_tags"
  region = var.aws_region
  assume_role {
    role_arn = var.assume_role_arn
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}
locals {
  ## strips 'aws-reserved/sso.amazonaws.com/' from the AWSReservedSSO Role ARN
  aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim = replace(one(data.aws_iam_roles.AWSReservedSSO_AdministratorAccess_role.arns), "/[a-z]+-[a-z]+/([a-z]+(\\.[a-z]+)+)\\//", "")

  aws_auth_roles = concat([
    {
      rolearn  = data.aws_iam_role.terraform_role.arn
      username = "terraform"
      groups   = ["system:masters"]
    },
    {
      rolearn  = local.aws_iam_roles_AWSReservedSSO_AdministratorAccess_role_arn_trim
      username = "sre"
      groups   = ["system:masters"]
    }
  ],
    var.aws_auth_roles,
  )
}
  # aws-auth configmap
  create_aws_auth_configmap = var.self_managed_node_groups != [] ? true : null
  manage_aws_auth_configmap = true
  aws_auth_roles            = local.aws_auth_roles
  aws_auth_users            = var.aws_auth_users
  aws_auth_accounts         = var.aws_auth_accounts

leads to:

β”‚ Error: Unauthorized
β”‚
β”‚   with module.eks.module.eks.kubernetes_config_map.aws_auth[0],
β”‚   on .terraform/modules/eks.eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
β”‚  414: resource "kubernetes_config_map" "aws_auth" {

any ideas @bryantbiggs ? thanks in advance.

mebays commented 2 years ago

@FernandoMiguel I'm seeing something similar in a configuration I'm working with. After some thought, I believe you'll need to add the assumed role to your configuration:

provider "aws" {
  alias  = "without_default_tags"
  region = var.aws_region
  assume_role {
    role_arn = var.assume_role_arn
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id,"--role", var.assume_role_arn]
  }
}

Sadly this isn't a solution for me. The configuration I'm working with uses dynamic credentials fed in.

Something along the lines of...

provider "aws" {
  access_key = <access_key>
  secret_key = <secret_key>
  token = <token>
  region = <region>
}

This is useful when a temporary VM, container, or TFE is running the Terraform execution.

Going down this route, the provider is fed the connection information and it is used entirely within the provider context (no aws configure process was ever used).

The problem is that none of that data is stored or carried over, so when the kubernetes provider tries to run the exec, it falls back to the methods the AWS CLI uses (meaning a locally stored config in ~/.aws/config or ~/.aws/credentials). In my case that doesn't exist.

@FernandoMiguel it looks like you are presumably using a ~/.aws/config, so passing the assumed role and possibly the profile (if not using the default) should help move that forward. I cannot guarantee it will fix it, but that would be the theory.

FernandoMiguel commented 2 years ago

No config and no aws creds hardcoded. Everything is assume role from a global var. This works on hundreds of our projects.

FernandoMiguel commented 2 years ago

If you mean the cli exec, that's running from aws-vault exec --server

mebays commented 2 years ago

@FernandoMiguel Hmm well that's interesting. I was able to get a solution to work for me.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws-iam-authenticator"
    # This requires aws-iam-authenticator to be installed locally where Terraform is executed
    args        = ["token", "-i", module.eks.cluster_id]
  }
}

This seemed to work for me, but I also had to expose my endpoint publicly for the first run. Our network configuration was locked down too tightly for our remote execution server to hit the endpoint. That could be something else to make sure you are checking.

If you mean the cli exec, that's running from aws-vault exec --server

What I meant was that if credentials are being passed to the aws provider, I wouldn't necessarily expect them to be passed to the kubernetes provider. Some troubleshooting you could try is TF_LOG=debug terraform plan ... in order to get more information, if you haven't tried that. If you really want to test whether the kubernetes exec works, spin up a VM or container, pass the credentials, and see if that carries over.

If my guess is correct, then a way around it would be creating a ~/.aws/credentials file using a null resource and templating out configuration that aws eks get-token can then reference.

The thought process I am having is that the data being passed into the kubernetes provider contains no information about AWS configuration, so I would expect it to fail if the instance running Terraform didn't have the AWS CLI configured.

A further thought: if the remote execution tool being used doesn't have a ~/.aws/config but is running inside an instance with an IAM role attached to it, then the CLI would default to that IAM role, so it could still work as long as that IAM role has the ability to assume the role.
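A rough sketch of that credentials-file idea, using the local provider rather than a null resource (the variable names are hypothetical, and note that the credentials end up on disk on the runner):

resource "local_file" "aws_credentials" {
  filename        = pathexpand("~/.aws/credentials")
  file_permission = "0600"
  content         = <<-EOT
    [default]
    aws_access_key_id     = ${var.access_key_id}
    aws_secret_access_key = ${var.secret_access_key}
    aws_session_token     = ${var.token}
  EOT
}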

mebays commented 2 years ago

@bryantbiggs I think my thought process above just reinforces your comment. I don't think there is anything that can be done in this module to fix this. I do have a suggestion: don't completely remove the aws_auth_configmap_yaml output unless you have other solutions coming up. The reasoning is that I could see a use case where Terraform is run to provision a private cluster and may or may not be running on an instance that can reach that endpoint. If it isn't, the aws_auth_configmap_yaml can be used in a completely separate process to hit the private cluster endpoint. It all depends on how separation of duties may come into play (one person to provision, and maybe another to configure). It's just a thought.

FernandoMiguel commented 2 years ago

I would love to know what isn't working here. I spent a large chunk of this week trying every combo I could think of to get this to work, without success: different creds for the kube provider, different parallelism settings, recreating the code outside of the module so it would run after the EKS cluster module had finished, etc. I would always get either an authentication error, that the config map didn't exist, or that it couldn't be created. Very frustrating.

If the now-deprecated output were kept, I could at least revert my internal PR and keep using that old and terrible null exec code to patch the config map.

tanvp112 commented 2 years ago

The problem might be terraform-provider-kubernetes and not terraform-aws-eks, e.g. https://github.com/hashicorp/terraform-provider-kubernetes/issues/1479, ... more about the localhost connection refused. This one can really be difficult to catch.

FernandoMiguel commented 2 years ago

@tanvp112 you are onto something there

We have this provider config (see the attached image); notice the highlighted bit that is not available until the cluster is up, so it is possible that this provider is getting initialised with the wrong endpoint, maybe even "localhost", and of course that explains why auth fails. It also explains why the 2nd apply works fine, because by then the endpoint is correct.

mebays commented 2 years ago

So my issue was with authentication, and I believe this example clearly states the issue.

The example states that you must set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. After doing a little more digging, those having issues with authentication could try something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This sets up the aws cli configuration if there is no config or credentials file on the host that runs the aws cli command
    env = {
        AWS_ACCESS_KEY_ID = var.access_key_id
        AWS_SECRET_ACCESS_KEY = var.secret_access_key
        AWS_SESSION_TOKEN = var.token
    } 
    # This requires the awscli to be installed locally where Terraform is executed
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

I haven't gotten to try this myself, but it should work. The AWS_SESSION_TOKEN would only be needed for an assumed role process, but it could possibly work.

FernandoMiguel commented 2 years ago

So my issue was with authentication, and I believe this example clearly states the issue.

The example states that you must set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. After doing a little more digging, those having issues with authentication could try something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This sets up the aws cli configuration if there is no config or credentials file on the host that runs the aws cli command
    env = {
        AWS_ACCESS_KEY_ID = var.access_key_id
        AWS_SECRET_ACCESS_KEY = var.secret_access_key
        AWS_SESSION_TOKEN = var.token
    } 
    # This requires the awscli to be installed locally where Terraform is executed
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

I haven't gotten to try this myself, but it should work. The AWS_SESSION_TOKEN would only be needed for an assumed role process, but it could possibly work.

I honestly don't know what you are trying to do... AWS IAM auth can be done in many ways. Not everyone has a dedicated IAM account... we use assume roles, for example.

mebays commented 2 years ago

I honestly don't know what you are trying to do... AWS IAM auth can be done in many ways. Not everyone has a dedicated IAM account... we use assume roles, for example.

When you assume a role, you retrieve a temporary access key, secret key, and token. My code snippet is an example for when a user is running things in a jobbed-off process inside a container, where the container contains no context for AWS (no config or credentials file). That is my use case: my runs are on an isolated instance that does not persist (Terraform Cloud follows this same structure, but does not have the AWS CLI installed by default) and run in a CI/CD pipeline fashion, not on a local machine.

When the aws provider is used, the configuration information is passed into the provider, as in this example. (I'm keeping it simple; my context actually uses dynamic credentials from HashiCorp Vault, but I don't want to introduce that complexity in this explanation.)

provider "aws" {
  region = "us-east-1"
  access_key = "<access key | passed via variable or some data query>"
  secret_key = "<secret access key | passed via variable or some data query>"
  token = "<session token | passed via variable or some data query>"
}

In this instance the AWS provider has all the information passed in, using the provider configuration method. On this run no local AWS config file or environment variables exist, so it needs this to make any AWS connection.

All AWS resources are created successfully in this process, except the aws-auth configmap, when using the suggested example:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This requires the awscli to be installed locally where Terraform is executed\
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
}

The reason this is failing is that the Kubernetes provider has no context for what the aws command should use, because no config or environment variables are available. Therefore this will fail.

That is how the suggested route came to be.

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    # This would set up the aws cli configuration if there is no config or credential file running on the host  that would run the aws cli command
    env = {
        AWS_ACCESS_KEY_ID = "<same access key passed to aws provider | passed via variable or some data query>"
        AWS_SECRET_ACCESS_KEY = "<same secret access key passed to aws provider | passed via variable or some data query>"
        AWS_SESSION_TOKEN = "<same session token passed to aws provider | passed via variable or some data query>"
}
    } 
    # This requires the awscli to be installed locally where Terraform is executed\
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

In this provider block we purposely pass in the credentials/configuration the AWS CLI needs to successfully call aws eks get-token --cluster-name <cluster name>, because the kubernetes provider does not care what was passed in to the aws provider. There is no shared context, because no local configuration file or environment variables are being leveraged.

@FernandoMiguel does what I was trying to attain make sense now? This may not be your use case, but it is useful information for anyone trying to run this module using some external remote execution tool.

I'll add that this module is not the source of the issue, but adding the above snippet to the documentation may help out those who purposely provide configuration to the aws provider instead of utilizing environment variables or local config files.

FernandoMiguel commented 2 years ago

In this provider block we purposely pass in the credentials/configuration the AWS CLI needs to successfully call aws eks get-token --cluster-name <cluster name>, because the kubernetes provider does not care what was passed in to the aws provider. There is no shared context, because no local configuration file or environment variables are being leveraged.

@FernandoMiguel does what I was trying to attain make sense now? This may not be your use case, but it is useful information for anyone trying to run this module using some external remote execution tool.

It does. I've been fighting issues using the kube provider for weeks with what seems to be a race condition or a failure to initialise the endpoint/creds. Sadly, in our case, your snippet does not help since creds are already available via the metadata endpoint. But it's a good idea to always double-check whether CLI tools are using the expected creds.
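For what it's worth, one quick sanity check of which identity the exec'd CLI would actually use (assuming the AWS CLI is available on the runner) is:

aws sts get-caller-identity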

alfredo-gil commented 2 years ago

I was having the same issue but the solution that worked for me is to configure the kubernetes provider to use the role, something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role", "arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}" ]
  }
}
FernandoMiguel commented 2 years ago

I was having the same issue but the solution that worked for me is to configure the kubernetes provider to use the role, something like this:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role", "arn:aws:iam::${AWS_ACCOUNT_ID}:role/${ROLE_NAME}" ]
  }
}

Ohh that's an interesting option... Need to try that

Epic55 commented 2 years ago

I have the same issue, but like this: Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused when I set manage_aws_auth_configmap = true while deploying an EKS managed node group. Is there a known way to solve it?

miguelgmalpha commented 2 years ago

Related, in case someone is not aware of it: https://github.com/hashicorp/terraform-provider-kubernetes/issues/1307#issuecomment-873089000

mathewmoon commented 2 years ago

My team has suffered this ongoing problem for a hot minute now. Even if you use the k8s provider outside of the module to update the configmap, you will hit an issue any time your provider config relies on a computed value. The workaround that we are implementing as I type this is to use a local-exec to call a script with kubectl. We are updating the configmap and doing some Helm work to replace aws-cni and coredns with a proper chart. This has been a huge pain for us, with even plans failing when the cluster needs to be recreated or the EKS version updated. Hashi doesn't seem too concerned with the provider limitations/issues. There will always be an inherent issue with having providers embedded in the module as well.
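A minimal sketch of that local-exec approach, assuming kubectl and a patch script are available where Terraform runs (the script path, trigger, and environment variable below are hypothetical):

resource "null_resource" "patch_aws_auth" {
  triggers = {
    cluster_endpoint = module.eks.cluster_endpoint
  }

  provisioner "local-exec" {
    command = "./scripts/patch-aws-auth.sh"
    environment = {
      CLUSTER_NAME = module.eks.cluster_id
    }
  }
}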

ghost commented 2 years ago

Same problem with a fresh deployment.

dempti commented 2 years ago

Same problem with a fresh deployment.

were you able to resolve it?

ghost commented 2 years ago

Same problem with a fresh deployment.

were you able to resolve it?

I have used this workaround.

dracut5 commented 2 years ago

We updated successfully from version 17.x to 18.x, but I noticed the current problem and decided to dig deeper.

Reproduce

My steps to reproduce the issue:

  1. Create a new cluster using the latest version of the module (18.21.0 at the moment); create_aws_auth_configmap and manage_aws_auth_configmap are true due to self-managed node groups. This worked well.
  2. Change a module parameter that forces a cluster destroy/recreate, for example by appending a suffix to the iam_role_name value. I got an error:
    β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
    β”‚ 
    β”‚   with module.eks.kubernetes_config_map.aws_auth[0],
    β”‚   on .terraform/modules/eks/main.tf line 414, in resource "kubernetes_config_map" "aws_auth":
    β”‚  414: resource "kubernetes_config_map" "aws_auth" {

I used both of the following configurations for the kubernetes provider, which work as expected until iam_role_name is changed:

provider "kubernetes" {
  host                   = data.aws_eks_cluster.eks.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.eks.token
}
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id, "--role", "arn:aws:iam::${var.account_id}:role/system/${var.current_iam_role_name}"]
  }
}

As mentioned before, I suppose such behavior is caused by computed values in the kubernetes provider configuration. I understand that a cluster recreate is not what you want, but you should at least be able to tell that something is going wrong.
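One mitigation sometimes suggested for provider configuration that depends on computed values is a targeted apply, so the cluster endpoint is known before any kubernetes resources are evaluated; a sketch only, and the resource address assumes this module's internal naming:

terraform apply -target=module.eks.aws_eks_cluster.this
terraform apply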

Versions

Terraform v1.1.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.75.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/http v2.1.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.11.0
+ provider registry.terraform.io/hashicorp/null v2.1.2
+ provider registry.terraform.io/hashicorp/tls v3.4.0

P.S. I forgot to add that updating the cluster version doesn't generate any errors for me.

amazingguni commented 2 years ago

I face the same error T_T

rooty0 commented 2 years ago

So everything works well, though whenever I change the cluster_name within the EKS module (the cluster will be replaced, which is ok) I get an error: Error: Get "http://localhost/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/blah:eks-editor": dial tcp [::1]:80: connect: connection refused

This is driving me crazy. I'm wondering if the owners of the kubernetes provider are even aware of what is going on. So far, not a single workaround is working for me.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

robpearce-flux commented 2 years ago

Bumping this as I have the same error with 18.26.6. It seems like it's the attempt to connect in order to create the aws-auth configmap that is causing the issue.

nick4fake commented 2 years ago

This truly sounds like the biggest issue with the module. I've been using the EKS Terraform module for many years, and this is something we face with almost every upgrade and new deployment.

ecoupal-believe commented 2 years ago

I had this issue when I added one of these: iam_role_name / cluster_security_group_name / node_security_group_name on an existing cluster.

sushil-propel commented 2 years ago

As the OP wrote, using export KUBE_CONFIG_PATH=$PWD/kubeconfig worked for me. So did using config_path in the provider "kubernetes" block. Which, of course, means it takes a couple of passes of terraform apply to get to a stable state.

adiii717 commented 2 years ago

Faced a similar issue and had to delete the aws_auth from the remote state; then terragrunt apply seems to work.

To remove the block from the remote state:

terragrunt state pull > temp.tfstate
//remove the complete block of `aws_auth` {}
      "module": "module.eks",
      "mode": "managed",
      "type": "kubernetes_config_map_v1_data",
      "name": "aws_auth",
 terragrunt state push temp.tfstate
terragrunt apply

And I found a shorter way:

terragrunt state list
terragrunt state rm "module.eks.kubernetes_config_map_v1_data.aws_auth[0]"
FernandoMiguel commented 2 years ago

Faced a similar issue and had to delete the aws_auth from the remote state, which fixed the issue.

To remove the block from the remote state:

terragrunt state pull > temp.tfstate
//remove the complete block of `aws_auth` {}
      "module": "module.eks",
      "mode": "managed",
      "type": "kubernetes_config_map_v1_data",
      "name": "aws_auth",
 terragrunt state push temp.tfstate

You don't want to leave aws auth unmanaged.

adiii717 commented 2 years ago

Okay, this is going to sound crazy: I tried to remove it from the state with

terragrunt state rm "module.eks.kubernetes_config_map_v1_data.aws_auth[0]"

and now when I do apply, I am getting

β”‚ Error: error creating EKS Cluster (demo-eks): ResourceInUseException: Cluster already exists with name: demo-eks

also the kubernetes provider does not help
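If the cluster resource itself dropped out of state (which that ResourceInUseException suggests), importing it back would be less disruptive than destroying; the address below is a guess based on this module's internal naming:

terraform import 'module.eks.aws_eks_cluster.this[0]' demo-eks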

bcarranza commented 2 years ago

to remove the block from remote state

Hi, I'm at the same point as you @adiii717!!! Have you been able to get through this?

adiii717 commented 2 years ago

@bcarranza actually the error stays the same until I destroy and recreate. The stranger part is that the destroy recognizes the cluster but the apply does not. So I would say the latest module is pretty unstable, which definitely creates problems in a live environment. I've been using 17.x in live so far and have not faced any issue.

bcarranza commented 2 years ago

@bcarranza actually the error stays the same until I destroy and recreate. The stranger part is that the destroy recognizes the cluster but the apply does not. So I would say the latest module is pretty unstable, which definitely creates problems in a live environment. I've been using 17.x in live so far and have not faced any issue.

Hi @adiii717, in my case I can't destroy the cluster. Even though this is happening to me in an early environment, I don't want to imagine it happening in production, so I have to find a solution that doesn't involve destroying the cluster, as a preventive measure in case this happens in production.

dcarrion87 commented 2 years ago

This is such a frustrating issue, having to do crazy workarounds to get the auth mechanism to work.

It's only an issue when certain changes to the EKS cluster cause data sources to be empty.