terraform-aws-modules / terraform-aws-eks

Terraform module to create Amazon Elastic Kubernetes Service (EKS) resources πŸ‡ΊπŸ‡¦
https://registry.terraform.io/modules/terraform-aws-modules/eks/aws
Apache License 2.0
4.45k stars Β· 4.06k forks

dial tcp 127.0.0.1:80: connect: connection refused #2007

Closed ArchiFleKs closed 1 year ago

ArchiFleKs commented 2 years ago

Description

I know there are numerous issues (#817) related to this problem, but since v18.20.1 reintroduced management of the aws-auth configmap, I thought we could discuss it in a new one, as the old ones are closed.

The behavior is still very weird. I updated my module to use the configmap management feature and the first run went fine (I was using the aws_eks_cluster_auth data source). When I run the module with no changes, I get no error in either plan or apply.

I then tried to update my cluster from v1.21 to v1.22, and plan and apply began to fail with the following well-known error:

null_resource.node_groups_asg_tags["m5a-xlarge-b-priv"]: Refreshing state... [id=7353592322772826167]                                                                                                                                    
β•·                                                                                                                                                                                                                                        
β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused                                                                                                    
β”‚                                                                                                                                                                                                                                        
β”‚   with kubernetes_config_map_v1_data.aws_auth[0],                                                                                                                                                                                      
β”‚   on main.tf line 428, in resource "kubernetes_config_map_v1_data" "aws_auth":                                                                                                                                                         
β”‚  428: resource "kubernetes_config_map_v1_data" "aws_auth" {                                                                                                                                                                            
β”‚                                                                                                                                                                                                                                        
β•΅                                                           

I then moved to the exec plugin as recommended in the documentation and removed the old data source from state. I still got the same error.

Something I don't get is that when setting the variable (export KUBE_CONFIG_PATH=$PWD/kubeconfig) as suggested in #817, things work as expected.
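For anyone else landing here, the #817 workaround amounts to generating a local kubeconfig and pointing the provider at it before running Terraform; a minimal sketch (cluster name and region below are placeholders):

```shell
# Write a kubeconfig for the cluster next to the Terraform config
# (cluster name and region are placeholders for your own values)
aws eks update-kubeconfig --name my-cluster --region eu-west-1 --kubeconfig "$PWD/kubeconfig"

# Point the kubernetes provider at it, then re-run
export KUBE_CONFIG_PATH="$PWD/kubeconfig"
terraform plan
```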

I'm sad to see things are still unusable (not related to this module, but on the Kubernetes provider side). The load_config_file option was removed from the Kubernetes provider a while ago, and I don't see why this variable needs to be set, nor how it could be set beforehand.

Anyway, if someone has managed to use the re-added configmap management feature, I'd be glad to know how to work around this and help debug this issue.

PS: I'm using Terragrunt; not sure if that's related, but it might be.

Versions

Terraform v1.1.7
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.9.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.10.0
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/tls v3.3.0

Reproduce

Here is my provider block

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.id]
  }
}

data "aws_eks_cluster" "cluster" {
  name = aws_eks_cluster.this[0].id
}
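As an aside, the client.authentication.k8s.io/v1alpha1 exec API used above was removed in Kubernetes 1.24, and recent versions of aws eks get-token emit v1beta1; the same provider block against the newer API would look roughly like this (a sketch, using the same data sources as above):

```hcl
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    # v1alpha1 was removed in Kubernetes 1.24; newer AWS CLI versions emit v1beta1
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.id]
  }
}
```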
github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

sotiriougeorge commented 2 years ago

This is still a major issue.

bryantbiggs commented 2 years ago

This is still a major issue.

This isn't a module issue. This is at the provider level; there isn't anything we can do here

evercast-mahesh2021 commented 2 years ago

I am getting the error below if I touch/change/comment/update anything on the cluster_security_group_description and cluster_security_group_name variables. I just wanted to set the name and description of the security group that is created for EKS by default. I am using version = "~> 18.23.0".

cluster_security_group_description = "Short Description"
cluster_security_group_name        = local.name_suffix

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

  with module.eks_cluster.kubernetes_config_map_v1_data.aws_auth[0],
  on .terraform/modules/eks_cluster/main.tf line 443, in resource "kubernetes_config_map_v1_data" "aws_auth":
 443: resource "kubernetes_config_map_v1_data" "aws_auth" {

Any solution for this?

Thanks!

mesobreira commented 2 years ago

Hello,

I also had this problem and found a workaround. Since this issue happens when the EKS data sources are only "known after apply" during terraform plan, due to control plane endpoint changes, I created an external data source that fetches the EKS cluster endpoint and certificate from a shell script (it uses the AWS command line). The script is attached. I set it as my default data source. If this data source fails (usually when I create a new cluster), it falls back to the default EKS data source. With this external data source, I no longer depend on the Terraform state, so any "known after apply" has no impact.

This is the content of the .tf file used to instantiate the kubernetes providers:

data "aws_region" "current" {}

data "external" "aws_eks_cluster" {
  program = ["sh", "${path.module}/script/get_endpoint.sh"]
  query = {
    cluster_name = var.kubernetes_properties.cluster_name
    region_name  = data.aws_region.current.name
  }
}

provider "kubernetes" {
  host                   = data.external.aws_eks_cluster.result.cluster_endpoint == "" ? data.aws_eks_cluster.this[0].endpoint : data.external.aws_eks_cluster.result.cluster_endpoint
  cluster_ca_certificate = data.external.aws_eks_cluster.result.cluster_endpoint == "" ? base64decode(data.aws_eks_cluster.this[0].certificate_authority[0].data) : base64decode(data.external.aws_eks_cluster.result.certificate_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    args        = ["eks", "get-token", "--cluster-name", var.kubernetes_properties.cluster_name, "--role-arn", try(data.aws_iam_session_context.this[0].issuer_arn, "")]
    command     = "aws"
  }
}

The same configuration can be applied to kubectl and helm providers. I have created clusters and changed EKS control plane configurations using this workaround and have no issues so far.

I know the external data source is not recommended, as it bypasses the Terraform state, but in this case it's very useful. get_endpoint.sh.gz
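The attached script is not inlined above; purely as an illustration of the shape such a helper could take (the real get_endpoint.sh may differ), a sketch that reads the external data source query from stdin and falls back to empty strings when the cluster does not exist yet:

```shell
#!/bin/sh
# Hypothetical sketch of a get_endpoint.sh-style helper for the external data source.
# Reads {"cluster_name": ..., "region_name": ...} as JSON on stdin (Terraform's
# external data source protocol) and emits JSON on stdout.
eval "$(jq -r '@sh "CLUSTER=\(.cluster_name) REGION=\(.region_name)"')"

if OUT=$(aws eks describe-cluster --name "$CLUSTER" --region "$REGION" 2>/dev/null); then
  # Cluster exists: return its endpoint and CA data
  printf '%s' "$OUT" | jq '{cluster_endpoint: .cluster.endpoint, certificate_data: .cluster.certificateAuthority.data}'
else
  # Cluster not created yet: signal the fallback path with empty strings
  jq -n '{cluster_endpoint: "", certificate_data: ""}'
fi
```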

bryantbiggs commented 2 years ago

But with this external datasource, I no longer depend on the state of terraform and then any "Known after application" has no impact.

That is entirely inaccurate. The kubernetes/helm/kubectl providers will always need a cluster's certificate and endpoint, in some shape or form, and those are not values you can know before the cluster comes into existence.

mesobreira commented 2 years ago

My bad. What I was trying to say is that after the cluster is created, I no longer depend on "known after apply" in case of changes to the EKS control plane. If the cluster does not exist, of course, I cannot retrieve the EKS cluster endpoint and certificate.

That's why I said, "If this datasource fails (usually when I create a new cluster), it switches to the default EKS datasource."

That's why I have this condition:

data.external.aws_eks_cluster.result.cluster_endpoint == "" ? data.aws_eks_cluster.this[0].endpoint : data.external.aws_eks_cluster.result.cluster_endpoint

evercast-mahesh2021 commented 1 year ago

Thank you @mesobreira and @bryantbiggs. I will try this solution.

csepulveda commented 1 year ago

I am getting the error below if I touch/change/comment/update anything on the cluster_security_group_description and cluster_security_group_name variables. I just wanted to set the name and description of the security group that is created for EKS by default. I am using version = "~> 18.23.0".

cluster_security_group_description = "Short Description"
cluster_security_group_name        = local.name_suffix

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

  with module.eks_cluster.kubernetes_config_map_v1_data.aws_auth[0],
  on .terraform/modules/eks_cluster/main.tf line 443, in resource "kubernetes_config_map_v1_data" "aws_auth":
 443: resource "kubernetes_config_map_v1_data" "aws_auth" {

Any solution for this?

Thanks!

Same issue here. I could create the clusters and modify them without any issue, but after a few hours I got the same error.

I have already tried a lot of changes: using data sources, using module outputs, using the exec command.

Always the same issue.

β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused
β”‚
β”‚   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on .terraform/modules/eks/main.tf line 475, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  475: resource "kubernetes_config_map_v1_data" "aws_auth" {

mesobreira commented 1 year ago

@csepulveda, have you tried using the external data source, as I mentioned above?

VladoPortos commented 1 year ago

I really do not understand what the issue is with Terraform.

provider "kubernetes" {
  host = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}

Using the data sources did not provide the information to the provider, despite the information clearly being in the state file and correct. I had to switch to module.eks.cluster_endpoint and module.eks.cluster_certificate_authority_data.

Why are the values not supplied to the provider?

terraform -version
Terraform v1.3.3
on linux_amd64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v4.37.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.2.0
+ provider registry.terraform.io/hashicorp/helm v2.7.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.15.0
+ provider registry.terraform.io/hashicorp/local v2.2.3
+ provider registry.terraform.io/hashicorp/null v3.2.0
+ provider registry.terraform.io/hashicorp/random v3.4.3
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/time v0.9.0
+ provider registry.terraform.io/hashicorp/tls v4.0.4
+ provider registry.terraform.io/oboukili/argocd v4.1.0
+ provider registry.terraform.io/terraform-aws-modules/http v2.4.1
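For reference, the working variant described above (provider fed from module outputs rather than data sources) would look roughly like this; the output names assume a recent version of this module, and var.cluster_name is a placeholder:

```hcl
provider "kubernetes" {
  # Module outputs resolve from state, avoiding the data-source indirection
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
  }
}
```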
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

joseph-igb commented 1 year ago

Was getting this error:

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

Using static values in the data section fixed the error for me. This was my configuration:

data "aws_eks_cluster_auth" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}
data "aws_eks_cluster" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}
sotiriougeorge commented 1 year ago

Was getting this error:

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

Using static values in the data section fixed the error for me. This was my configuration:

data "aws_eks_cluster_auth" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}
data "aws_eks_cluster" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

What do you mean by static values?

joseph-igb commented 1 year ago

Was getting this error:

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

Using static values in the data section fixed the error for me. This was my configuration:

data "aws_eks_cluster_auth" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}
data "aws_eks_cluster" "default" {
  name = var.cluster_name
  depends_on =[aws_eks_cluster.cluster]
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}

What do you mean by static values?

Previously had something along the lines of:

data "aws_eks_cluster_auth" "default" {
  name = aws_eks_cluster.my_cluster.name
}

Based on some of the comments above, I decided to use pre-set values, so I used variables, and that got rid of the error.

stdmje commented 1 year ago

Same error here using Terragrunt. Every time I upgrade the k8s version, I have to delete kubernetes_config_map_v1_data.aws_auth[0] from the state, otherwise I get the following error:

kubernetes_config_map_v1_data.aws_auth[0]: Refreshing state... [id=kube-system/aws-auth]
β•·
β”‚ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused
β”‚
β”‚   with kubernetes_config_map_v1_data.aws_auth[0],
β”‚   on main.tf line 518, in resource "kubernetes_config_map_v1_data" "aws_auth":
β”‚  518: resource "kubernetes_config_map_v1_data" "aws_auth" {
β”‚
β•΅
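The deletion step described above can be scripted; a sketch using the resource address from the error output (state operations are destructive, so back up first):

```shell
# Back up the state, then drop the stale aws-auth entry from it.
# Terraform will re-read the config map on the next apply.
terraform state pull > state.backup.json
terraform state rm 'kubernetes_config_map_v1_data.aws_auth[0]'
```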
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 1 year ago

This issue was automatically closed because it had been stale for 10 days

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.