jenkins-x / terraform-aws-eks-jx

A Terraform module for creating Jenkins X infrastructure on AWS
Apache License 2.0

JX3 setup failing. Kubernetes cluster unreachable: invalid configuration. #233

Closed: serhiykrupka closed this issue 3 years ago

serhiykrupka commented 3 years ago

Summary

JX3 setup failed on AWS

Steps to reproduce the behavior

  1. generate repo from https://github.com/jx3-gitops-repositories/jx3-terraform-eks/generate (commit https://github.com/jenkins-x/terraform-aws-eks-jx/tree/af1182fc3d07881af6a96a2335255ef7b0dce7f6)

  2. generate repo from https://github.com/jx3-gitops-repositories/jx3-eks-vault/generate

  3. configure terraform variables

  4. execute: terraform apply

Expected behavior

Boot job starts without errors.

Actual behavior

module.eks-jx.module.cluster.module.iam_assumable_role_cluster_autoscaler.aws_iam_role_policy_attachment.custom[0]: Creating...
module.eks-jx.module.cluster.module.iam_assumable_role_pipeline_visualizer.aws_iam_role_policy_attachment.custom[0]: Creating...
module.eks-jx.module.cluster.module.iam_assumable_role_external_dns.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-external-dns-20210121104622556100000019]
module.eks-jx.module.cluster.module.iam_assumable_role_controllerbuild.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-build-ctrl-2021012110462263690000001a]
module.eks-jx.module.cluster.module.iam_assumable_role_cm_cainjector.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-cert-manager-cert-manager-cainjector-2021012110462269650000001b]
module.eks-jx.module.cluster.module.iam_assumable_role_cluster_autoscaler.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-cluster-autoscaler-cluster-autoscaler-2021012110462283150000001c]
module.eks-jx.module.cluster.module.iam_assumable_role_pipeline_visualizer.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-jx-pipelines-visualizer-2021012110462286320000001d]
module.eks-jx.module.cluster.module.iam_assumable_role_tekton_bot.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-tekton-bot-2021012110462288510000001f]
module.eks-jx.module.cluster.module.iam_assumable_role_cert_manager.aws_iam_role_policy_attachment.custom[0]: Creation complete after 1s [id=jx-test-cert-manager-cert-manager-2021012110462288200000001e]
module.eks-jx.module.cluster.null_resource.kubeconfig (local-exec): Updated context arn:aws:eks:eu-central-1:xxx:cluster/jx-test in /Users/serhiykrupka/.kube/config
module.eks-jx.module.cluster.null_resource.kubeconfig: Creation complete after 1s [id=5757698152048227794]
module.eks-jx.module.cluster.helm_release.jx-git-operator[0]: Creating...

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Terraform version

v0.13.3

Module version

Operating system

MacOS 11.1

nnsense commented 3 years ago

Same here

Note that I can actually reach the cluster:

# kubectl get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-7vdqj             1/1     Running   0          11m
aws-node-fksdg             1/1     Running   0          11m
aws-node-nhmgr             1/1     Running   0          11m
coredns-59b69b4849-hslfw   1/1     Running   0          16m
coredns-59b69b4849-wdzds   1/1     Running   0          16m
kube-proxy-bl7vm           1/1     Running   0          11m
kube-proxy-chw85           1/1     Running   0          11m
kube-proxy-jfktn           1/1     Running   0          11m
# eksctl get cluster
NAME            REGION          EKSCTL CREATED
eks-devops      eu-central-1    False

This is the last part of the apply with debug enabled:

-----------------------------------------------------: timestamp=2021-01-24T17:54:46.619Z                                                                                                                                                
2021-01-24T17:54:46.620Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [DEBUG] [aws-sdk-go] {"ContinuousBackupsDescription":{"ContinuousBackupsStatus":"ENABLED","PointInTimeRecoveryDescription":{"PointInTimeRecoveryStatus":"DISABLED"}}}: timestamp=2021-01-24T17:54:46.619Z
2021/01/24 17:54:46 [WARN] Provider "registry.terraform.io/hashicorp/aws" produced an unexpected new value for module.eks-jx.module.vault.aws_dynamodb_table.vault-dynamodb-table[0], but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:                                                                                                                                          
      - .write_capacity: was cty.NumberIntVal(2), but now cty.NumberIntVal(0)                                                                                                                                                            
      - .read_capacity: was cty.NumberIntVal(2), but now cty.NumberIntVal(0)                                                                                                                                                             
module.eks-jx.module.vault.aws_dynamodb_table.vault-dynamodb-table[0]: Modifications complete after 2s [id=vault-unseal-eks-devops-Y9yKtOQW]                                                                                             
module.eks-jx.module.vault.data.aws_iam_policy_document.vault_iam_user_policy_document[0]: Reading... [id=3649945265]
2021-01-24T17:54:46.686Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.686Z                                         
2021-01-24T17:54:46.686Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.686Z                                         
2021-01-24T17:54:46.686Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.686Z                                         
2021-01-24T17:54:46.686Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.686Z                                         
2021-01-24T17:54:46.687Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.686Z                                         
2021-01-24T17:54:46.687Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.687Z                                         
2021-01-24T17:54:46.687Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.687Z                                         
2021-01-24T17:54:46.687Z [INFO]  plugin.terraform-provider-aws_v3.25.0_x5: 2021/01/24 17:54:46 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-24T17:54:46.687Z                                         
module.eks-jx.module.vault.data.aws_iam_policy_document.vault_iam_user_policy_document[0]: Read complete after 0s [id=3649945265]                                                                                                        
2021/01/24 17:54:46 [DEBUG] After incorporating new values learned so far during apply, module.eks-jx.module.vault.aws_iam_policy.aws_vault_user_policy[0] change has become NoOp
2021-01-24T17:54:46.711Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"                                                                                         
2021-01-24T17:54:46.715Z [DEBUG] plugin: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/3.25.0/linux_amd64/terraform-provider-aws_v3.25.0_x5 pid=26563                                             
2021-01-24T17:54:46.715Z [DEBUG] plugin: plugin exited                                                                                                                                                                                   

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

2021-01-24T17:54:46.734Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-01-24T17:54:46.734Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-01-24T17:54:46.734Z [WARN]  plugin.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
2021-01-24T17:54:46.737Z [DEBUG] plugin: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/helm/2.0.2/linux_amd64/terraform-provider-helm_v2.0.2_x5 pid=26543
2021-01-24T17:54:46.737Z [DEBUG] plugin: plugin exited
2021-01-24T17:54:46.739Z [DEBUG] plugin: plugin process exited: path=/usr/bin/terraform pid=26385
2021-01-24T17:54:46.739Z [DEBUG] plugin: plugin exited
2021-01-24T17:54:46.739Z [DEBUG] plugin: plugin process exited: path=/usr/bin/terraform pid=26288
2021-01-24T17:54:46.739Z [DEBUG] plugin: plugin exited

Also note that before trying jx3 I tried jx2, which uses the same module, and it worked, so I guess the issue is related to the jx3 part that is triggered by is_jx2.

nnsense commented 3 years ago

I did some testing, and apparently if I set the install_kuberhealthy option to false, I get the error only once:

Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

If I set it back to true, I get two errors like that, as you can see in both my log and the OP's log, so that option may be related.

Hope it helps.
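
For anyone reproducing this, here is a minimal sketch of where the two options mentioned above would sit in a root module. It is only an illustration: the module label eks-jx is taken from the resource addresses in the logs, the registry source address is an assumption, and every other input the module needs is omitted.

```hcl
# Sketch only: the source address is an assumption, and all other
# inputs (cluster name, region, vault settings, ...) are left out.
module "eks-jx" {
  source = "jenkins-x/eks-jx/aws"

  # Run the jx3 code path. The jx2 path worked in the test above.
  is_jx2 = false

  # With this set to false, the "Kubernetes cluster unreachable" error
  # was reported once instead of twice.
  install_kuberhealthy = false
}
```

Turning kuberhealthy off only reduces the number of errors. The remaining one comes from the jx-git-operator helm release, so this is a way to narrow the problem down rather than a fix.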

jstrachan commented 3 years ago

The kuberhealthy issue was caused by a new kuberhealthy chart breaking; we fixed it here: https://github.com/jenkins-x/terraform-jx-health/commit/9d99da326fcb87d4c9b0fa973f2ac13e83aa12d1

jstrachan commented 3 years ago

I wonder if you could remove your .terraform folder and rerun terraform plan && terraform apply?

nnsense commented 3 years ago

Sure, currently destroying, will start over and update, thanks! :)

Frejl commented 3 years ago

Same issue for me; tried on both macOS and Ubuntu.

nnsense commented 3 years ago

Update: Unfortunately I'm getting the same behaviour starting with a new deployment (no .terraform)

patrickleet commented 3 years ago

https://kubernetes.slack.com/archives/C9MBGQJRH/p1611693802377700?thread_ts=1611679357.370700&channel=C9MBGQJRH&message_ts=1611693802.377700

ankitm123 commented 3 years ago

/assign

I have been doing what @patrickleet suggested in the Slack message. Let me check why this is breaking; I will report back.

dephee commented 3 years ago

I had the same problem. I fixed it by adding a helm provider config:

provider "helm" {
  kubernetes {
    config_path = "./kubeconfig_jenkins"
  }
}

It will fail on the first call, but it will succeed on the second call, after kubeconfig_jenkins is created.

ankitm123 commented 3 years ago

> It will fail on the first call, but it will succeed on the second call, after kubeconfig_jenkins is created.

This happens because the helm provider uses an outdated kubeconfig on the first apply. When you apply a second time, it has access to the correct credentials and applies successfully.

@dephee It would be nice if you could give my branch a try and see whether the error goes away. You won't need the provider "helm" block in the main.tf file anymore.
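
For readers who hit the same error, here is a minimal sketch of one common way to keep the helm provider from depending on a kubeconfig file that may be stale or missing on the first apply: give it the cluster endpoint and a short-lived token read from AWS. This is not necessarily what the branch mentioned above does. The data source labels are hypothetical, and the module output cluster_name is an assumption about what the module exposes.

```hcl
# Sketch only: "cluster" is a hypothetical label, and
# module.eks-jx.cluster_name is an assumed module output.
data "aws_eks_cluster" "cluster" {
  name = module.eks-jx.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks-jx.cluster_name
}

provider "helm" {
  kubernetes {
    # Connection details come from the EKS API instead of a kubeconfig
    # file written during the same apply.
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}
```

Because the credentials are read during the run, the provider does not need an earlier apply to have written a kubeconfig. Whether this removes the two-pass behaviour in this particular setup has not been verified here.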