rancher / tf-rancher-up

MIT License

"Error: Kubernetes cluster unreachable" when creating upstream k3s cluster on aws #157

Open kourosh7 opened 2 months ago

kourosh7 commented 2 months ago

I cloned the repo to my local laptop, navigated to ../recipes/upstream/aws/k3s, and set up the terraform.tfvars file. Then I ran terraform init/plan/apply and got the following output, with an error at the end:

module.rancher_install.null_resource.bootstrap_message: Creating...
module.rancher_install.null_resource.bootstrap_message: Provisioning with 'local-exec'...
module.rancher_install.null_resource.bootstrap_message (local-exec): Executing: ["/bin/sh" "-c" "echo 'Rancher will be started with the given password'"]
module.rancher_install.null_resource.bootstrap_message (local-exec): Rancher will be started with the given password
module.rancher_install.null_resource.bootstrap_message: Creation complete after 0s [id=5072771248198043446]
module.k3s_first.random_password.token: Creating...
module.k3s_additional.random_password.token: Creating...
module.k3s_additional.random_password.token: Creation complete after 0s [id=none]
module.k3s_first.random_password.token: Creation complete after 0s [id=none]
module.k3s_first_server.aws_security_group.sg_allowall[0]: Creating...
module.k3s_first_server.aws_security_group.sg_allowall[0]: Creation complete after 3s [id=sg-0bb598eafd5b707d9]
module.k3s_first_server.aws_instance.instance[0]: Creating...
module.k3s_first_server.aws_instance.instance[0]: Still creating... [10s elapsed]
module.k3s_first_server.aws_instance.instance[0]: Provisioning with 'remote-exec'...
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Host: 35.94.253.178
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Host: 35.94.253.178
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Host: 35.94.253.178
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_first_server.aws_instance.instance[0]: Still creating... [20s elapsed]
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Host: 35.94.253.178
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_first_server.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Connected!
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Waiting for cloud-init to complete...
module.k3s_first_server.aws_instance.instance[0]: Still creating... [30s elapsed]
module.k3s_first_server.aws_instance.instance[0]: Still creating... [40s elapsed]
module.k3s_first_server.aws_instance.instance[0]: Still creating... [50s elapsed]
module.k3s_first_server.aws_instance.instance[0] (remote-exec): Completed cloud-init!
module.k3s_first_server.aws_instance.instance[0]: Creation complete after 53s [id=i-0eebe381ea74b53d4]
data.local_file.ssh_private_key: Reading...
data.local_file.ssh_private_key: Read complete after 0s [id=fb3da899d3b61f3cd12c59f0b0a58a21ad62b31b]
ssh_resource.retrieve_kubeconfig: Creating...
module.k3s_workers.aws_instance.instance[0]: Creating...
ssh_resource.retrieve_kubeconfig: Creation complete after 0s [id=3551892770514621087]
local_file.kube_config_yaml_backup: Creating...
local_file.kube_config_yaml: Creating...
local_file.kube_config_yaml: Creation complete after 0s [id=a00363a8dae13ac17259015c7ec318fbe9e5f19b]
local_file.kube_config_yaml_backup: Creation complete after 0s [id=a00363a8dae13ac17259015c7ec318fbe9e5f19b]
module.rancher_install.helm_release.cert_manager[0]: Creating...
module.k3s_workers.aws_instance.instance[0]: Still creating... [10s elapsed]
module.k3s_workers.aws_instance.instance[0]: Provisioning with 'remote-exec'...
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Host: 35.87.207.105
module.k3s_workers.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Host: 35.87.207.105
module.k3s_workers.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Host: 35.87.207.105
module.k3s_workers.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_workers.aws_instance.instance[0]: Still creating... [20s elapsed]
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Host: 35.87.207.105
module.k3s_workers.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_workers.aws_instance.instance[0]: Still creating... [30s elapsed]
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connecting to remote host via SSH...
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Host: 35.87.207.105
module.k3s_workers.aws_instance.instance[0] (remote-exec):   User: ubuntu
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Password: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Private key: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Certificate: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   SSH Agent: true
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Checking Host Key: false
module.k3s_workers.aws_instance.instance[0] (remote-exec):   Target Platform: unix
module.k3s_workers.aws_instance.instance[0] (remote-exec): Connected!
module.k3s_workers.aws_instance.instance[0] (remote-exec): Waiting for cloud-init to complete...
module.k3s_workers.aws_instance.instance[0]: Still creating... [40s elapsed]
module.k3s_workers.aws_instance.instance[0] (remote-exec): Completed cloud-init!
module.k3s_workers.aws_instance.instance[0]: Creation complete after 46s [id=i-01384dda824790169]
╷
│ Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
│ 
│   with module.rancher_install.helm_release.cert_manager[0],
│   on ../../../../modules/rancher/main.tf line 90, in resource "helm_release" "cert_manager":
│   90: resource "helm_release" "cert_manager" {
│ 
╵
kourosh7 commented 2 months ago

I see the same with /upstream/aws/rke

kourosh7 commented 2 months ago

Based on https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1234, maybe the issue here is that the KUBECONFIG file is not set when it tries to install cert-manager?
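One way to test that hypothesis is to look at what a candidate kubeconfig path actually is on disk. The sketch below is only an illustration: `kubeconfig_state` is a hypothetical helper, not part of tf-rancher-up, and it assumes `KUBECONFIG` holds a single path rather than a colon-separated list.

```shell
#!/bin/sh
# Hypothetical helper: report what a candidate kubeconfig path is on disk.
kubeconfig_state() {
  candidate=$1
  if [ -f "$candidate" ] && [ -s "$candidate" ]; then
    echo "file"               # regular, non-empty file: plausible kubeconfig
  elif [ -d "$candidate" ]; then
    echo "directory"          # a directory cannot be loaded as a kubeconfig
  elif [ -e "$candidate" ]; then
    echo "empty-or-special"   # exists, but is empty or not a regular file
  else
    echo "missing"
  fi
}

# Inspect whatever KUBECONFIG currently points at, if anything.
# Assumption: a single path, not a colon-separated list.
if [ -n "${KUBECONFIG:-}" ]; then
  echo "KUBECONFIG=$KUBECONFIG -> $(kubeconfig_state "$KUBECONFIG")"
else
  echo "KUBECONFIG is not set"
fi
```

Anything other than "file" for the path the helm provider is given would be consistent with the "no configuration has been provided" error.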

kourosh7 commented 2 months ago

I think I found the issue. I added an output to check what the kubeconfig file points to:

kourosh@kourosh:~/tf-rancher-up/recipes/upstream/aws/k3s$ terraform output
instances_private_ip = [
  [
    "172.31.44.112",
  ],
  [],
]
instances_public_ip = [
  [
    "34.212.45.9",
  ],
  [],
]
kubeconfig_file = "~/.kube/rancher-terraform.yml"
rancher_bootstrap_password = "initial-admin-password"
rancher_hostname = "rancher.34.212.45.9.sslip.io"
rancher_url = "https://rancher.34.212.45.9.sslip.io"
kourosh@kourosh:~/tf-rancher-up/recipes/upstream/aws/k3s$ cat \~/.kube/rancher-terraform.yml 
cat: '~/.kube/rancher-terraform.yml': Is a directory
kourosh@kourosh:~/tf-rancher-up/recipes/upstream/aws/k3s$ ls \~/.kube/rancher-terraform.yml 
kourosh_kube_config.yml  kourosh_kube_config.yml.backup

The script creates a directory named rancher-terraform.yml under ~/.kube/ and generates the kubeconfig file inside that directory. It then tries to use ~/.kube/rancher-terraform.yml itself as the kubeconfig file, which is invalid because that path is a directory.
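The layout above can be reproduced in a scratch directory. This is a minimal sketch under two assumptions: that the configured value is treated as a directory with the generated file name appended, and that the `~` is taken literally rather than expanded (the escaped `\~` in the session above suggests this). The variable names mirror the ones discussed in this thread, but the join itself is illustrative.

```shell
#!/bin/sh
# Reproduce the layout in a scratch directory; nothing here touches the real ~/.kube.
scratch=$(mktemp -d)

# Values taken from the thread. How the recipe joins them is an assumption:
# directory + "/" + generated file name.
kube_config_path='~/.kube/rancher-terraform.yml'   # single quotes: no tilde expansion
kube_config_filename='kourosh_kube_config.yml'

(
  cd "$scratch" || exit 1
  mkdir -p "$kube_config_path"                      # creates a literal '~' folder
  : > "$kube_config_path/$kube_config_filename"     # empty stand-in for the kubeconfig
)

# The intended "file" is really a directory under a literal '~' folder.
if [ -d "$scratch/~/.kube/rancher-terraform.yml" ]; then
  echo "rancher-terraform.yml is a directory"
fi
ls "$scratch/~/.kube/rancher-terraform.yml"
```

Under these assumptions, any tool handed `~/.kube/rancher-terraform.yml` as a kubeconfig receives a directory, which matches the `cat` error shown in the session.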

dkeightley commented 2 months ago

I think the issue might be related to kubeconfig_file having a full path. Looking at the code, it should contain a filename only; if you want a path as well, you can set that separately, e.g.:

kube_config_path = "~/.kube/"
kube_config_filename = "rancher-terraform.yml"

https://github.com/rancher/tf-rancher-up/blob/main/recipes/upstream/aws/k3s/variables.tf#L106-L116

By default, without a path it will use the current working directory (cwd)

https://github.com/rancher/tf-rancher-up/blob/main/recipes/upstream/aws/k3s/main.tf#L2-L4

kourosh7 commented 2 months ago

@dkeightley I think you are on the right track. The problem can be seen when:

  1. Following the example in the terraform.tfvars.example file:
## -- Override the default (${prefix}_kube_config.yml) kubeconfig file/path
kube_config_path = "~/.kube/rancher-terraform.yml"
  2. Following your example (from the previous comment), which has the same problem.

It works fine when:

  1. kube_config_path is set to just a filename with no path
  2. Not setting any file/path in the terraform.tfvars file at all
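The working and failing cases above can be turned into a small pre-flight check. This is a sketch based only on the observations in this comment: `is_bare_filename` is a hypothetical helper, and the rule it encodes (no slash, no leading `~`) is an assumption drawn from these cases, not a documented constraint of the recipe.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only for a bare file name
# (non-empty, no '/' anywhere, no leading '~').
is_bare_filename() {
  case $1 in
    ''|*/*|'~'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Values discussed in this thread.
for value in 'rancher-terraform.yml' '~/.kube/rancher-terraform.yml' '~/.kube/'; do
  if is_bare_filename "$value"; then
    echo "ok:      $value"
  else
    echo "suspect: $value"
  fi
done
```

Only the first value passes, which matches the behavior reported here: a plain filename works, while both tilde-based paths lead to the error.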