kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

Unable to deploy Kubernetes cluster. SSH connection fails #110

Closed arkkanoid closed 2 years ago

arkkanoid commented 2 years ago

Hi, I'm trying to deploy a cluster with the same config as the template, but when the servers reboot, the deployment is unable to connect to them via SSH.

module.control_planes[1].hcloud_server.server (remote-exec): Connecting to remote host via SSH...
module.control_planes[1].hcloud_server.server (remote-exec):   Host: XXXXX
module.control_planes[1].hcloud_server.server (remote-exec):   User: root
module.control_planes[1].hcloud_server.server (remote-exec):   Password: false
module.control_planes[1].hcloud_server.server (remote-exec):   Private key: true
module.control_planes[1].hcloud_server.server (remote-exec):   Certificate: false
module.control_planes[1].hcloud_server.server (remote-exec):   SSH Agent: true
module.control_planes[1].hcloud_server.server (remote-exec):   Checking Host Key: false
module.control_planes[1].hcloud_server.server (remote-exec):   Target Platform: unix
module.agents["agent-small-0"].hcloud_server.server: Still creating... [4m30s elapsed]
module.agents["agent-big-0"].hcloud_server.server: Still creating... [4m30s elapsed]

mysticaltech commented 2 years ago

Check your key pairs! If your key has a passphrase, you need to use the SSH agent; otherwise, just generate a new key pair of the same type as the example.
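
To check whether a key is passphrase-protected before choosing between the agent and a fresh key pair, something like the following may help. It is only a sketch: `key_has_no_passphrase` is a hypothetical helper name, the `ed25519` type and the `~/.ssh/id_ed25519` path are assumptions (use whatever type the example config expects), and it relies on `ssh-keygen -y` failing when the empty passphrase passed via `-P ""` does not unlock the key.

```shell
# Hypothetical helper: succeeds (exit 0) if the private key at $1 can be
# read with an empty passphrase, i.e. it is not passphrase-protected.
key_has_no_passphrase() {
  ssh-keygen -y -P "" -f "$1" </dev/null >/dev/null 2>&1
}

key="$HOME/.ssh/id_ed25519"   # assumed path; adjust to your setup
if [ -f "$key" ]; then
  if key_has_no_passphrase "$key"; then
    echo "no passphrase: the key file can be used directly"
  else
    echo "passphrase set: load it into the agent with: ssh-add $key"
  fi
else
  echo "no key at $key; create one with: ssh-keygen -t ed25519 -f $key"
fi
```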

mysticaltech commented 2 years ago

To debug more, please save the terraform logs with the following:

export TF_LOG=TRACE
terraform apply 2>&1 | tee apply.log

And post the file here, please!
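
Once `apply.log` exists, the error-level entries are usually easier to spot with a filter than by scrolling through the TRACE output. A minimal sketch, assuming the log was saved as `apply.log` as in the command above; `tf_log_errors` is a hypothetical helper name, not part of Terraform:

```shell
# Hypothetical helper: print (with line numbers) only the lines of a saved
# Terraform log that carry an [ERROR] tag or a final "Error:" diagnostic.
tf_log_errors() {
  # $1: path to the log file; defaults to apply.log
  grep -nE '\[ERROR\]|Error:' "${1:-apply.log}"
}

# Usage: run against the saved log if it is present.
if [ -f apply.log ]; then
  tf_log_errors apply.log
fi
```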

arkkanoid commented 2 years ago

Thanks for your feedback, @mysticaltech. These are the logs I get:

remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
Still creating... [2m30s elapsed]
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
Still creating... [2m40s elapsed]
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
remote-exec): Waiting for load-balancer to get an IP...
Still creating... [2m50s elapsed]
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
Still creating... [3m0s elapsed]
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "root" is waiting for "output.load_balancer_public_ipv4"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" is waiting for "data.hcloud_load_balancer.traefik"
remote-exec): Waiting for load-balancer to get an IP...
[TRACE] dag/walk: vertex "output.load_balancer_public_ipv4" is waiting for "data.hcloud_load_balancer.traefik"
[TRACE] dag/walk: vertex "null_resource.destroy_traefik_loadbalancer" is waiting for "null_resource.kustomization"
[TRACE] dag/walk: vertex "provider[\"registry.terraform.io/hashicorp/null\"] (close)" is waiting for "null_resource.destroy_traefik_loadbalancer"
[TRACE] dag/walk: vertex "data.hcloud_load_balancer.traefik" is waiting for "null_resource.kustomization"
[DEBUG] remote command exited with '124': /tmp/terraform_356227660.sh
[WARN] Errors while provisioning null_resource.kustomization with "remote-exec", so aborting
[TRACE] evalApplyProvisioners: null_resource.kustomization provisioning failed, but we will continue anyway at the caller's request
[TRACE] maybeTainted: null_resource.kustomization encountered an error during creation, so it is now marked as tainted
[TRACE] NodeAbstractResouceInstance.writeResourceInstanceState to workingState for null_resource.kustomization
[TRACE] NodeAbstractResouceInstance.writeResourceInstanceState: writing state object for null_resource.kustomization
[TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
[TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 148
[TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
[ERROR] vertex "null_resource.kustomization" error: remote-exec provisioner error
[TRACE] vertex "null_resource.kustomization": visit complete, with errors
[TRACE] dag/walk: upstream of "data.hcloud_load_balancer.traefik" errored, so skipping
[TRACE] dag/walk: upstream of "null_resource.destroy_traefik_loadbalancer" errored, so skipping
[TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/hetznercloud/hcloud\"] (close)" errored, so skipping
[TRACE] dag/walk: upstream of "provider[\"registry.terraform.io/hashicorp/null\"] (close)" errored, so skipping
[TRACE] dag/walk: upstream of "output.load_balancer_public_ipv4" errored, so skipping
[TRACE] dag/walk: upstream of "root" errored, so skipping
[TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
[TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 149
[TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
╷
│ Error: remote-exec provisioner error
│
│   with null_resource.kustomization,
│   on init.tf line 128, in resource "null_resource" "kustomization":
│  128:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_356227660.sh": Process exited with status 124
╵
╷
│ Error: server limit exceeded (resource_limit_exceeded)
│
│   with module.agents["agent-big-0"].hcloud_server.server,
│   on modules/host/main.tf line 1, in resource "hcloud_server" "server":
│    1: resource "hcloud_server" "server" {
│
╵
statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = transport is closing"
provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.1.0/darwin_amd64/terraform-provider-null_v3.1.0_x5 pid=77422
provider: plugin exited
provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hetznercloud/hcloud/1.33.1/darwin_amd64/terraform-provider-hcloud_v1.33.1 pid=77420
provider: plugin exited

mysticaltech commented 2 years ago

Here's your error, just ask Hetzner to allow you to create more servers :)

Process exited with status 124
Error: server limit exceeded (resource_limit_exceeded)
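
For context on the first of those two lines: 124 is the exit status that GNU coreutils `timeout` returns when its time limit expires. Assuming the provisioning script wraps its "Waiting for load-balancer to get an IP..." loop in `timeout` (an assumption; the script itself is not shown in this thread), the failure can be reproduced in miniature. `wait_with_limit` and the `/nonexistent/lb-ready` path are hypothetical stand-ins for the real readiness check.

```shell
# Sketch only: poll for a condition under an overall time limit.
# Requires GNU coreutils `timeout`.
wait_with_limit() {
  # $1: time limit in seconds
  # $2: path whose existence ends the wait (stand-in for "LB has an IP")
  timeout "$1" sh -c 'until [ -e "$1" ]; do echo "Waiting..."; sleep 1; done' sh "$2"
}

# The path never appears, so the limit expires and timeout reports 124.
wait_with_limit 2 /nonexistent/lb-ready || echo "exit status: $?"
```

If the condition is never met (here, because the path never appears), the loop itself does not fail; the wrapper gives up and reports 124.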

arkkanoid commented 2 years ago

Agh, that's true! Since I signed up with a new account, I can't create more servers. Hetzner's response:

"Your account is too new to request a limit increase. Please note that we generally do not answer questions regarding limit increase on the telephone."

I'll start with 1 worker node.

Thanks!

mysticaltech commented 2 years ago

@arkkanoid I'll push something within the hour that will let you use a single worker node better. I'll keep you posted.

mysticaltech commented 2 years ago

You can now pull the master branch, and your single-node cluster will use the embedded Klipper LB instead of the Hetzner LB, which makes more sense!