Closed dalosa14 closed 6 months ago
module.kube-hetzner.null_resource.agents["1-0-agent-2"]: Still creating... [1h40m0s elapsed]
^C
Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...

Stopping operation...
╷
│ Error: execution halted
│
│
╵
╷
│ Error: execution halted
│
│
╵
╷
│ Error: execution halted
│
│
╵
╷
│ Error: execution halted
│
│
╵
╷
│ Error: execution halted
│
│
╵
╷
│ Error: execution halted
│
│
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["1-0-agent-2"],
│   on .terraform/modules/kube-hetzner/agents.tf line 108, in resource "null_resource" "agents":
│  108:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1327696752.sh": wait: remote command exited without exit status or exit signal
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["0-0-agent-1"],
│   on .terraform/modules/kube-hetzner/agents.tf line 108, in resource "null_resource" "agents":
│  108:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_2112463296.sh": wait: remote command exited without exit status or exit signal
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["2-0-agent-3"],
│   on .terraform/modules/kube-hetzner/agents.tf line 108, in resource "null_resource" "agents":
│  108:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1129368313.sh": wait: remote command exited without exit status or exit signal
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.control_planes["2-0-control-plane-hel1"],
│   on .terraform/modules/kube-hetzner/control_planes.tf line 176, in resource "null_resource" "control_planes":
│  176:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1033948135.sh": Process exited with status 124
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.control_planes["1-0-control-plane-nbg1"],
│   on .terraform/modules/kube-hetzner/control_planes.tf line 176, in resource "null_resource" "control_planes":
│  176:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_938187383.sh": Process exited with status 124
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.control_planes["0-0-control-plane-fsn1"],
│   on .terraform/modules/kube-hetzner/control_planes.tf line 176, in resource "null_resource" "control_planes":
│  176:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1354768503.sh": Process exited with status 124
╵
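With this many error boxes it can help to reduce the output to one line per failed resource. The following is only a rough sketch for doing that on a saved copy of the Terraform output; the function name `summarize_remote_exec_errors` and the regular expressions are illustrative assumptions, not part of Terraform or kube-hetzner. As a hint when reading the result: 124 is the exit status GNU `timeout` returns when its time limit is reached, so a status of 124 may mean a wait loop in the provisioning script timed out (this is an assumption about the script, not something confirmed in this thread).

```python
import re

# Hypothetical helper: patterns are guesses based on the log format shown above.
RESOURCE = re.compile(r"with (module\.\S+?),")
STATUS = re.compile(r"Process exited with status (\d+)")


def summarize_remote_exec_errors(log: str) -> dict[str, str]:
    """Map each failed resource address to its exit status ("unknown" if none)."""
    summary = {}
    # Every error box starts with this fixed heading; text before the first one is ignored.
    chunks = log.split("Error: remote-exec provisioner error")[1:]
    for chunk in chunks:
        resource = RESOURCE.search(chunk)
        if resource is None:
            continue
        status = STATUS.search(chunk)
        summary[resource.group(1)] = status.group(1) if status else "unknown"
    return summary


if __name__ == "__main__":
    import sys

    # Usage: terraform apply 2>&1 | tee apply.log; python3 summarize.py < apply.log
    for resource, status in summarize_remote_exec_errors(sys.stdin.read()).items():
        print(f"{status:>7}  {resource}")
```

Resources reported as "unknown" are those where the remote command ended without any exit status, which in the output above coincides with the interrupted apply.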
If it's just 1 control plane, it works fine.
@dalosa14 Longhorn won't work on cpx11; as explained in the comments inside kube.tf.example, it needs at least cx21. If you want the same price as cpx11, just use cax21.
It works with one cp because your cp nodes are cx21, which has enough RAM, and in a single-node setup we enable scheduling on the cp.
@mysticaltech Thank you for your response. I've been reviewing the configuration I shared and the details you mentioned about the RAM requirements for Longhorn, but I'm not entirely convinced that this is the issue in this case.

According to my kube.tf file, I'm using cpx21 instances with 4GB of RAM for the control plane nodes and cax11 instances, also with 4GB, for the agent (worker) nodes. Both instance types meet the minimum 4GB requirement you indicated for Longhorn. I understand that Longhorn typically runs on the worker nodes and not necessarily on the control plane nodes, so the cax11 nodes I'm using should be sufficient in terms of RAM.

Moreover, if the problem was a lack of RAM, I would expect to see specific errors related to insufficient resources or the inability to schedule Longhorn pods. However, what I'm seeing is that the provisioning process gets stuck on creating the agent nodes, without any clear error about the cause.

Do you have any other suggestions on what could be causing this issue? Are there any additional logs or error messages I can provide to help diagnose the problem? I appreciate your help and I'm willing to provide more details if necessary to get to the bottom of this issue.
My bad @dalosa14, I misread that you were using cpx11 for the agents. In that case, indeed, cax11 should work. I suggest you debug by SSHing into a cp node (see the debug section in the readme) during the unending install, and use kubectl from there to get all the logs and see what's happening and why it's getting stuck. Also inspect the memory usage via kubectl top.
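As one possible way to act on this, here is a minimal sketch that filters the text output of `kubectl get pods -A` down to pods that are neither Running nor Completed. The helper names `parse_pods` and `stuck_pods` are made up for illustration, and the column layout is assumed to be the default one (NAMESPACE, NAME, READY, STATUS, ...).

```python
import sys


def parse_pods(listing: str) -> list[dict]:
    """Parse the plain-text output of `kubectl get pods -A` into one dict per pod."""
    pods = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, ready, status = fields[:4]
        pods.append(
            {"namespace": namespace, "name": name, "ready": ready, "status": status}
        )
    return pods


def stuck_pods(pods: list[dict]) -> list[dict]:
    """Return pods whose status is neither Running nor Completed."""
    return [p for p in pods if p["status"] not in ("Running", "Completed")]


if __name__ == "__main__":
    # Usage on the cp node: kubectl get pods -A | python3 stuck_pods.py
    for pod in stuck_pods(parse_pods(sys.stdin.read())):
        print(f'{pod["status"]:<10} {pod["namespace"]}/{pod["name"]}')
```

For each pod it prints, `kubectl describe pod <name> -n <namespace>` shows the scheduling events, which usually state why a pod is still Pending.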
I also have the same issue.
module.kube-hetzner.null_resource.kustomization: Still creating... [6m10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["0-0-agent-arm-small"],
│   on .terraform/modules/kube-hetzner/agents.tf line 108, in resource "null_resource" "agents":
│  108:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_388455073.sh": Process exited with status 124
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 290, in resource "null_resource" "kustomization":
│  290:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_426453076.sh": Process exited with status 1
╵
I am using cax11 for 3 control planes (which seem to be fine).
cax11 also for an agent_nodepool.
When trying CAX21 (4 core and 8GB arm-based):
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["0-0-agent-arm-small"],
│   on .terraform/modules/kube-hetzner/agents.tf line 103, in resource "null_resource" "agents":
│  103:   provisioner "remote-exec" {
│
│ timeout - last error: dial tcp 49.13.124.107:22: connect: no route to host
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 290, in resource "null_resource" "kustomization":
│  290:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1642855255.sh": Process exited with status 1
╵
When attempting to apply again with CAX21 I noted this part way through the attempt:
module.kube-hetzner.null_resource.kustomization (remote-exec): E0426 21:40:33.687428 11109 reflector.go:147] k8s.io/client-go@v1.29.4-k3s1/tools/cache/reflector.go:229: Failed to watch *unstructured.Unstructured: the server is currently unable to handle the request
And finally:
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.agents["0-0-agent-arm-small"],
│   on .terraform/modules/kube-hetzner/agents.tf line 108, in resource "null_resource" "agents":
│  108:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1647253974.sh": Process exited with status 124
╵
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 290, in resource "null_resource" "kustomization":
│  290:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1089942153.sh": Process exited with status 1
╵
I see many pending pods:
kube-system      coredns-6799fbcd5-xr77h                           0/1   Pending     0             44m
kube-system      hcloud-cloud-controller-manager-64659ff76c-wjfdd  0/1   Pending     0             28m
kube-system      hcloud-csi-controller-7dd59b5d75-wxbs5            0/5   Pending     0             28m
kube-system      hcloud-csi-node-5zhkh                             3/3   Running     0             28m
kube-system      hcloud-csi-node-gmt4p                             0/3   Pending     0             28m
kube-system      hcloud-csi-node-mvvt6                             3/3   Running     0             28m
kube-system      hcloud-csi-node-sxkm5                             3/3   Running     0             28m
kube-system      hcloud-csi-node-z7mth                             3/3   Running     3 (16m ago)   28m
kube-system      helm-install-cert-manager-z88bk                   0/1   Pending     0             28m
kube-system      helm-install-traefik-8l6xv                        0/1   Completed   0             28m
kube-system      metrics-server-54fd9b65b-mszxw                    0/1   Pending     0             44m
system-upgrade   system-upgrade-controller-bb8dd57b4-9l8tm         0/1   Pending     0             28m
traefik          traefik-5d6bdfcf49-dj9z8                          0/1   Pending     0             28m
And one control plane node that is not ready (but this is transient):
NAME                         STATUS     ROLES                       AGE   VERSION
k3s-agent-arm-small-wrs      Ready      <none>                      30m   v1.29.4+k3s1
k3s-agent-small-nwp          Ready      <none>                      46m   v1.29.4+k3s1
k3s-control-plane-fsn1-aoa   Ready      control-plane,etcd,master   47m   v1.29.4+k3s1
k3s-control-plane-hel1-sxh   NotReady   control-plane,etcd,master   45m   v1.29.4+k3s1
k3s-control-plane-nbg1-naf   Ready      control-plane,etcd,master   46m   v1.29.4+k3s1
I deleted all resources, then brought up only the 3 control planes. This worked.
Then I raised the count of a CAX21 nodepool from 0 to 1, which also worked.
When scaling the CAX21 nodepool to 0 and then adding a nodepool of 1 CAX11 node I get a successful apply also.
@dalosa14 Any updates?
I deleted all files and retried, and it worked. I think it's because I had used cax21 before, so the filesystem (80 GB) does not fit on cax11 and the nodes do not start correctly, but now it's all OK.
Feel free to close the issue here on GitHub if you feel your issue is resolved :+1:
Description
Terraform is not deploying correctly
module.kube-hetzner.null_resource.agents["1-0-agent-2"]: Still creating... [1h37m10s elapsed]
module.kube-hetzner.null_resource.agents["2-0-agent-3"]: Still creating... [1h37m10s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-1"]: Still creating... [1h37m20s elapsed]
module.kube-hetzner.null_resource.agents["2-0-agent-3"]: Still creating... [1h37m20s elapsed]
module.kube-hetzner.null_resource.agents["1-0-agent-2"]: Still creating... [1h37m20s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-1"]: Still creating... [1h37m30s elapsed]
module.kube-hetzner.null_resource.agents["2-0-agent-3"]: Still creating... [1h37m30s elapsed]
module.kube-hetzner.null_resource.agents["1-0-agent-2"]: Still creating... [1h37m30s elapsed]
Kube.tf file
Screenshots
No response
Platform
WSL 2