hcloud-k8s / terraform-hcloud-kubernetes

https://registry.terraform.io/modules/hcloud-k8s/kubernetes/hcloud
MIT License

Error making request: GET https://5.161.192.216:6443/version giving up after 61 attempt(s) #27

Closed lpellegr closed 13 hours ago

lpellegr commented 3 days ago

Using version 0.6.3, cluster deployment fails with:

  module.kubernetes.hcloud_uploaded_certificate.state: Creation complete after 0s [id=1362738]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [10s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [20s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [30s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [40s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [50s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m0s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m10s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m20s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m30s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m40s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [1m50s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m0s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m10s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m20s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m30s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m40s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [2m50s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m0s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m10s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m20s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m30s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m40s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [3m50s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m0s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m10s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m20s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m30s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m40s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [4m50s elapsed]
  module.kubernetes.data.http.kube_api_health[0]: Still reading... [5m0s elapsed]
  ╷
  │ Error: Error making request
  │
  │   with module.kubernetes.data.http.kube_api_health[0],
  │   on .terraform/modules/kubernetes/talos.tf line 291, in data "http" "kube_api_health":
  │  291: data "http" "kube_api_health" {
  │
  │ Error making request: GET https://5.161.192.216:6443/version giving up after 61 attempt(s): Get "https://5.161.192.216:6443/version": dial tcp 5.161.192.216:6443: connect: connection refused

Here is the configuration file used:

  module "kubernetes" {
    source  = "hcloud-k8s/kubernetes/hcloud"
    version = "0.6.3"

    cluster_name = "my-cluster"
    hcloud_token = "xxx"

    cluster_delete_protection = false
    # Export configs for Talos and Kube API access
    cluster_kubeconfig_path  = "kubeconfig"
    cluster_talosconfig_path = "talosconfig"

    # Optional Ingress Controller and Cert Manager
    cert_manager_enabled  = false
    ingress_nginx_enabled = false

    control_plane_nodepools = [
      { name = "control", type = "cpx11", location = "ash", count = 3 }
    ]

    worker_nodepools = [
      { name = "worker", type = "ccx13", location = "ash", count = 1 }
    ]

  }
lpellegr commented 3 days ago

I am getting the same error with version 0.3.0, which was working a few weeks ago.

The firewall rules seem OK for my IP address. I am not sure what the issue is.

M4t7e commented 1 day ago

Using node types with only 2 GB of RAM for control plane nodes is quite risky. A blank control plane node alone typically requires 1.5 to 2 GB of memory just for the OS and the Kubernetes system daemons, while blank worker nodes typically need less, between 0.5 and 1 GB. If you have the metrics server installed (it is by default), you can verify memory usage with kubectl top nodes. While I can't say for certain this is causing your issue, it's worth investigating.
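
For example, assuming the kubeconfig was exported to the kubeconfig path from the configuration above, something like this should show per-node usage:

  # Show per-node CPU/memory usage (needs the metrics server, which the module enables by default)
  kubectl --kubeconfig kubeconfig top nodes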

Have you checked the cluster status and ensured etcd is healthy? You can run talosctl health and refer to the troubleshooting docs for guidance.
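
For example, with the talosconfig exported by the configuration above (the control plane IP below is a placeholder):

  # Run the built-in Talos health checks, including etcd, against one control plane node
  talosctl --talosconfig talosconfig --nodes <control-plane-ip> health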

Additionally, you can review the Talos dashboard through the Hetzner Cloud Console. Go to your Hetzner Cloud project, open the "Console" for your control plane nodes, and check the node status. A healthy node should look like this: [screenshot omitted]

If your cluster appears healthy and there are no obvious issues, it’s worth considering potential network-related problems.

lpellegr commented 13 hours ago

Thank you so much for your assistance! As you pointed out, the issue was indeed caused by insufficient memory. After upgrading the control plane nodes to a larger server type with 4 GB of memory, everything started working seamlessly.
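
For anyone else hitting this: the fix amounts to picking a control plane server type with at least 4 GB of RAM. The comment above does not name the exact type used, so the cpx21 below (3 vCPUs / 4 GB on Hetzner Cloud) is only an example:

  control_plane_nodepools = [
    # cpx21 is an example 4 GB type; any type with >= 4 GB RAM available in your location should do
    { name = "control", type = "cpx21", location = "ash", count = 3 }
  ]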