vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.
MIT License

Autoscaled nodes not joining cluster #466

Open enter-marlah opened 3 days ago

enter-marlah commented 3 days ago

Hello!

We are running hetzner-k3s version 2.0.8 with the following worker pool config:

worker_node_pools:
- name: med-static
  instance_type: cpx31
  instance_count: 3
  location: hel1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 6

The nodes are created in Hetzner once autoscaling is triggered by stressing the cluster, but they never join the cluster. We can SSH into the machines, but they have, for example, no SSH keys set and nothing related to k3s installed. For static nodes the SSH keys are set correctly.

We think this has something to do with the previous cloud init wait problem in issue https://github.com/vitobotta/hetzner-k3s/issues/379

If we read the code correctly, the cloud_init_wait.sh script is not called when creating an autoscaled node?

We are running a private-network-only cluster. Regarding PR https://github.com/vitobotta/hetzner-k3s/pull/458: our cloud-init takes several minutes on both static and autoscaled nodes.
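
For readers unfamiliar with the cloud-init wait mentioned above, here is a minimal sketch of what such a wait can look like. It is not the project's cloud_init_wait.sh, whose contents are not shown in this thread. The helper name `wait_for_file` and the 600-second default are assumptions made for illustration. The marker path in the closing comment is the file cloud-init normally writes once it has finished.

```shell
#!/bin/sh
# Hypothetical helper (not part of hetzner-k3s): poll until a marker file
# exists, or give up after a timeout given in seconds (default 600).
wait_for_file() {
  marker="$1"
  timeout="${2:-600}"
  elapsed=0
  while [ ! -e "$marker" ]; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $marker" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# On a node, one might gate the k3s install on cloud-init having finished:
#   wait_for_file /var/lib/cloud/instance/boot-finished 600
```

With cloud-init taking several minutes, as described here, any step that runs without a gate like this could start before the SSH keys and network configuration are in place.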

vitobotta commented 3 days ago

Hi, can you share your full config file (minus the token)?

enter-marlah commented 3 days ago

---
cluster_name: kube-prod
kubeconfig_path: "./kubeconfig"
k3s_version: v1.29.3+k3s1
include_instance_type_in_instance_name: true

networking:
  ssh:
    port: 22
    use_agent: false
    public_key_path: "./id_rsa_hetzner_prod.pub"
    private_key_path: "./id_rsa_hetzner_prod"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api:
      - 0.0.0.0/0
  public_network:
    ipv4: false
    ipv6: false
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: "KubeNet"
  cni:
    enabled: true
    encryption: false
    mode: flannel

manifests:
  cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.20.0/ccm-networks.yaml"
  csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.8.0/deploy/kubernetes/hcloud-csi.yml"
  system_upgrade_controller_deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"
  system_upgrade_controller_crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml"
  cluster_autoscaler_manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"

datastore:
  mode: etcd
  external_datastore_endpoint: postgres://....

schedule_workloads_on_masters: false

image: ubuntu-22.04

masters_pool:
  instance_type: cpx11
  instance_count: 3
  location: hel1

worker_node_pools:
- name: med-static
  instance_type: cpx31
  instance_count: 3
  location: hel1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 6

embedded_registry_mirror:
  enabled: true

post_create_commands:
 - echo 'network:\n  version:\ 2\n  ethernets:\n    enp7s0:\n      critical:\ true\n      nameservers:\n        addresses:\ [10.0.0.2]\n      routes:\n      - on-link:\ true\n        to:\ 0.0.0.0/0\n        via:\ 10.0.0.1' > /etc/netplan/50-cloud-init.yaml
 - sed -i 's/\\//g' /etc/netplan/50-cloud-init.yaml
 - sed -i 's/^nameserver.*/nameserver 10.0.0.2/' /etc/resolv.conf
 - netplan apply
 - apt update
 - apt upgrade -y
 - apt autoremove -y
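
The first two post_create_commands are hard to read as written, so the sketch below replays them against a temporary file rather than /etc/netplan to show the file they produce. It assumes the commands run under a shell whose `echo` expands `\n` (as dash does on Ubuntu). `printf '%b'` stands in for that here so the result is the same under bash. The `\ ` pairs are presumably there to keep `: ` out of the YAML list item, and the `sed` step then strips the backslashes.

```shell
#!/bin/sh
# Replay the echo + sed steps against a temporary file instead of /etc/netplan.
out="$(mktemp)"

# '%b' expands \n into newlines; the "\ " pairs are removed by sed below.
printf '%b\n' 'network:\n  version:\ 2\n  ethernets:\n    enp7s0:\n      critical:\ true\n      nameservers:\n        addresses:\ [10.0.0.2]\n      routes:\n      - on-link:\ true\n        to:\ 0.0.0.0/0\n        via:\ 10.0.0.1' > "$out"

# Same substitution as the second post_create_command: drop every backslash.
sed 's/\\//g' "$out" > "$out.clean" && mv "$out.clean" "$out"

# Show the resulting netplan document.
cat "$out"
```

The result should be an 11-line netplan document that routes all traffic for enp7s0 via 10.0.0.1 and uses 10.0.0.2 as the nameserver. If an autoscaled node's file differs from this, that would suggest the commands ran under a different shell or did not run at all.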