vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.
MIT License

2.0.0: master node doesn't join private network #410

Closed Funzinator closed 2 months ago

Funzinator commented 2 months ago

Extracted from https://github.com/vitobotta/hetzner-k3s/discussions/385#discussioncomment-10168998


Scenario: existing private network only, no public IP addresses:

networking:
  public_network:
    ipv4: false
    ipv6: false
  private_network:
    enabled: true
    existing_network_name: seiyuu
    subnet: 10.63.0.0/16

I had to manually attach the master server to the existing network while the tool (version 2.0.0rc2) kept looping, waiting for the instance to come up. After that, the run completed.
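For anyone who needs the same workaround outside the Hetzner console, here is a minimal sketch of the "attach to network" call, assuming the public Hetzner Cloud API action `POST /servers/{id}/actions/attach_to_network`. The server ID, network ID and token below are hypothetical placeholders, and the function only builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.hetzner.cloud/v1"


def build_attach_request(server_id: int, network_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) the request that attaches a server to a private network."""
    body = json.dumps({"network": network_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/servers/{server_id}/actions/attach_to_network",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical IDs and token, for illustration only.
    req = build_attach_request(server_id=1234, network_id=5678, token="YOUR_HCLOUD_TOKEN")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Sending is left commented out so the snippet can be run safely; the real IDs can be looked up in the Hetzner console or via the API.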

My config:

cluster_name: seiyuu
k3s_version: v1.30.2+k3s2

networking:
  ssh:
    public_key_path: "/root/.ssh/id_rsa.pub"
    private_key_path: "/root/.ssh/id_rsa"
    use_agent: false
  allowed_networks:
    ssh:
      - 0.0.0.0/0
      - ::/0
    api:
      - 0.0.0.0/0
      - ::/0
  private_network:
    enabled: true
    existing_network_name: seiyuu
    subnet: 10.63.0.0/16
  public_network:
    ipv4: false
    ipv6: false
  cni:
    enabled: true
    encryption: true
  cluster_cidr: 10.244.0.0/16
  service_cidr: 10.43.0.0/16

include_instance_type_in_instance_name: true

schedule_workloads_on_masters: false

manifests:
  cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.20.0/ccm-networks.yaml"
  csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.8.0/deploy/kubernetes/hcloud-csi.yml"
  system_upgrade_controller_manifest_url: "https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml"

image: debian-12
autoscaling_image: debian-12

masters_pool:
  instance_type: cax11
  instance_count: 1
  location: fsn1

worker_node_pools:
- name: cax11-autoscale-1
  instance_type: cax11
  instance_count: 1
  location: fsn1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 3

post_create_commands:
  - timedatectl set-timezone Europe/Berlin
  - ip route add default via 10.63.0.1
  - ip route add 169.254.0.0/16 via 172.31.1.1
  - rm -f /etc/resolv.conf
  - echo 'nameserver 185.12.64.1' >> /etc/resolv.conf
  - echo 'nameserver 185.12.64.2' >> /etc/resolv.conf
  - echo 'edns edns0 trust-ad'    >> /etc/resolv.conf
  - echo 'search .'               >> /etc/resolv.conf
  - mkdir -p /etc/network/interfaces.d
  - echo "auto enp7s0"                                              > /etc/network/interfaces.d/enp7s0
  - echo "iface enp7s0 inet dhcp"                                  >> /etc/network/interfaces.d/enp7s0
  - echo "    post-up ip route add default via 10.63.0.1"          >> /etc/network/interfaces.d/enp7s0
  - echo "    post-up ip route add 169.254.169.254 via 172.31.1.1" >> /etc/network/interfaces.d/enp7s0
  - apt update
  - apt upgrade -y
  - apt autoremove -y
  - apt install -y apparmor apparmor-utils

and some log output:

[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[SSH key] SSH key already exists, skipping create
[Placement groups] Deleting unused placement group seiyuu-cax11-autoscale-1-2...
[Placement groups] ...placement group seiyuu-cax11-autoscale-1-2 deleted
[Placement groups] Deleting unused placement group seiyuu-masters...
[Placement groups] ...placement group seiyuu-masters deleted
[Placement groups] Creating placement group seiyuu-masters...
[Placement groups] ...placement group seiyuu-masters created
[Instance seiyuu-cax11-master1] Creating instance seiyuu-cax11-master1 (attempt 1)...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 1)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 2)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance seiyuu-cax11-master1 already exists, skipping create
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 1)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 2)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 3)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 4)
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 3)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: off
[Instance seiyuu-cax11-master1] Powering on instance (attempt 5)
[Instance seiyuu-cax11-master1] Instance status: off

...and so on. At this point I attached the instance to the network manually, and the loop continued until...

[Instance seiyuu-cax11-master1] Powering on instance (attempt 17)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: running
[Instance seiyuu-cax11-master1] Waiting for successful ssh connectivity with instance seiyuu-cax11-master1...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 18)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 19)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 20)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 21)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 22)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 23)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 24)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 25)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 26)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: starting
[Instance seiyuu-cax11-master1] Powering on instance (attempt 27)
[Instance seiyuu-cax11-master1] Waiting for instance to be powered on...
[Instance seiyuu-cax11-master1] Instance status: running
[Instance seiyuu-cax11-master1] Waiting for successful ssh connectivity with instance seiyuu-cax11-master1...
[Instance seiyuu-cax11-master1] ...instance seiyuu-cax11-master1 is now up.
[Firewall] Updating firewall...
[Firewall] ...firewall updated
[Instance seiyuu-cax11-master1] [INFO]  Using v1.30.2+k3s2 as release
[Instance seiyuu-cax11-master1] [INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.30.2+k3s2/sha256sum-arm64.txt
[Instance seiyuu-cax11-master1] [INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.30.2+k3s2/k3s-arm64
[Instance seiyuu-cax11-master1] [INFO]  Verifying binary download
[Instance seiyuu-cax11-master1] [INFO]  Installing k3s to /usr/local/bin/k3s
[Instance seiyuu-cax11-master1] [INFO]  Skipping installation of SELinux RPM
[Instance seiyuu-cax11-master1] [INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[Instance seiyuu-cax11-master1] [INFO]  Creating /usr/local/bin/crictl symlink to k3s
[Instance seiyuu-cax11-master1] [INFO]  Creating /usr/local/bin/ctr symlink to k3s
[Instance seiyuu-cax11-master1] [INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[Instance seiyuu-cax11-master1] [INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[Instance seiyuu-cax11-master1] [INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[Instance seiyuu-cax11-master1] [INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[Instance seiyuu-cax11-master1] [INFO]  systemd: Enabling k3s unit
[Instance seiyuu-cax11-master1] [INFO]  systemd: Starting k3s
[Instance seiyuu-cax11-master1] Waiting for the control plane to be ready...
[Control plane] Saving the kubeconfig file to /root/kubeconfig...
[Instance seiyuu-cax11-master1] ...k3s deployed
[Hetzner Cloud Secret] Creating secret for Hetzner Cloud token...
[Hetzner Cloud Secret] secret/hcloud created
[Hetzner Cloud Secret] ...secret created
[Hetzner Cloud Controller] Installing Hetzner Cloud Controller Manager...
[Hetzner Cloud Controller] serviceaccount/hcloud-cloud-controller-manager created
[Hetzner Cloud Controller] clusterrolebinding.rbac.authorization.k8s.io/system:hcloud-cloud-controller-manager created
[Hetzner Cloud Controller] deployment.apps/hcloud-cloud-controller-manager created
[Hetzner Cloud Controller] Hetzner Cloud Controller Manager installed
[Hetzner CSI Driver] Installing Hetzner CSI Driver...
[Hetzner CSI Driver] serviceaccount/hcloud-csi-controller created
[Hetzner CSI Driver] storageclass.storage.k8s.io/hcloud-volumes created
[Hetzner CSI Driver] clusterrole.rbac.authorization.k8s.io/hcloud-csi-controller created
[Hetzner CSI Driver] clusterrolebinding.rbac.authorization.k8s.io/hcloud-csi-controller created
[Hetzner CSI Driver] service/hcloud-csi-controller-metrics created
[Hetzner CSI Driver] service/hcloud-csi-node-metrics created
[Hetzner CSI Driver] daemonset.apps/hcloud-csi-node created
[Hetzner CSI Driver] deployment.apps/hcloud-csi-controller created
[Hetzner CSI Driver] csidriver.storage.k8s.io/csi.hetzner.cloud created
[Hetzner CSI Driver] Hetzner CSI Driver installed
[System Upgrade Controller] Installing System Upgrade Controller...
[System Upgrade Controller] namespace/system-upgrade created
[System Upgrade Controller] customresourcedefinition.apiextensions.k8s.io/plans.upgrade.cattle.io created
[System Upgrade Controller] clusterrole.rbac.authorization.k8s.io/system-upgrade-controller created
[System Upgrade Controller] role.rbac.authorization.k8s.io/system-upgrade-controller created
[System Upgrade Controller] clusterrole.rbac.authorization.k8s.io/system-upgrade-controller-drainer created
[System Upgrade Controller] clusterrolebinding.rbac.authorization.k8s.io/system-upgrade-drainer created
[System Upgrade Controller] clusterrolebinding.rbac.authorization.k8s.io/system-upgrade created
[System Upgrade Controller] rolebinding.rbac.authorization.k8s.io/system-upgrade created
[System Upgrade Controller] namespace/system-upgrade configured
[System Upgrade Controller] serviceaccount/system-upgrade created
[System Upgrade Controller] configmap/default-controller-env created
[System Upgrade Controller] deployment.apps/system-upgrade-controller created
[System Upgrade Controller] ...System Upgrade Controller installed
[Cluster Autoscaler] Installing Cluster Autoscaler...
[Cluster Autoscaler] serviceaccount/cluster-autoscaler created
[Cluster Autoscaler] clusterrole.rbac.authorization.k8s.io/cluster-autoscaler created
[Cluster Autoscaler] role.rbac.authorization.k8s.io/cluster-autoscaler created
[Cluster Autoscaler] clusterrolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
[Cluster Autoscaler] rolebinding.rbac.authorization.k8s.io/cluster-autoscaler created
[Cluster Autoscaler] deployment.apps/cluster-autoscaler created
[Cluster Autoscaler] ...Cluster Autoscaler installed

vitobotta commented 2 months ago

It seems the problem was that in v2 I changed the instance creation logic so that powering on the instance and attaching it to the private network happen in separate steps, with a short delay in between. I did this to avoid some issues that can occur when creating a large cluster from the get-go with a private network, or when adding many nodes at once while a private network is in use.

In your case the public IPs were disabled, so the instance was created without any IP address and everything got stuck. Since it's unlikely that someone would create a large cluster from the get-go (I only did this in my benchmark tests), I have reverted the change mentioned above. The instance gets attached to the network automatically again, so it always has an IP (private only, in this case) and the setup can continue.
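To illustrate why the instance ended up unreachable, here is a rough sketch of the two create-request shapes. It is not the actual hetzner-k3s code (which is not written in Python); it assumes the public Hetzner Cloud API fields `networks`, `public_net.enable_ipv4`, `public_net.enable_ipv6` and `start_after_create` on `POST /servers`. The helper names and the network ID `42` are hypothetical.

```python
from typing import Optional


def build_create_server_payload(
    name: str,
    server_type: str,
    image: str,
    location: str,
    public_ipv4: bool,
    public_ipv6: bool,
    network_id: Optional[int] = None,
) -> dict:
    """Build a create-server request body; network_id=None models 'attach later'."""
    return {
        "name": name,
        "server_type": server_type,
        "image": image,
        "location": location,
        "start_after_create": True,
        # An empty list means the server is not on any private network at creation.
        "networks": [] if network_id is None else [network_id],
        "public_net": {"enable_ipv4": public_ipv4, "enable_ipv6": public_ipv6},
    }


def has_address_at_boot(payload: dict) -> bool:
    """True if the server gets at least one IP (public or private) when created."""
    public = payload["public_net"]
    return bool(payload["networks"]) or public["enable_ipv4"] or public["enable_ipv6"]


if __name__ == "__main__":
    common = dict(
        name="seiyuu-cax11-master1",
        server_type="cax11",
        image="debian-12",
        location="fsn1",
        public_ipv4=False,
        public_ipv6=False,
    )
    separate_steps = build_create_server_payload(**common)            # attach in a later step
    attach_on_create = build_create_server_payload(**common, network_id=42)
    print(has_address_at_boot(separate_steps))
    print(has_address_at_boot(attach_on_create))
```

With both public IPs disabled, only the second payload leaves the instance with an address at creation time, which matches the behaviour described above.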

I have pushed these changes and rc4 is currently building. Monitor https://github.com/vitobotta/hetzner-k3s/actions/runs/10356376895 and try the binary as soon as the one for your OS is ready. Please let me know how it goes :)

vitobotta commented 2 months ago

I released v2.0.0 with several more fixes and improvements. Can you please see if you are still having problems even with the GA version? Thanks!

vitobotta commented 2 months ago

Closing. Please open another issue for v2 if you still have problems with the new version.