vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.

Can't create cluster from management server #298

Closed: Chris6077 closed this issue 11 months ago

Chris6077 commented 11 months ago

Hello!

I am trying to create a K3s cluster using a management server in the same virtual network. For this, I have created an Ubuntu 22.04 server (10.1.0.2) and a virtual network (vnet-temp). When I try to create the cluster with "hetzner-k3s create --config x.yaml", the script gives me the following errors:

Validating the configuration... Some information in the configuration file requires your attention:

  • Your current IP publicIPv4 must belong to at least one of the networks allowed for SSH
  • Your current IP publicIPv4 must belong to at least one of the permitted API networks

This is my config:

hetzner_token: <removed>
cluster_name: test
kubeconfig_path: "./kubeconfig"
k3s_version: v1.26.10+k3s1
public_ssh_key_path: "/home/test/.ssh/id_ecdsa.pub"
private_ssh_key_path: "/home/test/.ssh/id_ecdsa"
use_ssh_agent: true
ssh_port: 22
ssh_allowed_networks:
  - 10.1.0.0/16
api_allowed_networks:
  - 10.1.0.0/16
private_network_subnet: 10.1.0.0/16
disable_flannel: false
schedule_workloads_on_masters: false
enable_public_net_ipv4: false
enable_public_net_ipv6: false
cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.18.0/ccm-networks.yaml"
csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.5.1/deploy/kubernetes/hcloud-csi.yml"
system_upgrade_controller_manifest_url: "https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml"
masters_pool:
  instance_type: cx21
  instance_count: 3
  location: fsn1
worker_node_pools:
- name: small-static
  instance_type: cx21
  instance_count: 2
  location: fsn1
- name: big-autoscaled
  instance_type: cx41
  instance_count: 2
  location: fsn1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 3
post_create_commands:
- apt update
- apt upgrade -y
- apt autoremove -y
enable_encryption: true
existing_network: vnet-temp

Is there anything wrong with my configuration? When using 0.0.0.0/0 for ssh_allowed_networks and api_allowed_networks, the servers will be created, but the script will get stuck at this part:

Validating configuration......configuration seems valid.

=== Creating infrastructure resources ===
Creating firewall...done.
Creating SSH key...done.
Creating placement group test-masters...done.
Placement group test-small-static-1 already exists, skipping.
Creating server test-cx21-master1...
Creating server test-cx21-pool-small-static-worker2...
Creating server test-cx21-pool-small-static-worker1...
Creating server test-cx21-master2...
Creating server test-cx21-master3...
...server test-cx21-pool-small-static-worker1 created.
...server test-cx21-master2 created.
...server test-cx21-master1 created.
...server test-cx21-pool-small-static-worker2 created.
...server test-cx21-master3 created.
Server test-cx21-master1 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master1...
Server test-cx21-master2 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master2...
Server test-cx21-master3 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master3...
Server test-cx21-pool-small-static-worker1 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-pool-small-static-worker1...
Waiting for successful ssh connectivity with server test-cx21-pool-small-static-worker2...
vitobotta commented 11 months ago

Hi, from the management server can you SSH into the nodes manually?

Chris6077 commented 11 months ago

Yes, I can do that by executing sudo ssh -i .ssh/id_ecdsa 10.1.0.[3-7] on the management server.

vitobotta commented 11 months ago

Is there a passphrase on the ssh key?

Chris6077 commented 11 months ago

Yes, I set a passphrase when creating the key (ECDSA, 521-bit) and specified the key pair in the configuration.

Chris6077 commented 11 months ago

I've now resolved the connection issues with the password-protected key. I had some problems with the ssh-agent and file permissions. Creating the cluster now works when using the CIDR 0.0.0.0/0 and with public IPv4 enabled.
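
For anyone hitting the same problems, this is roughly what fixed it for me (the key path is from my setup, so adjust as needed):

# tighten permissions so ssh and ssh-agent accept the private key
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ecdsa

# start an agent and load the passphrase-protected key once,
# so hetzner-k3s (with use_ssh_agent: true) can use it without prompting
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ecdsa   # prompts for the passphrase one time
ssh-add -l                # confirm the key is loaded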

However, setting ssh_allowed_networks or api_allowed_networks to a CIDR within the virtual network instead of 0.0.0.0/0 still leads to the public-IP error message described in my initial message.

vitobotta commented 11 months ago

> I've now resolved the connection issues with the password-protected key. I had some problems with the ssh-agent and file permissions. Creating the cluster now works when using the CIDR 0.0.0.0/0 and with public IPv4 enabled.

I was going to ask about the ssh agent next but glad you are making progress.

> However, setting ssh_allowed_networks or api_allowed_networks to a CIDR within the virtual network instead of 0.0.0.0/0 still leads to the public-IP error message described in my initial message.

Uhm, the problem here is that it may be checking against your public IP, not the private one. But I can't remember for sure, and I can't check right now. I'll take a look in the evening.

Chris6077 commented 11 months ago

Thank you so much for being active and trying to help me :)

I can bypass the public IP check by adding the private network CIDR and my publicIP/32 to both fields. This makes no sense when public IPv4 is disabled, but it works. The script then fails in the "Deploying Hetzner drivers" section.
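
Concretely, the workaround looks like this in the config (publicIP stands for my actual public IPv4 address, which I've removed here):

ssh_allowed_networks:
  - 10.1.0.0/16     # private network CIDR
  - publicIP/32     # my public IP, even though public IPv4 is disabled on the nodes
api_allowed_networks:
  - 10.1.0.0/16
  - publicIP/32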

Creating secret for Hetzner Cloud token...
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Failed to create Hetzner Cloud secret:

vitobotta commented 11 months ago

Can you check the kubeconfig contents? Did you see k3s setup messages in the log?

Chris6077 commented 11 months ago

Here is the full log:

Validating configuration......configuration seems valid.

=== Creating infrastructure resources ===
Updating firewall...done.
SSH key already exists, skipping.
Placement group test-masters already exists, skipping.
Placement group test-small-static-1 already exists, skipping.
Server test-cx21-master1 already exists, skipping.
Server test-cx21-master2 already exists, skipping.
Server test-cx21-pool-small-static-worker1 already exists, skipping.
Server test-cx21-master3 already exists, skipping.
Server test-cx21-pool-small-static-worker2 already exists, skipping.
Server test-cx21-master1 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master1...
Server test-cx21-master2 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master2...
Server test-cx21-master3 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-master3...
Server test-cx21-pool-small-static-worker1 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-pool-small-static-worker1...
Server test-cx21-pool-small-static-worker2 already exists, skipping.
Waiting for successful ssh connectivity with server test-cx21-pool-small-static-worker2...
...server test-cx21-master2 is now up.
...server test-cx21-master1 is now up.
...server test-cx21-pool-small-static-worker1 is now up.
...server test-cx21-pool-small-static-worker2 is now up.
...server test-cx21-master3 is now up.
Load balancer for API server already exists, skipping.

=== Setting up Kubernetes ===
Deploying k3s to first master test-cx21-master1...
Waiting for the control plane to be ready...
Saving the kubeconfig file to /home/test/kubeconfig...
...k3s has been deployed to first master test-cx21-master1 and the control plane is up.
Deploying k3s to master test-cx21-master2...
Deploying k3s to master test-cx21-master3...
...k3s has been deployed to master test-cx21-master2.
...k3s has been deployed to master test-cx21-master3.
Deploying k3s to worker test-cx21-pool-small-static-worker1...
Deploying k3s to worker test-cx21-pool-small-static-worker2...
...k3s has been deployed to worker test-cx21-pool-small-static-worker1.
...k3s has been deployed to worker test-cx21-pool-small-static-worker2.

=== Deploying Hetzner drivers ===

Creating secret for Hetzner Cloud token...
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Failed to create Hetzner Cloud secret:

The kubeconfig file is empty:

-rw------- 1 test test 0 Nov 30 09:19 kubeconfig
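
If I understand the error correctly, kubectl falls back to its default of localhost:8080 when the kubeconfig it is given is empty, which would explain the "connection refused" message:

ls -l kubeconfig                              # size 0, so there is no cluster address in it
kubectl --kubeconfig=./kubeconfig get nodes   # -> The connection to the server localhost:8080 was refused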

vitobotta commented 11 months ago

From the logs I can see that k3s didn't start for some reason. Can you please SSH into the first master, cat /etc/systemd/system/k3s.service (or a similarly named file), and run the command defined in the service manually? You need to source the .env file first. Running the command manually lets you see the errors that explain why k3s is not starting.
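
Something along these lines (I'm assuming root and the default k3s install paths here; copy the actual file names and ExecStart command from your unit file):

ssh root@10.1.0.3                          # first master, via its private IP
cat /etc/systemd/system/k3s.service        # note the EnvironmentFile and the ExecStart command
set -a; . /etc/systemd/system/k3s.service.env; set +a    # source the env file used by the unit
/usr/local/bin/k3s server <flags copied from ExecStart>  # run it in the foreground to see the errors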

Chris6077 commented 11 months ago

There is no k3s service installed on master1.

I found the following errors in journalctl:

Nov 30 07:53:54 Ubuntu-2204-jammy-64-minimal dhclient[532]: execve (/bin/true, ...): Permission denied
Nov 30 07:53:54 Ubuntu-2204-jammy-64-minimal dhclient[527]: bound to 10.1.0.6 -- renewal in 38138 seconds.

Nov 30 07:55:55 test-cx21-master1 systemd-networkd-wait-online[553]: Timeout occurred while waiting for network connectivity.
Nov 30 07:55:55 test-cx21-master1 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Nov 30 07:55:55 test-cx21-master1 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Nov 30 07:55:55 test-cx21-master1 systemd[1]: Failed to start Wait for Network to be Configured.
Nov 30 07:55:55 test-cx21-master1 systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: Cloud-init v. 23.3.1-0ubuntu1~22.04.1 running 'init' at Thu, 30 Nov 2023 07:55:56 +0000. Up 130.47 seconds.
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: | ens10  | False |     .     |     .     |   .   | 86:00:00:6a:21:d9 |
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +-------+-------------+---------+-----------+-------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: | Route | Destination | Gateway | Interface | Flags |
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +-------+-------------+---------+-----------+-------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: ci-info: +-------+-------------+---------+-----------+-------+
Nov 30 07:55:56 test-cx21-master1 cloud-init[559]: 2023-11-30 07:55:56,526 - schema.py[WARNING]: Invalid cloud-config provided: Please run 'sudo cloud-init schema --system' to see the schema errors.

Nov 30 07:56:15 test-cx21-master1 systemd[1]: cloud-final.service: Main process exited, code=exited, status=1/FAILURE
Nov 30 07:56:15 test-cx21-master1 systemd[1]: cloud-final.service: Failed with result 'exit-code'.
Nov 30 07:56:15 test-cx21-master1 systemd[1]: Failed to start Execute cloud user/final scripts.
Nov 30 07:56:15 test-cx21-master1 systemd[1]: cloud-final.service: Consumed 1.380s CPU time.
Nov 30 07:56:15 test-cx21-master1 systemd[1]: Reached target Cloud-init target.
Nov 30 07:56:15 test-cx21-master1 audit[1414]: AVC apparmor="DENIED" operation="capable" profile="/{,usr/}sbin/dhclient" pid=1414 comm="dhclient" capability=16 capname="sys_module"
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: Error getting hardware address for "eth1": No such device
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]:
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: If you think you have received this message due to a bug rather
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: than a configuration issue please read the section on submitting
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: bugs on either our web page at www.isc.org or in the README file
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: before submitting a bug. These pages explain the proper
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: process and the information we find helpful for debugging.
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]:
Nov 30 07:56:15 test-cx21-master1 dhclient[1414]: exiting.
Nov 30 07:56:15 test-cx21-master1 kernel: kauditd_printk_skb: 3 callbacks suppressed
Nov 30 07:56:15 test-cx21-master1 kernel: audit: type=1400 audit(1701330975.720:15): apparmor="DENIED" operation="capable" profile="/{,usr/}sbin/dhclient" pid=1414 comm="dhclient" capability=16 capname="sys_module"

You can find the full log here. The password is vitobotta.

vitobotta commented 11 months ago

Uhm, problems with the network? Are the servers attached to the private network? Which interfaces do you see?
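
For example, on one of the masters, something like:

ip -br addr          # the private interface (ens10 in your cloud-init log) should have a 10.1.0.x address
ip route             # check whether any default route exists at all
ping -c 3 10.1.0.2   # can the node reach the management server over the private network?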

Chris6077 commented 11 months ago

Everything seemed fine, but a ping to github.com did not work. I guess they still don't support IPv6. Did setting up a cluster with "enable_public_net_ipv4: false" work for you?
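
If I understand the setup correctly, with both public interfaces disabled the nodes can only reach the internet through a NAT gateway configured in the private network; this is roughly what I looked at on master1:

ping -c 3 github.com   # fails; github.com also has no AAAA record, so IPv6 wouldn't help here anyway
ip route               # no default route via a public interface with enable_public_net_ipv4/ipv6 set to false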

vitobotta commented 11 months ago

That was contributed via a PR, and I didn't have any problems when I tested it, but perhaps I didn't test all the scenarios without public IPs.