xetys / hetzner-kube

A CLI tool for provisioning kubernetes clusters on Hetzner Cloud
Apache License 2.0

adding workers fails on Wireguard configuration #329

Open abarthol opened 4 years ago

abarthol commented 4 years ago

When adding a worker, the install process stops at the WireGuard configuration step.

command: systemctl enable wg-quick@wg0 && systemctl restart wg-quick@wg0 && systemctl enable overlay-route.service && systemctl restart overlay-route.service
stdout: Created symlink /etc/systemd/system/multi-user.target.wants/wg-quick@wg0.service → /lib/systemd/system/wg-quick@.service.
Job for wg-quick@wg0.service failed because the control process exited with error code.
See "systemctl status wg-quick@wg0.service" and "journalctl -xe" for details.

This seems to be related to: https://github.com/adrianmihalko/raspberrypiwireguard/issues/11

drallgood commented 4 years ago

Yep, ran into the same problem.

Ubuntu borked the WireGuard module.

The solution is apparently to install the HWE (hardware enablement) stack, which brings a newer kernel: https://wiki.ubuntu.com/Kernel/LTSEnablementStack

It works, but I don't know how to do that for a node.
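For doing this on an existing node, a minimal sketch, assuming an Ubuntu node with root access and that the HWE metapackage is named `linux-generic-hwe-<VERSION_ID>` (the 18.04 name, `linux-generic-hwe-18.04`, is the one the cloud-init scripts later in this thread install). `hwe_package` and `install_hwe` are hypothetical helper names, not part of hetzner-kube.

```shell
#!/usr/bin/env bash
# Sketch: install the Ubuntu HWE kernel on an existing node.
# Assumption: the metapackage follows the linux-generic-hwe-<VERSION_ID>
# pattern (e.g. linux-generic-hwe-18.04). Helper names are hypothetical.

# hwe_package [os-release-file]
# Prints the HWE metapackage name; fails if the file is missing or not Ubuntu.
hwe_package() {
  local os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] || return 1
  # Source the file in a subshell so ID/VERSION_ID do not leak into our shell.
  (
    . "$os_release"
    [ "${ID:-}" = "ubuntu" ] || exit 1
    printf 'linux-generic-hwe-%s\n' "$VERSION_ID"
  )
}

# install_hwe: installs the HWE kernel for this release (needs root).
# The new kernel only becomes active after a reboot.
install_hwe() {
  local pkg
  pkg="$(hwe_package)" || { echo "not an Ubuntu host" >&2; return 1; }
  apt-get update && apt-get install -y --install-recommends "$pkg"
}
```

Running `install_hwe` as root on the node and then rebooting should leave it on the newer kernel; the cloud-init scripts further down do the same install at provisioning time.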

drallgood commented 4 years ago

So, the easiest fix: patch hetzner-kube to use Ubuntu 20.04 LTS. I also upgraded a few of my existing nodes that broke (which is actually why I wanted to recreate one in the first place).

abarthol commented 4 years ago

I've tried Ubuntu 20.04 LTS, but I get another error:

command: for i in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4; do modprobe $i; done && kubeadm reset -f && kubeadm join 10.0.1.1:6443 --token q5nor9.i7r02nwpwgl1cimm --discovery-token-ca-cert-hash sha256:4e2d467803467a8aab9c484fa24b0c45db0c865a68c528fdd985b56879afa6c9

stdout: modprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/5.4.0-28-generic

drallgood commented 4 years ago

Strange... I just installed multiple new nodes and it worked just fine. Anyhow, you need the corresponding kernel module.
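Some background on the error above, offered as a sketch rather than as the fix used in this thread: since Linux 4.19, `nf_conntrack_ipv4` has been merged into `nf_conntrack`, so on the 5.4 kernel that ships with Ubuntu 20.04 the module loop has to load `nf_conntrack` instead. `conntrack_module` and `load_ipvs_modules` are hypothetical helper names; the kernel-version cutoff is the only logic they encode.

```shell
#!/usr/bin/env bash
# Sketch: pick the conntrack module name for a given kernel release.
# Since Linux 4.19, nf_conntrack_ipv4 is part of nf_conntrack, which is why
# modprobe fails on 5.4.0-28-generic. Helper names are hypothetical.

# conntrack_module <kernel-release>
conntrack_module() {
  local release="${1%%-*}"   # "5.4.0-28-generic" -> "5.4.0"
  local major minor
  IFS=. read -r major minor _ <<< "$release"
  if (( major > 4 || (major == 4 && minor >= 19) )); then
    echo nf_conntrack
  else
    echo nf_conntrack_ipv4
  fi
}

# load_ipvs_modules: the module loop from the failing command, with the
# conntrack module chosen for the running kernel (needs root).
load_ipvs_modules() {
  local m
  for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh "$(conntrack_module "$(uname -r)")"; do
    modprobe "$m" || return 1
  done
}
```

Calling `load_ipvs_modules` in place of the original `for` loop, before `kubeadm reset -f && kubeadm join ...`, should avoid the `FATAL: Module nf_conntrack_ipv4 not found` error on newer kernels while still loading the old module on 4.15-based 18.04 nodes.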

tmemenga commented 4 years ago

@abarthol, have a look at https://github.com/tmemenga/hetzner-kube/tree/ubuntu-20-04; I was able to get past that error with it. But I still need to check whether the cluster is really operational.

xetys commented 4 years ago

If this works, I'd love to see a PR, if you don't mind.

abarthol commented 4 years ago

Thanks @tmemenga. Your branch works for creating a new cluster. Please make a pull request to add this to the main project. However, I have not tested adding a new node to an existing (Ubuntu 16.04 LTS or 18.04 LTS) cluster.

abarthol commented 4 years ago

After a successful cluster setup with Ubuntu 20.04 LTS, I noticed a problem with Canal: the pods did not start up correctly. The error message was:

[FATAL][581] int_dataplane.go 1032: Kernel's RPF check is set to 'loose' ...

I had to set this to make it work:

kubectl -n kube-system set env daemonset/canal FELIX_IGNORELOOSERPF=true

drallgood commented 4 years ago

> Although I have not tested to add a new node to an existing (Ubuntu 16.04 LTS or 18.04 LTS) cluster.

I'm running a mixed cluster right now without any issues (the control plane is 18.04, and 2 out of 6 nodes are 20.04).

eugenpro commented 4 years ago

Is it currently possible to manually add a node to a cluster that was created with the hetzner-kube utility?

tmemenga commented 4 years ago

I also had to run

kubectl -n kube-system set env daemonset/canal FELIX_IGNORELOOSERPF=true

to stop Canal from constantly restarting.

But it seems this is something you should not do on anything other than dev systems?

https://alexbrand.dev/post/creating-a-kind-cluster-with-calico-networking/

> Relax Calico's RPF Check Configuration
>
> By default, Calico pods fail if the Kernel's Reverse Path Filtering (RPF) check is not enforced. This is a security measure to prevent endpoints from spoofing their IP address.
>
> The RPF check is not enforced in Kind nodes. Thus, we need to disable the Calico check by setting an environment variable in the calico-node DaemonSet:
>
> kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
>
> Note: I am disabling this check because this is a dev environment. You probably do not want to do this otherwise.

max-software-net commented 4 years ago

> [FATAL][581] int_dataplane.go 1032: Kernel's RPF check is set to 'loose' ...

I got it working by changing /etc/sysctl.d/10-network-security.conf as follows:

net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
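For context, a hedged sketch: `rp_filter` is 0 (off), 1 (strict) or 2 (loose), and the kernel applies the larger of `conf/all` and the per-interface value, so `all=2` makes every interface loose, which matches the Felix error above. On Ubuntu 20.04 the file mentioned here appears to ship with both values set to 2. The helper names below are hypothetical, the `sed` call assumes GNU sed (as on Ubuntu), and applying the change with `sysctl --system` requires root.

```shell
#!/usr/bin/env bash
# Sketch: check and persist strict reverse-path filtering for Calico/Canal.
# rp_filter values: 0 = off, 1 = strict, 2 = loose. Helper names are hypothetical.

# rp_filter_mode <value>: print a readable name for an rp_filter value.
rp_filter_mode() {
  case "$1" in
    0) echo off ;;
    1) echo strict ;;
    2) echo loose ;;
    *) echo unknown ;;
  esac
}

# current_rp_filter [iface]: print the live kernel value (default: all).
current_rp_filter() {
  cat "/proc/sys/net/ipv4/conf/${1:-all}/rp_filter"
}

# write_strict_rp_filter [conf-file]: replace any "default"/"all" rp_filter
# lines with the strict values from the comment above.
write_strict_rp_filter() {
  local conf="${1:-/etc/sysctl.d/10-network-security.conf}"
  if [ -f "$conf" ]; then
    # GNU sed: delete existing default/all rp_filter assignments in place.
    sed -i '/^net\.ipv4\.conf\.\(default\|all\)\.rp_filter/d' "$conf"
  fi
  printf '%s\n' 'net.ipv4.conf.default.rp_filter=1' \
                'net.ipv4.conf.all.rp_filter=1' >> "$conf"
}
```

As root, `write_strict_rp_filter && sysctl --system` applies the change; afterwards `rp_filter_mode "$(current_rp_filter)"` should print `strict`, unless another file under /etc/sysctl.d sets a different value later in the load order.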

krzko commented 4 years ago

Looks like WireGuard is borked in the 18.04 distro. Here's a cloud-init script that should bootstrap your cluster successfully:

my-k8s-cluster-cloud-init

#cloud-config

package_update: true

runcmd:
 - add-apt-repository ppa:wireguard/wireguard
 - apt-get update
 - apt-get install -y --install-recommends linux-generic-hwe-18.04
 - apt-get install -y wireguard wireguard-dkms wireguard-tools
 - modprobe wireguard
 - lsmod | grep wireguard

It can be invoked via:

hetzner-kube cluster create --name my-k8s-cluster --ssh-key my-ssh-key --cloud-init ./my-k8s-cluster-cloud-init

ulfw commented 4 years ago

No, sorry, that cloud-init is not a working fix

Antauri commented 4 years ago

I got it working with:

#cloud-config

users:
  - name: your-sudo-user
    groups: users, admin
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - YOUR_KEY_HERE

package_update: true
package_upgrade: true      

packages:
  - your
  - list
  - of
  - packages

runcmd:
 - add-apt-repository ppa:wireguard/wireguard
 - apt-get update
 - apt-get install -y --install-recommends linux-generic-hwe-18.04
 - apt-get install -y wireguard wireguard-dkms wireguard-tools
 - modprobe wireguard
 - lsmod | grep wireguard
 - reboot

And the command:

hetzner-kube cluster create --name kubernetes -k YOUR-SSH-KEY --master-server-type cx21 -m 3 --worker-server-type cx21 --node-cidr a.b.c.d/16 -w 5 --ha-enabled --cloud-init /path/to/cloud-init.yml

mashkovd commented 3 years ago

hetzner-master-01    : installing transport tools         11.5% [--------------]
hetzner-worker-01    : prepare packages                   23.5% [=>------------]
run failed
command:add-apt-repository ppa:wireguard/wireguard -y
stdout:Cannot add PPA: 'ppa:~wireguard/ubuntu/wireguard'.
The team named '~wireguard' has no PPA named 'ubuntu/wireguard'
Please choose from the following available PPAs:

This command didn't work: hetzner-kube cluster create --name hetzner --ssh-key mctl --cloud-init ./my-k8s-cluster-cloud-init (using the my-k8s-cluster-cloud-init file from above).

eugene-chernyshenko commented 3 years ago

I fixed this issue in #339