karmab / kcli

Management tool for virtualization and kubernetes platforms
https://kcli.readthedocs.io/en/latest/
Apache License 2.0

join.sh for single control-plane node K3s clusters has the same issue as #704 and #705

Closed larssb closed 1 month ago

larssb commented 1 month ago

See:

{% set extra_args =  extra_worker_args or extra_args %}

{% if api_ip == None %}
{% set api_ip = '{0}-ctlplane-1'.format(cluster)|kcli_info('ip') if scale|default(False) and 'ctlplane-0' in name else first_ip %}
{% endif %}

curl -sfL https://get.k3s.io | K3S_URL=https://{{ api_ip }}:6443 K3S_TOKEN={{ token }} {{ install_k3s_args|default([])|join(' ') }} sh -s - agent {{ extra_args|join(' ') }}

Here first_ip has no value, for the same reasons as in #704. As a result, one will see:

Aug 09 20:13:25 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:13:25Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:47826->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:13:42 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:13:42Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:46856->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:13:59 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:13:59Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:39504->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:14:16 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:14:16Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:41860->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:14:33 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:14:33Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:47722->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:14:50 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:14:50Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:51670->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:15:07 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:15:07Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:40270->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:15:24 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:15:24Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:38470->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:15:41 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:15:41Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:45794->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:15:58 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:15:58Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:39838->127.0.0.1:6444: read: connection reset by peer"
Aug 09 20:16:15 minio-prod-worker-0 k3s[3329]: time="2024-08-09T20:16:15Z" level=error msg="failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:55696->127.0.0.1:6444: read: connection reset by peer"

These errors are logged by the k3s-agent (worker) service and can be seen by running: journalctl -u k3s-agent.service -f

As a result, workers will never be able to join a single control-plane node K3s cluster.
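
To make the failure mode concrete, here is a minimal Python sketch of the fallback the template performs when building K3S_URL. It is not kcli code: the helper name build_k3s_url is hypothetical, and it assumes that a missing api_ip or first_ip shows up as None or an empty string. The only point is that when both are empty the join URL has no host, so failing early is clearer than letting the agent retry forever.

```python
def build_k3s_url(api_ip, first_ip, port=6443):
    """Hypothetical helper mirroring the template: prefer api_ip, else first_ip."""
    host = api_ip or first_ip  # None and "" both count as "no value"
    if not host:
        # Without this guard the result would be "https://:6443" (or similar),
        # which no agent can join.
        raise ValueError("no API address: both api_ip and first_ip are empty")
    return f"https://{host}:{port}"


if __name__ == "__main__":
    # Single control-plane case where first_ip is populated as expected.
    print(build_k3s_url(None, "192.168.122.10"))
    # The case described in this issue: neither value is set.
    try:
        build_k3s_url(None, None)
    except ValueError as err:
        print(f"refusing to render join.sh: {err}")
```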

karmab commented 1 month ago

fixed in https://github.com/karmab/kcli/commit/ff1d2fda24258261d8872b57e3ccee7ab590a331 to revert to the behaviour prior to https://github.com/karmab/kcli/commit/cf3b0f89716498f8b95b1494eba5e8549cac229f :)

larssb commented 1 month ago

Aaaah wait. The changes in https://github.com/karmab/kcli/commit/cf3b0f89716498f8b95b1494eba5e8549cac229f were necessary for other reasons!

I don't think we should revert this. Now this:

# The logic below is to achieve the following
# - for cloud providers. If the API is internal and
# this is a HA cluster. Use the IP of the API load-balancer
# - for cloud providers. If the API is NOT internal.
# use the external IP. This in both the HA and single ctlplane case
# - The last two branches are for on-prem. E.g. vSphere/VMWare
# in HA or single ctlplane scenarios.

Won't work anymore. Sad ;-(
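
For readers who have not seen the commit, the branching described in that comment can be sketched roughly as follows in Python. This is an illustration under assumptions, not the actual template logic: the function name pick_api_ip and all parameter names are hypothetical, and the comment does not say what the two on-prem branches return, so the sketch assumes the VIP for HA and the first control-plane node's IP otherwise.

```python
def pick_api_ip(cloud, internal_api, ha,
                lb_ip=None, external_ip=None, vip=None, first_ctlplane_ip=None):
    """Hypothetical selection of the address workers should join through."""
    if cloud and internal_api and ha:
        # Cloud provider, internal API, HA cluster: use the API load-balancer IP.
        return lb_ip
    if cloud and not internal_api:
        # Cloud provider, API not internal: external IP, for HA and single ctlplane.
        return external_ip
    # Remaining cases are treated as the on-prem branches (e.g. vSphere).
    # ASSUMPTION: HA uses the VIP, single ctlplane uses the first node's IP.
    # Cloud + internal API + single ctlplane is not described in the comment
    # and also falls through to here.
    if ha:
        return vip
    return first_ctlplane_ip


if __name__ == "__main__":
    # On-prem, single control-plane node: the case this issue is about.
    print(pick_api_ip(cloud=False, internal_api=False, ha=False,
                      first_ctlplane_ip="192.168.122.10"))
```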

larssb commented 1 month ago

I worked around this for join.sh by declaring the ctlplane-0 node's IP as the api_ip. I guess that could work. However, as I understand it, api_ip in kcli is mostly used to specify the floating IP/VIP that fronts the control-plane nodes in an HA setup.