kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

:book: k3s node name is different from hostname #576

Closed Ognian closed 9 months ago

Ognian commented 1 year ago

Kairos version:

NAME="kairos-opensuse-arm-rpi"
VERSION="v1.3.2-k3sv1.25.4+k3s1"
ID="kairos"
ID_LIKE="kairos-opensuse-arm-rpi"
VERSION_ID="v1.3.2-k3sv1.25.4+k3s1"
PRETTY_NAME="kairos-opensuse-arm-rpi v1.3.2-k3sv1.25.4+k3s1"
ANSI_COLOR="0;32"
BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues/new/choose"
HOME_URL="https://github.com/kairos-io/provider-kairos"
IMAGE_REPO="quay.io/kairos/kairos-opensuse-arm-rpi"
IMAGE_LABEL="latest"
GITHUB_REPO="kairos-io/provider-kairos"
VARIANT="core"
FLAVOR="opensuse"

CPU architecture, OS, and Version:

Linux rpi4node 5.14.21-150400.24.33-default #1 SMP PREEMPT_DYNAMIC Fri Nov 4 13:55:06 UTC 2022 (76cfe60) aarch64 aarch64 aarch64 GNU/Linux

Describe the bug

The k3s node name should be the same as the hostname.
Hostname: rpi4node
Node name: rpi4node-ec2407ca

jimmykarily commented 1 year ago

We don't seem to pass any default value for --node-name to k3s. As far as I can tell, k3s's default behavior is to use the hostname to construct a node name with some suffix. I'm not sure whether the suffix is skipped altogether if we set --node-name; we need to try that out.

I'm wondering if nodeID is important in this case and if we should keep it. Let's see what --node-name flag does first and then we decide.

mudler commented 1 year ago

What we do by default is: https://github.com/kairos-io/provider-kairos/blob/0d636d2b2cd424eee5961b6fc14c4842ef74073d/internal/role/p2p/worker.go#L71 which causes this. It is the default in order to avoid collisions when the same hostname is given to multiple nodes.

To override the default logic you can set replace_args: true.

jimmykarily commented 1 year ago

@Ognian the default behavior targets less experienced users, who shouldn't have to configure too many things. For experienced users, there is replace_args: true, which requires them to configure k3s manually.

We will document the use of that flag. Are you ok with this plan?

Ognian commented 1 year ago

@jimmykarily yes, sounds reasonable

Ognian commented 1 year ago

@jimmykarily after a power cut I was not able to get the worker node up and running in the cluster again. There is a chance that this was caused by a different node name being generated after the node came back up; maybe something in the state got corrupted … #1227 was also a result of this power cut … Could you tell me how to configure a fixed node name?

jimmykarily commented 1 year ago

@Ognian with this config:

#cloud-config

users:
- name: kairos
  passwd: kairos
  ssh_authorized_keys:
  - github:jimmykarily

k3s:
  enabled: true
  replace_args: true
  args:
  - --node-name=my-node

I got this node:

localhost:/home/kairos # kubectl  get nodes
NAME      STATUS   ROLES                  AGE   VERSION
my-node   Ready    control-plane,master   53s   v1.21.14+k3s1

I used this image: https://github.com/kairos-io/provider-kairos/releases/download/v1.6.1/kairos-opensuse-leap-v1.6.1-k3sv1.21.14+k3s1.iso

Is this what you need? I'll see if I can put it somewhere in the docs as a hint.

jimmykarily commented 1 year ago

PR: https://github.com/kairos-io/kairos/pull/1244

Ognian commented 1 year ago

@jimmykarily in my case the raspberry pi is the worker not the master. So I have to use it with k3s_agent not k3s. Does this work the same way? replace_args: true suggests that I have to provide all args, not only the ones I would like to override, so which are the other ones I have to provide? An agent needs the token and the server values, where do they come from?

jimmykarily commented 1 year ago

--node-name is a flag passed to the agent: https://docs.k3s.io/cli/agent#node

From the same page:

Note that servers also run an agent, so all flags listed on this page are also valid for use on servers.

I think it will work similarly for all your nodes.

For the rest of the flags, what we do in code seems too complex to achieve manually. For this reason I've put the documentation for the "hardcoded" node name in the single-node setup. I don't think this manual solution works well in multi-node setups, where the nodes have to discover each other etc. (and you would have to figure out multiple k3s args).

If we want to support setting node names manually on multi-node clusters, I think we need to expose a special configuration option for that. cc @mudler

mudler commented 1 year ago

maybe we could add some templating sugar, as we have already for hostnames and other fields 🤔

jimmykarily commented 1 year ago

Closed by the merged PR automatically. Re-opening until we decide how to move forward.

Ognian commented 10 months ago

OK, I finally understood what was really going on: the problem is not that the node name is not set, the problem is that the k3s parameter --with-node-id is hard-coded in the kairos-agent code: https://github.com/kairos-io/provider-kairos/blob/95bc4b4c37253ecd4a50246064f4e06a027556c1/internal/role/p2p/worker.go#L72.

--with-node-id changes the node name to the name AND a generated random hash.
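
As a small illustration of that naming, the helper below strips such a suffix so that a node name can be compared with the hostname. It is only a sketch: it assumes the suffix is a dash followed by eight lowercase hex characters, as in rpi4node-ec2407ca above, and strip_node_id is a hypothetical name, not part of k3s or Kairos.

```shell
#!/bin/sh
# strip_node_id NAME
# Remove a trailing "-<8 lowercase hex chars>" suffix from a node name.
# Assumption: the generated id looks like the one reported in this issue.
strip_node_id() {
  printf '%s\n' "$1" | sed -E 's/-[0-9a-f]{8}$//'
}

strip_node_id "rpi4node-ec2407ca"   # prints: rpi4node
```

A name without such a suffix (for example my-node) is returned unchanged, so the result can be compared directly with the output of hostname.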

We want the other needed k3s parameters to be set.

Actually the only thing which should be documented is the fact that it works exactly like I described and that it is intentional.

Ognian commented 9 months ago

I don't understand the documentation: there is no need to set --with-node-id, since it is ALWAYS set regardless of the replace_args parameter. And as stated in the k3s docs, it just appends a random id to the node name.

or are you planning to remove the --with-node-id when replace_args is set to true?

jimmykarily commented 9 months ago

When you specify k3s args, you are essentially replacing the default ones. In other words, if you want to specify one, you are switching to "manual" mode, which means you need to pass all the needed args, including --with-node-id, which would otherwise be passed automatically. I think that's what this line means as well: https://github.com/kairos-io/kairos-docs/pull/128/files#diff-4ae3693f5ecdf0f5f08084df28f4dde27bff0108c7ad93224c1e5662b87c2d55R66

@mauromorales am I correct?

Ognian commented 9 months ago

Just to clarify: this line always appends --with-node-id regardless of any conditions, and this is what I found out by testing various combinations of the replace_env and replace_args flags.

mauromorales commented 9 months ago

As far as I can tell, this case would override them:

https://github.com/kairos-io/provider-kairos/blob/40e86664507a6b9841f5aa3ac0d10d5eb0abc3a0/internal/role/p2p/worker.go#L102
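
The behavior discussed in this exchange can be sketched roughly as follows. This is an illustration of the semantics as described in the thread, not the provider-kairos code: it assumes that user args are appended to the defaults unless replace_args is true, in which case only the user args are used. effective_args is a hypothetical helper.

```shell
#!/bin/sh
# effective_args REPLACE DEFAULTS USER
# REPLACE  : "true" or "false" (stands in for replace_args)
# DEFAULTS : newline-separated args the agent would add on its own
# USER     : newline-separated args from the config
effective_args() {
  replace="$1"
  defaults="$2"
  user="$3"
  if [ "$replace" = "true" ]; then
    # Manual mode: only the user's args survive.
    printf '%s\n' "$user"
  else
    # Default mode (assumed): defaults first, user args appended.
    printf '%s\n%s\n' "$defaults" "$user"
  fi
}

effective_args true  "--with-node-id" "--node-name=my-node"
effective_args false "--with-node-id" "--node-name=my-node"
```

Under these assumptions the first call yields only --node-name=my-node, while the second still contains --with-node-id, which is why the suffix shows up unless the defaults are replaced.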

Ognian commented 9 months ago

Ahh, OK, my fault. But I'm using P2P, so if I set replace_args: true then I would have to provide the P2P values manually. From the code I tried to work out which parameters I have to set manually (is the following correct?):

Setting --flannel-iface=edgevpn0 is clear, since the default is to use the VPN. To provide --node-ip %s I need to know the IP: when using KubeVip this is the IP of the KubeVip interface, otherwise it is the IP of the first non-local interface. But how do I set a non-fixed IP (i.e. DHCP) in the config file?

mauromorales commented 9 months ago

@mudler @jimmykarily do you know which parameters would need to be set?

jimmykarily commented 9 months ago

> @mudler @jimmykarily do you know which parameters would need to be set?

No, I'd have to follow the code and collect them all :( . Maybe Ettore's suggestion with the templating (see previous comments in this issue) is the best way to solve this. This needs to be coded, of course.

As a workaround for now, a fast way to collect them all would be to let the agent spin up the k3s agent with the current config and then check the running process to see which flags were passed. Then change the config to pass the same args, with replace_args set to true, the node name set to a hardcoded value, and --with-node-id set to false.

Since we don't have templating for the node name, this means each node will need to be deployed with a different Kairos config.
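
The rewriting step of that workaround could look something like the sketch below. It assumes you have already captured the flags the agent passed to k3s, one argument per line; how you capture them depends on your setup and is not shown. rewrite_k3s_args is a hypothetical helper, and it treats "set to false" as simply leaving --with-node-id out.

```shell
#!/bin/sh
# rewrite_k3s_args NODE_NAME
# Reads k3s arguments from stdin (one per line), removes --with-node-id
# and any --node-name=... entry, then appends a fixed --node-name.
# Limitation: a value given as a separate argument ("--node-name" "foo")
# is not handled; only the --flag=value form is recognized.
rewrite_k3s_args() {
  node_name="$1"
  sed -E -e '/^--with-node-id(=.*)?$/d' -e '/^--node-name(=.*)?$/d'
  printf '%s\n' "--node-name=${node_name}"
}

# Example input uses flags mentioned earlier in this thread; the IP is made up.
printf '%s\n' --with-node-id --flannel-iface=edgevpn0 --node-ip=10.1.0.2 |
  rewrite_k3s_args my-node
```

Each output line would then become one entry under args: in that node's config, together with replace_args: true. Because the node name is fixed, the call has to be repeated with a different name for every node.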