rancher / k3os

Purpose-built OS for Kubernetes, fully managed by Kubernetes.
https://k3os.io
Apache License 2.0

After reboot second node appears #734

Closed by patrik-upspot 2 years ago

patrik-upspot commented 3 years ago

Version (k3OS / kernel): k3os version v0.21.1-k3s1r0, kernel 5.4.0-73-generic #82 SMP Thu Jun 3 02:29:43 UTC 2021

Architecture x86_64

Describe the bug
Hello,

I have a big issue with k3OS. Yesterday I configured the cluster for the third time. Today I wanted to create a VM snapshot at my hoster netcup; to do this you have to stop the VPS, take the snapshot, and restart the VPS. After the restart my Rancher didn't come back. Since the reboot I have a second node and my correct node is NotReady. What can I do to bring my "old" node back and delete the new one? I'm very new to Kubernetes, so I'm not very familiar with all the commands.

v2202108153938159941 [~]$ kubectl get nodes
NAME                   STATUS     ROLES                  AGE     VERSION
upspot-cluster         NotReady   control-plane,master   22h     v1.21.1+k3s1
v2202108153938159941   Ready      control-plane,master   5m34s   v1.21.1+k3s1

Additional context
My k3os config:

boot_cmd:
- sed -i -r 's/^(\s*PasswordAuthentication\s*)no(\s*)$/\1yes\2/i' /etc/ssh/sshd_config
hostname: upspot-cluster
k3os:
  k3s_args:
  - server
  - --no-deploy=traefik
  modules:
  - wireguard
  password: ***
  token: ***

Thanks for your help!

patrik-upspot commented 3 years ago

After typing

v2202108153938159941 [~]$ sudo k3os config
[INFO]  Skipping k3s download and verify
[INFO]  Skipping installation of SELinux RPM
[INFO]  env: Creating environment file /etc/rancher/k3s/k3s-service.env
[INFO]  openrc: Creating service file /etc/init.d/k3s-service
[INFO]  openrc: Enabling k3s-service service for default runlevel
[INFO]  openrc: Starting k3s-service
 * Caching service dependencies ...                                                                                                                                                                                                    [ ok ]
 * Stopping k3s-service ...                                                                                                                                                                                                            [ ok ]
 * Starting k3s-service ...  

My master node gets Ready and the new one gets the status NotReady. I deleted the new one and now all pods come up. Rancher is available again. But can someone tell me what I'm doing wrong? Why do I get a second node after a reboot, and what is going on?

After every reboot the second node appears again with status NotReady. I have to delete it every time.
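
For reference, a rough sketch of the cleanup step described here, assuming (as in the kubectl output earlier in this issue) that the duplicate is the node object named after the VPS:

# check which node object is the stale duplicate
kubectl get nodes -o wide
# remove the duplicate; this only deletes the Kubernetes node object
kubectl delete node v2202108153938159941

Note that this does not address the underlying hostname problem, so the duplicate will reappear on the next reboot until the root cause is fixed.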

patrik-upspot commented 3 years ago

Does no one have an idea or a tip for me?

srgvg commented 3 years ago

If I understand you correctly, it seems like the hostname of your node changed after that snapshot, causing the issue?

dweomer commented 3 years ago

If I understand you correctly, it seems like the hostname of your node changed after that snapshot, causing the issue?

This is what it looks like to me. It is incongruous, however, that you have specified the hostname in your config.yaml (meaning, you should have been protected against the hostname change). I am assuming that the config.yaml, specifically the hostname, did not change after you took the snapshot.
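
A quick way to check this after a reboot is to compare the effective hostname with the one pinned in the config; a sketch assuming the config lives at /var/lib/rancher/k3os/config.yaml (one of the usual k3os locations, adjust the path if yours differs):

# hostname the node currently registers with
hostname
# hostname pinned in the k3os cloud config
sudo grep hostname /var/lib/rancher/k3os/config.yaml

If the two disagree right after boot, k3s registers a second node object under the transient name until the configured hostname is applied again.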

patrik-upspot commented 3 years ago

That problem occurs after every restart of the VPS. The hostname is configured in the config.yml and does not change after a reboot or after snapshotting.

But it seems like a problem with the config. After typing "sudo k3os config" the new node goes offline and the "original" upspot-cluster comes up and all is fine.

cdeadlock commented 3 years ago

I was never able to get exactly two master nodes to work in a cluster. I think it was etcd related; something requires three masters.

t1murl commented 2 years ago

Hi @patrik-upspot ,

I happen to also host some k3os nodes on netcup and was seeing the same behavior when running with a minimal cloud-init configuration specifying only the hostname and SSH keys.

I was able to pin the origin of this down to the default dynamic network configuration. When using a static configuration for connman, the node preserves the network configuration as well as the hostname correctly, and the single-node cluster does not report an additional node after e.g. rebooting.

Please find my redacted config below, showing parts of the cloud config file I use. It can also serve as a minimal configuration, so feel free to start with it, validate that your issue is resolved, and then add whatever else you want to configure.

ssh_authorized_keys:
- ssh-rsa XXX
- ssh-rsa XXX

hostname: XX.example.com

write_files:
  - path: /var/lib/connman/default.config
    content: |-
      [service_eth0]
      Type=ethernet
      MAC=XXXX
      IPv4=XXX.XX.XX.XX/255.255.252.0/XXX.XX.XX.1
      IPv6=off
      Nameservers=XXX.XXX.XXX.XXX,XXX.XXX.XXX.XXX

k3os:
  ntp_servers:
    - 0.de.pool.ntp.org
    - 1.de.pool.ntp.org

  dns_nameservers:
    - XXX.XXX.XXX.XXX
    - XXX.XXX.XXX.XXX

I hope this helps.
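
As a hedged verification sketch (assuming connmanctl is available on the node, which it normally is on k3os since networking is handled by connman): after rebooting with the static config in place, the ethernet service should come up with the static settings and the cluster should report exactly one node.

# list connman services and check the ethernet service picked up the static config
sudo connmanctl services
# after the reboot, exactly one node should be reported
kubectl get nodes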

patrik-upspot commented 2 years ago

Hey @t1murl, thanks for your reply. But I'm very much a noob at DevOps topics. I only install k3OS on the VPS and then Rancher into k3OS. What nameservers do I have to configure? Should I configure the IP of the VPS or something else? The same with the IPv4 setting: what IP range do I have to choose? Sorry for the stupid questions ;-) And thanks for your help!