labring / sealos

Sealos is a production-ready Kubernetes distribution. You can run any Docker image on Sealos, start high-availability databases such as MySQL, PostgreSQL, Redis, and MongoDB, and develop applications in any programming language.
https://cloud.sealos.io
Apache License 2.0

Adding new nodes renames every node's hostname to lvscare.node.ip #4707

Closed ElfenSterben closed 2 months ago

ElfenSterben commented 6 months ago

Sealos Version

v4.3.7

How to reproduce the bug?

  1. Run `sealos add --nodes 10.10.1.1,10.10.1.2`.
  2. The installation fails on 10.10.1.2.
  3. The error output is:

     ```
     W0424 10:09:36.876147 2123 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
     10.10.1.2:22 [preflight] Running pre-flight checks
     10.10.1.2:22 [WARNING FileExisting-ethtool]: ethtool not found in system path
     10.10.1.2:22 [preflight] Reading configuration from the cluster...
     10.10.1.2:22 [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
     10.10.1.2:22 W0424 10:09:37.338544 2123 utils.go:69] The recommended value for "healthzBindAddress" in "KubeletConfiguration" is: 127.0.0.1; the provided value is: 0.0.0.0
     10.10.1.2:22 error execution phase kubelet-start: a Node with name "lvscare.node.ip" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node
     10.10.1.2:22 To see the stack trace of this error execute with --v=5 or higher
     2024-04-24T10:09:37 error Applied to cluster error: failed to join node 10.10.1.2:22 run command `kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-node.yaml -v 0` on 10.10.1.2:22, output: W0424 10:09:36.876147 2123 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
     [preflight] Running pre-flight checks
     [WARNING FileExisting-ethtool]: ethtool not found in system path
     [preflight] Reading configuration from the cluster...
     [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
     W0424 10:09:37.338544 2123 utils.go:69] The recommended value for "healthzBindAddress" in "KubeletConfiguration" is: 127.0.0.1; the provided value is: 0.0.0.0
     error execution phase kubelet-start: a Node with name "lvscare.node.ip" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node
     To see the stack trace of this error execute with --v=5 or higher
     , error: Process exited with status 1,
     Error: failed to join node 10.10.1.2:22 run command `kubeadm join --config=/root/.sealos/default/etc/kubeadm-join-node.yaml -v 0` on 10.10.1.2:22, output: W0424 10:09:36.876147 2123 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
     [preflight] Running pre-flight checks
     [WARNING FileExisting-ethtool]: ethtool not found in system path
     [preflight] Reading configuration from the cluster...
     [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
     W0424 10:09:37.338544 2123 utils.go:69] The recommended value for "healthzBindAddress" in "KubeletConfiguration" is: 127.0.0.1; the provided value is: 0.0.0.0
     error execution phase kubelet-start: a Node with name "lvscare.node.ip" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node
     To see the stack trace of this error execute with --v=5 or higher
     , error: Process exited with status 1,
     ```

  4. Run `hostnamectl` on 10.10.1.1 and 10.10.1.2. Both report the same hostname settings:

     ```
        Static hostname: n/a
     Transient hostname: lvscare.node.ip
              Icon name: computer-vm
                Chassis: vm
             Machine ID: xxxxxxx
                Boot ID: xxxxxx
         Virtualization: kvm
       Operating System: Ubuntu 20.04.6 LTS
                 Kernel: Linux 5.4.0-56-generic
           Architecture: x86-64
     ```

What is the expected behavior?

No response

What do you see instead?

No response

Operating environment

- Sealos version: v4.3.7
- Docker version: none
- Kubernetes version: v1.27.4
- Operating system: Ubuntu 20.04
- Runtime environment:
- Cluster size:
- Additional information:

Additional information

No response

ghostloda commented 6 months ago

Look here: error execution phase kubelet-start: a Node with name "lvscare.node.ip" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node
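The kubeadm message means two machines are presenting the same node name. As a rough sketch of how one might spot such a clash before joining, the snippet below takes "IP HOSTNAME" pairs on standard input and prints every hostname that appears more than once. The function name `find_duplicate_hostnames`, the third node `10.10.1.3`, and the name `worker-3` are hypothetical, chosen only for illustration; how the pairs are collected (for example by running `hostnamectl` on each node) is left out.

```shell
#!/bin/sh
# Hypothetical helper: read "IP HOSTNAME" pairs on stdin and print each
# hostname that is used by more than one IP address.
find_duplicate_hostnames() {
  awk '{ print $2 }' | sort | uniq -d
}

# Sample input: the first two pairs mirror the report above;
# 10.10.1.3 / worker-3 is an invented entry for contrast.
printf '%s\n' \
  '10.10.1.1 lvscare.node.ip' \
  '10.10.1.2 lvscare.node.ip' \
  '10.10.1.3 worker-3' | find_duplicate_hostnames
```

With the sample input, only `lvscare.node.ip` is printed, since it is the one name shared by two addresses.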

ElfenSterben commented 6 months ago

> Look here: error execution phase kubelet-start: a Node with name "lvscare.node.ip" and status "Ready" already exists in the cluster. You must delete the existing Node or change the name of this new joining Node

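Previously, hostnames did not matter when adding new machines: a new machine has no hostname by default, and `sealos add` would build one from the node IP. Now, however, `sealos add --nodes` sets the hostname of every machine in the nodes list to lvscare.node.ip, so the second machine ends up with the same name as the first, and this error is raised when installing onto the second machine. The hostname of every node now has to be set manually. As far as I remember, this was not necessary in 4.3.0.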
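The manual workaround described here, giving each node its own hostname before running `sealos add`, could look roughly like the sketch below. It only prints the commands rather than executing them. The `node-` prefix, the dash-separated naming scheme, the helper names, and the use of `root` over SSH are all assumptions made for illustration; they are not the names sealos itself generates.

```shell
#!/bin/sh
# Hypothetical helper: derive a unique hostname from an IPv4 address,
# e.g. 10.10.1.2 becomes node-10-10-1-2 (naming scheme is an assumption).
hostname_for_ip() {
  printf 'node-%s\n' "$1" | tr . -
}

# Print (do not run) one hostnamectl command per node, so the result can be
# reviewed before anything is changed. Assumes root SSH access to each node.
print_hostname_commands() {
  for ip in "$@"; do
    printf 'ssh root@%s hostnamectl set-hostname %s\n' "$ip" "$(hostname_for_ip "$ip")"
  done
}

print_hostname_commands 10.10.1.1 10.10.1.2
```

Once every node reports a distinct hostname, `sealos add --nodes 10.10.1.1,10.10.1.2` should no longer hit the duplicate node name check, assuming the hostname clash is the only cause of the failure.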

stale[bot] commented 4 months ago

This issue has been automatically closed because we haven't heard back for more than 60 days, please reopen this issue if necessary.