rancher / rke2

https://docs.rke2.io/
Apache License 2.0

rke2 not starting after IP address Change #3176

Closed ccccandddd closed 1 year ago

ccccandddd commented 2 years ago

Environmental Info: RKE2 Version: rke2 version v1.23.8+rke2r1 (f2f1ecd720f566e2cc8a700073d6d82ba9e2ca52) go version go1.17.5b7

Node(s) CPU architecture, OS, and Version: Linux install 5.14.21-150400.22-default #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222) x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 server, installed following the Quick Installation guide

Describe the bug: After changing the server's IP address from DHCP to a static IP, the rke2 server does not start.

Steps To Reproduce:

Expected behavior: The rke2 server starts and works with the new IP address.

Actual behavior: The rke2 server fails to start.

Additional context / logs:

Jul 21 11:14:35 install rke2[3325]: time="2022-07-21T11:14:35Z" level=info msg="Handling backend connection request [install]"
Jul 21 11:14:35 install rke2[3350]: Flag --volume-plugin-dir has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --file-check-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --sync-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --address has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --alsologtostderr has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
Jul 21 11:14:35 install rke2[3350]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --authentication-token-webhook has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --authorization-mode has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --cloud-provider has been deprecated, will be removed in 1.24 or later, in favor of removing cloud provider code from Kubelet.
Jul 21 11:14:35 install rke2[3350]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
Jul 21 11:14:35 install rke2[3350]: Flag --eviction-hard has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --eviction-minimum-reclaim has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --healthz-bind-address has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --log-file has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
Jul 21 11:14:35 install rke2[3350]: Flag --log-file-max-size has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
Jul 21 11:14:35 install rke2[3350]: Flag --logtostderr has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
Jul 21 11:14:35 install rke2[3350]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --read-only-port has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --serialize-image-pulls has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --stderrthreshold has been deprecated, will be removed in a future release, see https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components
Jul 21 11:14:35 install rke2[3350]: Flag --tls-cert-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:35 install rke2[3350]: Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 21 11:14:39 install rke2[3325]: time="2022-07-21T11:14:39Z" level=info msg="Defragmenting etcd database"
Jul 21 11:14:39 install rke2[3325]: time="2022-07-21T11:14:39Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [install-e79cd0c4=https://192.168.21.211:2380], expect: install-e79cd0c4=192.168.31.41"
Jul 21 11:14:40 install rke2[3325]: time="2022-07-21T11:14:40Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Jul 21 11:14:44 install rke2[3325]: time="2022-07-21T11:14:44Z" level=info msg="Defragmenting etcd database"
Jul 21 11:14:44 install rke2[3325]: time="2022-07-21T11:14:44Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [install-e79cd0c4=https://192.168.21.211:2380], expect: install-e79cd0c4=192.168.31.41"
Jul 21 11:14:45 install rke2[3325]: time="2022-07-21T11:14:45Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Jul 21 11:14:49 install rke2[3325]: time="2022-07-21T11:14:49Z" level=info msg="Defragmenting etcd database"
Jul 21 11:14:49 install rke2[3325]: time="2022-07-21T11:14:49Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [install-e79cd0c4=https://192.168.21.211:2380], expect: install-e79cd0c4=192.168.31.41"
Jul 21 11:14:50 install rke2[3325]: time="2022-07-21T11:14:50Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"

brandond commented 2 years ago

Jul 21 11:14:49 install rke2[3325]: time="2022-07-21T11:14:49Z" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [install-e79cd0c4=https://192.168.21.211:2380], expect: install-e79cd0c4=192.168.31.41"

changed IP config for eth0 from dhcp to static IP in a different subnet

Servers must have static IP addresses. If you must change the address, you should delete the node from the cluster and re-add it with the new address. In the case of a single-node cluster, you can stop the rke2-server service, run rke2 server --cluster-reset to reset the etcd cluster membership back to a single member with the current node IP address, then start the rke2-server service again.
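
A minimal sketch of the single-node sequence described above, using only the commands named in this comment. The `run` helper and the `DRY_RUN` variable are illustrative additions, not part of RKE2; with `DRY_RUN=1` the commands are printed instead of executed. It assumes root privileges and a systemd-managed `rke2-server` service.

```shell
#!/bin/sh
# Sketch only: reset etcd membership on a single-node RKE2 server
# after its IP address has changed.

# run: hypothetical helper. Prints the command when DRY_RUN=1,
# otherwise executes it unchanged.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_single_node: stop the service, reset cluster membership to the
# current node, then start the service again. Stops at the first failure.
reset_single_node() {
  run systemctl stop rke2-server &&
    run rke2 server --cluster-reset &&
    run systemctl start rke2-server
}
```

Sourcing the file and calling `DRY_RUN=1 reset_single_node` first shows the three commands without touching the node.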

IanniF commented 1 year ago

Hi, I'm having the same issue on a cluster where I want to change my node IP. I've tried multiple times to remove the node and re-add it to the cluster, but I keep hitting the same issue. Is there any sort of documentation on how to remove a node from an RKE2 cluster and then re-add it?

brandond commented 1 year ago

Is there any sort of documentation in order to remove a node from a RKE2 cluster and then re-add it ?

Just what you would expect - stop the service, kubectl delete node, then start the service again.
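
The order of those three steps could be sketched as below. The function name, the `run` helper, and the `DRY_RUN` variable are hypothetical; the node name is a placeholder you supply. The sketch assumes the service is `rke2-server` (pass `rke2-agent` for an agent node) and that `kubectl` can reach a server that is still running, which in practice may mean running the middle step from a different machine.

```shell
#!/bin/sh
# Sketch only: remove a node from the cluster and let it re-register.

# run: hypothetical helper. Prints the command when DRY_RUN=1,
# otherwise executes it unchanged.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# readd_node NODE_NAME [SERVICE]
# Stops the RKE2 service, deletes the node object, then starts the service.
readd_node() {
  node_name=$1
  service=${2:-rke2-server}
  if [ -z "$node_name" ]; then
    echo "usage: readd_node NODE_NAME [SERVICE]" >&2
    return 1
  fi
  run systemctl stop "$service" &&
    run kubectl delete node "$node_name" &&
    run systemctl start "$service"
}
```

With `DRY_RUN=1 readd_node my-node rke2-agent` the commands are only printed, which is a cheap way to check the order before running anything.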

Bermos commented 1 year ago

I have the same problem, but (on my single node rke2 cluster) I haven't changed the IP address of the node. The rke2-server expects the correct IP address but gets the wrong IP for the etcd name. So the error looks like this:
Failed to test data store connection: this server is a not a member of the etcd cluster. Found [rancher-e79cd0c4=https://10.10.1.147:2380], expect: rancher-e79cd0c4=172.16.1.20
The node's IP address is: 172.16.1.20
The etcd container listens on: https://172.16.1.20:2380
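
To compare the two addresses in that message without reading the whole log, something like the following could work. The function names are made up for illustration; the patterns assume the message format shown above, with the recorded peer URL after `Found` and the address rke2 expects for this node after `expect:`.

```shell
#!/bin/sh
# Sketch only: pull the two addresses out of the
# "not a member of the etcd cluster" log message read from stdin.

# etcd_found_ip: address recorded in the etcd member list ("Found [...]").
etcd_found_ip() {
  sed -n 's|.*Found \[[^=]*=https://\([0-9.]*\):[0-9]*].*|\1|p'
}

# etcd_expected_ip: address rke2 expects for this node ("expect: ...").
etcd_expected_ip() {
  sed -n 's|.*expect: [^=]*=\([0-9.]*\).*|\1|p'
}
```

For example, `journalctl -u rke2-server | etcd_found_ip | sort -u` would list each distinct recorded address once. Both functions handle IPv4 addresses only.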

The node's IP has not changed, but the error appeared after I manually ran:

systemctl stop rke2-server
rke2 certificate rotate
systemctl start rke2-server

Would anyone be willing to help me troubleshoot what might have happened here, ideally fix it or at least try to prevent it from happening again in the future?
I tried to find where k3s (which is a dependency of rke2) returns that error, but I couldn't find where it gets the information that rancher-e79cd0c4 should equal https://10.10.1.147:2380. That might be an angle for finding where the error originates.

brandond commented 1 year ago

@Bermos where's the other IP coming from? Do you have multiple interfaces configured on the node, both with default routes? If so, and you can't rely on specific ordering, you may need to add node-ip: 10.10.1.147 to the config to tell it to use the correct IP. If you prefer to use the 172.16.1.20 address, you can do so, but you will need to run rke2 server --cluster-reset to rebuild the etcd cluster with that address.
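
A rough way to check both questions is sketched below. The function names are illustrative. The first function expects the output of `ip -4 route show default` on stdin; the second reads a `node-ip:` line from the RKE2 config file, assumed here to be at the default path `/etc/rancher/rke2/config.yaml` unless another path is passed.

```shell
#!/bin/sh
# Sketch only: look for competing default routes and a pinned node-ip.

# default_route_devs: print the interface of every default route on stdin.
# More than one line of output suggests the node address may be ambiguous.
default_route_devs() {
  awk '$1 == "default" { for (i = 1; i < NF; i++) if ($i == "dev") print $(i + 1) }'
}

# configured_node_ip [CONFIG_FILE]: print the value of a top-level
# node-ip key, or nothing if the key is not set.
configured_node_ip() {
  sed -n 's/^node-ip:[[:space:]]*//p' "${1:-/etc/rancher/rke2/config.yaml}"
}
```

Typical use would be `ip -4 route show default | default_route_devs` followed by `configured_node_ip`. If the first prints two interfaces and the second prints nothing, setting `node-ip` as suggested above is the likely next step.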

caroline-suse-rancher commented 1 year ago

I'm going to convert this to a discussion, as there's no clear bug with rke2