harridu closed this issue 7 months ago
Just to clarify: did you install a new RKE2 cluster and migrate the Rancher application over via backup/restore, or did you upgrade your cluster in place from RKE to RKE2? In-place migrations of a cluster from RKE to RKE2 are not currently supported (they should probably work, but are still highly experimental and unsupported), so I wanted to confirm which scenario we're dealing with.
@harridu, adding to the above, and assuming it is Rancher itself that was migrated from RKE1 to RKE2, could you please clarify which exact restore procedure was used? Was it the following? https://ranchermanager.docs.rancher.com/v2.6/how-to-guides/new-user-guides/backup-restore-and-disaster-recovery/migrate-rancher-to-new-cluster
> using the restore mechanism as described
I have set up new hosts rancher01{a..c} using Debian 11 and RKE2 v1.24.8+rke2r1. The old hosts (rr0{1..3}) were based on Debian 11 and RKE v1.24.8. All hosts are virtual machines (qemu, libvirt, ...) with 4 cores and 8 GByte RAM.
Migration has been done using the backup helm charts and the restore object (and local S3 storage on minio), as described in the documentation.
CHART_VERSION=2.1.2
helm install rancher-backup-crd rancher-charts/rancher-backup-crd -n cattle-resources-system --create-namespace --version $CHART_VERSION
helm install rancher-backup rancher-charts/rancher-backup -n cattle-resources-system --version $CHART_VERSION
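For reference, a scheduled Backup object on the old cluster pointing at the same MinIO location might look roughly like the following. This is only a sketch: the schedule and retentionCount values are assumptions, not taken from this thread; rancher-resource-set is the default ResourceSet shipped with the rancher-backup chart.

```yaml
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: hourly
spec:
  # default ResourceSet installed by the rancher-backup chart
  resourceSetName: rancher-resource-set
  # assumed example schedule: hourly recurring backups
  schedule: "0 * * * *"
  retentionCount: 10
  storageLocation:
    s3:
      bucketName: rancher01
      credentialSecretName: rancher-backup-s3
      credentialSecretNamespace: cattle-resources-system
      endpoint: minio.example.com:9010
      folder: backup
```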
First, the restore operator, the S3 secret, and the certificate were created. kubectl describe did not indicate any problems with the restore object.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: hourly-95f76381-d9e5-4644-af1b-ca8fdf93a7ae-2023-01-23T10-56-59Z.tar.gz
  prune: false
  storageLocation:
    s3:
      bucketName: rancher01
      credentialSecretName: rancher-backup-s3
      credentialSecretNamespace: cattle-resources-system
      endpoint: minio.example.com:9010
      folder: backup
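The S3 credential secret referenced by credentialSecretName above could have been created from a manifest along these lines (a sketch; the key values are placeholders, while accessKey and secretKey are the key names the rancher-backup operator expects):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: rancher-backup-s3
  namespace: cattle-resources-system
type: Opaque
stringData:
  accessKey: <minio-access-key>   # placeholder
  secretKey: <minio-secret-key>   # placeholder
```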
Next, Rancher was installed using
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=rancher01.example.com --set ingress.tls.source=secret --version 2.6.9
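Before installing Rancher on the new cluster, it is worth confirming that the restore actually completed. A quick way to do that (a sketch, assuming kubectl access to the new cluster and that the operator deployment is named rancher-backup, as in the chart above):

```shell
# Restore is a cluster-scoped resource; check its status conditions
kubectl get restore restore-migration -o yaml

# the operator's logs list each restored resource and any errors
kubectl logs -n cattle-resources-system deploy/rancher-backup --tail=50
```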
If you changed the rancher hostname, then maybe also check https://www.suse.com/support/kb/doc/?id=000020173
I did not change the Rancher hostname ("rancher01.example.com"), just the names of the hosts in the cluster. RKE2 was set up with
server: https://rancher01a.example.com:9345
token: look it up on the first node
tls-san:
  - rancher01.example.com
  - rancher01a.example.com
  - rancher01b.example.com
  - rancher01c.example.com
rancher01 is a round-robin DNS record pointing to the IP addresses of rancher01{a..c}.
Did I mention that the 4 clusters running on RKE survived the migration without warnings? Only the cluster based on RKE2 shows "Updating" or "Reconciling" and the message "Waiting for plan to be applied" in the Rancher web GUI.
PS, this might be helpful: The fleet-agent on kube003 seems to have a problem:
% k logs -n cattle-fleet-system fleet-agent-bfc5655cc-6wvgx
time="2023-01-27T15:22:50Z" level=error msg="Current credential failed, failing back to reregistering: Unauthorized"
time="2023-01-27T15:22:50Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap or cattle-fleet-system/fleet-agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap: secrets \"fleet-agent-bootstrap\" not found"
time="2023-01-27T15:23:50Z" level=error msg="Current credential failed, failing back to reregistering: Unauthorized"
time="2023-01-27T15:23:50Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap or cattle-fleet-system/fleet-agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap: secrets \"fleet-agent-bootstrap\" not found"
time="2023-01-27T15:24:50Z" level=error msg="Current credential failed, failing back to reregistering: Unauthorized"
time="2023-01-27T15:24:50Z" level=error msg="Failed to register agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap or cattle-fleet-system/fleet-agent: looking up secret cattle-fleet-system/fleet-agent-bootstrap: secrets \"fleet-agent-bootstrap\" not found"
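As an aside, when an agent repeats the same errors every minute, it can help to condense the log into distinct messages before comparing against other agents. A small sketch, where fleet-agent.log is a hypothetical local copy of the kubectl logs output above:

```shell
# extract the msg="..." payload from each logrus line and count duplicates,
# most frequent message first
sed -n 's/.*msg="\(.*\)"$/\1/p' fleet-agent.log | sort | uniq -c | sort -rn
```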
@harridu I see this same error with a downstream RKE2 custom cluster when I migrate my Rancher management cluster from one RKE2 cluster to another RKE2 cluster. This issue seems similar to issue https://github.com/rancher/rancher/issues/40080. However, the proposed fix for that issue is not working for me. Are you able to apply the fix for that issue here and confirm it does not solve this problem either?
Hi @ron1, thank you for the pointer.
If I understand this correctly, my new Rancher is in a highly questionable state, regardless of whether there is a fix in #40080. Internal data has been corrupted by the broken backup/restore mechanism. The migration failed, i.e. I have to set up a new Rancher to manage my clusters. Since there is no migration procedure for the managed clusters, I have to rebuild them as well.
It seems I wasted a lot of time trying the migration.
Closed due to lack of interest.
Rancher Server Setup
Describe the bug
After migrating my Rancher cluster from RKE to RKE2 (using the restore mechanism as described), all managed clusters based on RKE are back, but my one-and-only managed cluster ("kube003"), based on RKE2 v1.24.8+rke2r1, is stuck in Updating. The error message is
In the cluster manager, kube003 shows a message
All clusters are on-premises. The common hostname has been modified in DNS (TTL is 300 seconds) to point to the new hosts, as required by the migration guide. It was set on the helm install command line on the new Rancher cluster. The native host names of the old cluster nodes have not been preserved.