Closed rpelissi closed 2 years ago
So here is where I am after my investigations. We have these storage classes available:
[root@node-1 ~]# kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path (default) rancher.io/local-path Delete WaitForFirstConsumer false 24h
longhorn (default) driver.longhorn.io Delete Immediate true 20h
I am using Longhorn, and local-path could work too, but I was not sure how to use it, so I created a local one like this:
[root@node-1 ~]# cat local-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: Immediate
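Side note: for statically provisioned local volumes with no-provisioner, the Kubernetes docs generally recommend delayed binding, so the scheduler takes the PV's node affinity into account before binding the claim. A sketch of that variant (my assumption that you want this behavior; Immediate also works if you bind manually):

```yaml
# Same StorageClass, but with delayed binding so the scheduler
# considers the PV's node affinity before binding the PVC.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```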
So
[root@node-1 ~]# kubectl get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path (default) rancher.io/local-path Delete WaitForFirstConsumer false 24h
local-storage kubernetes.io/no-provisioner Delete Immediate false 63m
longhorn (default) driver.longhorn.io Delete Immediate true 20h
Now let's create the volume and the volume claim
[root@node-1 ~]# cat pv-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/backup"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1
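Aside: hostPath works here, but for a pre-provisioned directory pinned to one node, Kubernetes also has the `local` volume type, which requires exactly this kind of nodeAffinity. A sketch of the same PV using it (this is an alternative I am assuming, not something the docs above require):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
spec:
  storageClassName: local-storage
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /backup            # directory must already exist on the node
  nodeAffinity:              # mandatory for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1
```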
I have set a node affinity so I can put the mount where I want (seems logical to me, but it could be totally wrong and may in fact have led me to some issues later).
[root@node-1 ~]# cat pv-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
So:
[root@node-1 ~]# kubectl get persistentvolumes
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
backup 10Gi RWO Retain Released cattle-resources-system/rancher-backup-1 18h
task-pv-volume 10Gi RWO Retain Bound default/task-pv-claim local-storage 62m
[root@node-1 ~]# kubectl get persistentvolumeclaims
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
task-pv-claim Bound task-pv-volume 10Gi RWO local-storage 62m
After reading a little more on the backup-restore operator, it seems that I have to define the volume location when I install the operator (again, I could be wrong...). I want to use that option (which fits my case because my backup files are local), so I read this: https://rancher.com/docs/rancher/v2.6/en/backups/configuration/storage-config/#existing-persistent-volume
Seems that I have to deploy the backup-restore operator with custom values. Fine. I took the template found on the web page above and set volumeName to task-pv-claim.
then execute:
[root@node-1 ~]# helm install rancher-backup rancher-charts/rancher-backup -n cattle-resources-system --version 2.1.2 -f values.yaml
but the pod fail with:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17s default-scheduler Successfully assigned cattle-resources-system/rancher-backup-cb4f7564d-w4rw7 to worker-3
Warning Failed 16s kubelet Failed to pull image "rancher/rancher-backup:v2.1.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/rancher/rancher-backup:v2.1.2": failed to resolve reference "docker.io/rancher/rancher-backup:v2.1.2": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning Failed 16s kubelet Error: ErrImagePull
Normal BackOff 15s kubelet Back-off pulling image "rancher/rancher-backup:v2.1.2"
Warning Failed 15s kubelet Error: ImagePullBackOff
Normal Pulling 0s (x2 over 16s) kubelet Pulling image "rancher/rancher-backup:v2.1.2"
Found that the template on the web page could be wrong. Instead of:
image:
  repository: rancher/rancher-backup
  tag: v0.0.1-rc10
I use:
image:
  repository: rancher/backup-restore-operator
and now the pod is working correctly.
So, I recreated the restore job but still got an issue:
[root@node-1 ~]# kubectl get Restore
NAME BACKUP-SOURCE BACKUP-FILE AGE STATUS
restore-pvc-demo daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz 49m Retrying
[root@node-1 ~]# kubectl describe Restore
..
Spec:
Backup Filename: daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz
Status:
Backup Source:
Conditions:
Last Update Time: 2022-05-29T15:05:25Z
Message: Backup location not specified on the restore CR, and not configured at the operator level
So at that point I am not sure what I have done wrong... Maybe it's because the backup-restore pod is running on another worker node...
[root@node-1 ~]# kubectl describe pod/rancher-backup-74779d9dfd-fdndh -n cattle-resources-system
Name: rancher-backup-74779d9dfd-fdndh
Namespace: cattle-resources-system
Priority: 0
Node: worker-3/192.168.2.105
Start Time: Sun, 29 May 2022 11:13:05 -0400
Labels: app.kubernetes.io/instance=rancher-backup
app.kubernetes.io/name=rancher-backup
pod-template-hash=74779d9dfd
resources.cattle.io/operator=backup-restore
Annotations: checksum/pvc: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
checksum/s3: 01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b
Status: Running
IP: 10.42.5.53
IPs:
IP: 10.42.5.53
Controlled By: ReplicaSet/rancher-backup-74779d9dfd
Containers:
rancher-backup:
Container ID: containerd://4404353cda78995dcb2aeef1c8b75d623cd5a99c136db07659edb1242b70a4fe
Image: rancher/backup-restore-operator:v2.1.2
Image ID: docker.io/rancher/backup-restore-operator@sha256:acbb9ae36580b53ec87a953a18a98e0b0bc0bcefe2100850dee7c66f8a978169
Port: <none>
Host Port: <none>
State: Running
Started: Sun, 29 May 2022 11:13:06 -0400
Ready: True
Restart Count: 0
Environment:
CHART_NAMESPACE: cattle-resources-system
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from rancher-backup-token-8f7p9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
rancher-backup-token-8f7p9:
Type: Secret (a volume populated by a Secret)
SecretName: rancher-backup-token-8f7p9
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: cattle.io/os=linux:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34m default-scheduler Successfully assigned cattle-resources-system/rancher-backup-74779d9dfd-fdndh to worker-3
Normal Pulling 34m kubelet Pulling image "rancher/backup-restore-operator:v2.1.2"
Normal Pulled 34m kubelet Successfully pulled image "rancher/backup-restore-operator:v2.1.2" in 326.073136ms
Normal Created 34m kubelet Created container rancher-backup
Normal Started 34m kubelet Started container rancher-backup
Not sure... I will try to continue to investigate, but any help will be more than welcome.
Ok, I may have made a mistake in the custom values for my volume. I now use:
image:
  repository: rancher/rancher-backup
  #tag: v0.0.1-rc10
  #tag: latest
  #tag: v2.1.2

## Default s3 bucket for storing all backup files created by the rancher-backup operator
s3:
  enabled: false
  ## credentialSecretName if set, should be the name of the Secret containing AWS credentials.
  ## To use IAM Role, don't set this field
  credentialSecretName: creds
  credentialSecretNamespace: ""
  region: us-west-2
  bucketName: rancherbackups
  folder: base folder
  endpoint: s3.us-west-2.amazonaws.com
  endpointCA: base64 encoded CA cert
  # insecureTLSSkipVerify: optional

## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
## If persistence is enabled, operator will create a PVC with mountPath /var/lib/backups
persistence:
  enabled: false
  ## If defined, storageClassName: <storageClass>
  ## If set to "-", storageClassName: "", which disables dynamic provisioning
  ## If undefined (the default) or set to null, no storageClassName spec is
  ## set, choosing the default provisioner. (gp2 on AWS, standard on
  ## GKE, AWS & OpenStack).
  ## Refer to https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
  ##
  storageClass: "-"
  ## If you want to disable dynamic provisioning by setting storageClass to "-" above,
  ## and want to target a particular PV, provide name of the target volume
  volumeName: "task-pv-claim"
  ## Only certain StorageClasses allow resizing PVs; Refer to https://kubernetes.io/blog/2018/07/12/resizing-persistent-volumes-using-kubernetes/
  size: 2Gi

global:
  cattle:
    systemDefaultRegistry: ""

nodeSelector: {}
tolerations: []
affinity: {}
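Looking back at these values, persistence.enabled is still false, and volumeName points at the PVC name (task-pv-claim) rather than the PV. If I understand the chart correctly (this is my assumption, not verified against the chart source), a minimal persistence fragment targeting an existing PV would look more like:

```yaml
persistence:
  enabled: true                 # must be true or the operator never mounts a volume
  storageClass: "-"             # "-" disables dynamic provisioning
  volumeName: "task-pv-volume"  # the PV name, not the PVC name (task-pv-claim)
  size: 2Gi
```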
and then
helm install rancher-backup-crd rancher-charts/rancher-backup-crd -n cattle-resources-system --create-namespace --version 2.1.2 -f values.yaml
helm install rancher-backup rancher-charts/rancher-backup -n cattle-resources-system --version 2.1.2
Still got this:
[root@node-1 ~]# kubectl describe Restore
Name: restore-pvc-demo
Namespace:
Labels: <none>
Annotations: <none>
API Version: resources.cattle.io/v1
Kind: Restore
Metadata:
Creation Timestamp: 2022-05-29T16:20:46Z
Generation: 1
Managed Fields:
API Version: resources.cattle.io/v1
Fields Type: FieldsV1
fieldsV1:
f:spec:
f:prune:
f:storageLocation:
f:status:
.:
f:backupSource:
f:conditions:
f:observedGeneration:
f:restoreCompletionTs:
f:summary:
Manager: backup-restore-operator
Operation: Update
Time: 2022-05-29T16:20:46Z
API Version: resources.cattle.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:backupFilename:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2022-05-29T16:20:46Z
Resource Version: 654441
UID: 4095284c-38e7-45b6-b7f0-c18d0e6bcf44
Spec:
Backup Filename: daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz
Status:
Backup Source:
Conditions:
Last Update Time: 2022-05-29T16:20:46Z
Message: Backup location not specified on the restore CR, and not configured at the operator level
Reason: Error
Status: False
Type: Reconciling
Last Update Time: 2022-05-29T16:20:46Z
Message: Retrying
Status: Unknown
Type: Ready
Observed Generation: 0
Restore Completion Ts:
Summary:
Events: <none>
So maybe I am wrong... maybe a custom section in https://rancher.com/docs/rancher/v2.6/en/backups/configuration/storage-config/#example-values-yaml-for-the-rancher-backup-helm-chart is needed. In that example the storage is S3, but I need local storage instead, and I have no idea how to define this in that yaml file...
This definitely needs some clarification as everything is mostly focused on S3. Here are some quick steps that I used while I'm working on improving this:
1. Run helm repo update so the latest chart is available before upgrading the chart (my helm repo is called rancher-charts)
2. helm -n cattle-resources-system upgrade rancher-backup rancher-charts/rancher-backup --set persistence.enabled=true --set persistence.storageClass="local-path" --set persistence.volumeName="pvc-x-x-x-x-x"
3. kubectl -n cattle-resources-system exec deploy/rancher-backup -- ls /var/lib/backups
4. kubectl -n cattle-resources-system get events
I tested this on k3s + local-path storageclass.
Hi, thanks for that useful info, much appreciated :) I tried to work on it yesterday but have not yet been able to make it work. I will make another attempt today for sure.
Hello! I'm back. So this is the current status: I was able to create a volume and have the rancher-backup operator see the backup file:
[root@node-1 ~]# cat pv-volume-rancher.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: local-path
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/backup"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1
[root@node-1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
backup 10Gi RWO Retain Released cattle-resources-system/rancher-backup-1 3d21h
task-pv-volume 10Gi RWO Retain Bound cattle-resources-system/rancher-backup-1 local-path 5h36m
I see the rancher backup pod running and the claim is done
[root@node-1 ~]# kubectl get pvc -A
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
cattle-resources-system rancher-backup-1 Bound task-pv-volume 10Gi RWO local-path 5h36m
I can see the file at the location/worker node I have selected
[root@node-1 ~]# kubectl -n cattle-resources-system exec deploy/rancher-backup -- ls /var/lib/backups
daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz
Now, the next step I think is to create the Restore custom resource, but... in the example given here: https://rancher.com/docs/rancher/v2.6/en/backups/migrating-rancher/ the storageLocation is set to s3. In my case the backup is mounted on worker-1:/backup, so what am I supposed to set in this file?
Thanks again for your patience and help.
Ok found it!
[root@node-1 ~]# cat migrationResource.yaml
# migrationResource.yaml
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz
  prune: false
And then
[root@node-1 ~]# kubectl apply -f migrationResource.yaml
After checking the logs using:
kubectl logs -n cattle-resources-system --tail 100 -f rancher-backup-xxxxxx
I see this:
INFO[2022/06/01 18:53:56] restoreResource: Restoring library-nfs-provisioner-0.1.2 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] restoreResource: Namespace cattle-global-data for name library-nfs-provisioner-0.1.2 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] Getting new UID for library-nfs-provisioner
INFO[2022/06/01 18:53:56] restoreResource: Restoring library-nfs-provisioner-0.2.2 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] restoreResource: Namespace cattle-global-data for name library-nfs-provisioner-0.2.2 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] Getting new UID for library-nfs-provisioner
INFO[2022/06/01 18:53:56] restoreResource: Restoring library-prometheus-9.1.0 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] restoreResource: Namespace cattle-global-data for name library-prometheus-9.1.0 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] Getting new UID for library-prometheus
INFO[2022/06/01 18:53:56] restoreResource: Restoring library-prometheus-6.2.1 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] restoreResource: Namespace cattle-global-data for name library-prometheus-6.2.1 of type management.cattle.io/v3, Resource=catalogtemplateversions
INFO[2022/06/01 18:53:56] Getting new UID for library-prometheus
INFO[2022/06/01 18:53:56] Processing controllerRef apps/v1/deployments/rancher
WARN[2022/06/01 18:53:56] Error getting object for controllerRef rancher, skipping it
INFO[2022/06/01 18:53:57] Done restoring
So far so good then, time to go with the next steps
So, I have followed the steps in https://rancher.com/docs/rancher/v2.6/en/backups/migrating-rancher/
But... either I am doing something wrong, or I did not create my backup correctly the first time, but... even though I can access Rancher now, all my deployments are gone... That's pretty weird. I will try to dig into my old backups and see if it is the same situation after a restore... or maybe I have done something wrong?
I guess the expectation might be wrong here: by default, it backs up and restores Rancher, not everything. The default set of resources that is being backed up can be found here: https://github.com/rancher/backup-restore-operator/tree/master/charts/rancher-backup/files/default-resourceset-contents
If there are resources that match this selection and are not backed up, please share what exactly you are missing.
Oh. So the backup app in Rancher does not back up the workload definitions, custom storage definitions, that kind of thing? That makes more sense now, even if I am a little surprised, to be honest :)
So I guess the ticket can be closed, since I now have the current process to restore Rancher from the backup file. I have 2 concerns/comments:
Can you share where you would like to have more information added? On https://rancher.com/docs/rancher/v2.6/en/backups/, it says:
The rancher-backup operator is used to backup and restore Rancher on any Kubernetes cluster. This application is a Helm chart, and it can be deployed through the Rancher Apps & Marketplace page, or by using the Helm CLI. The rancher-backup Helm chart is [here.](https://github.com/rancher/charts/tree/release-v2.6/charts/rancher-backup)
The backup-restore operator needs to be installed in the local cluster, and only backs up the Rancher app. The backup and restore operations are performed only in the local Kubernetes cluster.
Regarding backing up other resources, there are quite a few ways to do this, all with different approaches and strategies. The operator was created to backup/restore Rancher + migrating Rancher to a different set of nodes. One approach would be that all the other resources that you want to deploy, are in any automation of your choice and you deploy them after the restore of Rancher has been finished (ansible/terraform etc). This would be the recommended path.
If you really want to back up other resources, you could add your own resourcesets and specify what needs to be included in the backup (currently this is scoped to Rancher only, and that's why you don't see any non-Rancher resources). I'd have to check to see how this is done through the current Helm chart.
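For illustration only (I'd still have to verify the exact schema against the operator's CRDs), a custom ResourceSet plus a Backup referencing it might look roughly like this; the names and namespace here are hypothetical:

```yaml
# Hypothetical ResourceSet that includes workload definitions
# from the my-app namespace in the backup.
apiVersion: resources.cattle.io/v1
kind: ResourceSet
metadata:
  name: my-custom-resourceset   # hypothetical name
resourceSelectors:
  - apiVersion: "apps/v1"
    kinds:
      - deployments             # workload definitions
    namespaces:
      - my-app                  # hypothetical namespace
  - apiVersion: "v1"
    kinds:
      - configmaps
    namespaces:
      - my-app
---
# A Backup that uses the custom ResourceSet instead of the
# default rancher-resource-set.
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: my-app-backup
spec:
  resourceSetName: my-custom-resourceset
```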
@rpelissi Let me know if you need anything else on this.
Hi! Sorry, sorry, I have been busy with other stuff :) So in fact I am disappointed, not because of the tool, but because I did not read the documentation correctly; it does in fact mention that the resources included in the backup do not contain workload definitions, for example :) So it's entirely my fault! Now, I am still not sure about a DR plan for this. I mean, we have this:
Let's say that we lost all our Rancher infra but we still have the backup files for the 2 components listed above; can I:
That's my question. Also, if we could have an example of how to add custom resources to the Rancher backup, that would be cool too! :)
Thanks!
Hi, So I am trying to understand how to do a restore in Rancher and tried to follow the documentation found here: https://rancher.com/docs/rancher/v2.6/en/backups/migrating-rancher/ The problem is that the documentation gives examples with s3/minio but not with a local path, so I am a bit lost about what to do, and I have to admit that I am just learning... So, I have my backup file on one of the nodes, and I am trying to create a file:
So the daily-4a197c6b-2cff-4dae-bc12-c75a4c72c5f1-2022-05-22T00-00-00Z.tar.gz is my backup file. Of course it is not working:
So I think I need to configure the volume/claim at the operator level (not sure) so the restore job knows how to connect to it and get the file. But I have no real clue on how to do this... I guess the steps are:
Can you assist me with those steps please? I am pretty sure this will help other users and also make the documentation even more useful.
Thanks!