dereulenspiegel opened this issue 1 year ago
I have the same problem. I backed up multiple PVCs in a namespace, and when I want to restore only one particular PVC, the restore picks the latest snapshot, which is the wrong one.
While setting up and testing recovery pipelines I have run into the same issue. Is there any workaround besides manually specifying the snapshot ID?
I bumped into this as well; I think the relevant code is here: https://github.com/k8up-io/k8up/blob/master/restic/cli/restore.go#L121
It looks to me as if there is currently no way other than specifying the snapshot ID to ensure that the right data is restored.
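For reference, pinning a restore to a specific snapshot ID can be done through the Restore object's `snapshot` field. The following is only a minimal sketch under that assumption; the names, namespace and S3 backend details are placeholders rather than values from this thread, and the snapshot ID would be taken from `restic snapshots`:

```yaml
# Hypothetical example: restore a single PVC from an explicitly chosen
# restic snapshot instead of "latest".
apiVersion: k8up.io/v1
kind: Restore
metadata:
  name: restore-my-pvc
spec:
  snapshot: 5c797a4b            # restic snapshot ID (placeholder)
  restoreMethod:
    folder:
      claimName: my-pvc         # PVC to restore into (placeholder)
  backend:
    repoPasswordSecretRef:
      name: backup-repo
      key: password
    s3:
      endpoint: https://s3.example.com
      bucket: backups
      accessKeyIDSecretRef:
        name: backup-credentials
        key: username
      secretAccessKeySecretRef:
        name: backup-credentials
        key: password
```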
I found out that for my case (an initial restore on a re-deployment of the whole cluster) I can work around this by creating a plain Job and calling restic directly for the restore. The results are then just fine. I am using the following small script, which I inject via a ConfigMap:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: restore-scripts
  annotations:
    kustomize.toolkit.fluxcd.io/substitute: disabled
data:
  restore-command.sh: |
    #!/bin/bash
    set -eux -o pipefail

    : ${SNAPSHOT:=latest}

    restic restore \
      --host ${VOLUME_NAMESPACE} \
      --path /data/${VOLUME_NAME} \
      --tag ${CLUSTER_REVISION} \
      --target / \
      ${SNAPSHOT}
```
Note that I am also using `--tag` to separate cluster revisions, so that I can say "restore from revision 1 and back up into revision 2" within the same repository (a sketch of how the matching backup call lines up follows below).
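For the `--tag` and `--host` filters in the restore script to match anything, the backups have to carry that tag and host. In this setup the backups are created by k8up, where the tags would need to be configured accordingly; the plain-restic equivalent below is only an illustration with placeholder values:

```bash
# Hypothetical backup call: tag the snapshot with the cluster revision
# and set the host to the namespace, matching the restore filters above.
restic backup \
  --host my-namespace \
  --tag rev1 \
  /data/my-pvc
```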
I then used the `k8up` image in the Job itself to run everything. I think the image should be kept in sync with the k8up deployment:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: restore
  annotations:
    kustomize.toolkit.fluxcd.io/force: enabled
spec:
  completions: 1
  parallelism: 1
  # The Job must stay, otherwise Flux will re-create it regularly
  # ttlSecondsAfterFinished:
  template:
    spec:
      containers:
        - command:
            - /scripts/restore-command.sh
          image: ghcr.io/k8up-io/k8up:v2.7.1@sha256:77114c20de9c33661fd088670465714d58a1e3df4ffc5968b446704363fb369c
          imagePullPolicy: IfNotPresent
          name: restore
          securityContext:
            runAsUser: 0
          env:
            - name: VOLUME_NAMESPACE
              value: PATCH_VOLUME_NAMESPACE
            - name: VOLUME_NAME
              value: PATCH_VOLUME_NAME
            - name: CLUSTER_REVISION
              value: "rev${cluster_bootstrap_revision}"
            - name: SNAPSHOT
              value: latest
          envFrom:
            - secretRef:
                name: k8up-restic-restore
          volumeMounts:
            - name: scripts
              mountPath: "/scripts"
            - name: data
              mountPath: "/data/PATCH_VOLUME_NAME"
      volumes:
        - name: scripts
          configMap:
            name: restore-scripts
            defaultMode: 0555
        - name: data
          persistentVolumeClaim:
            claimName: PATCH_VOLUME_NAME
      restartPolicy: OnFailure
```
Everything that starts with `PATCH_` has to be replaced, and the `${...}` placeholders also need to be substituted with reasonable values; in my setup Flux takes care of this for me (a plain-shell alternative is sketched below).
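Without Flux, a simple substitution pass before applying the manifests would do the same job; the file name and values here are assumptions, and the `${...}` variables would be handled the same way:

```bash
# Hypothetical substitution of the PATCH_* placeholders before applying.
sed -e 's/PATCH_VOLUME_NAMESPACE/my-namespace/g' \
    -e 's/PATCH_VOLUME_NAME/my-pvc/g' \
    restore-job.yaml | kubectl apply -n my-namespace -f -
```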
The secret provides the following variables to make restic happy:
```
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
RESTIC_PASSWORD
RESTIC_REPOSITORY
```
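If that secret does not exist yet, it can be created by hand; this is only a sketch with placeholder values:

```bash
# Hypothetical creation of the secret consumed by the Job via envFrom.
kubectl create secret generic k8up-restic-restore \
  --from-literal=AWS_ACCESS_KEY_ID='ACCESS_KEY' \
  --from-literal=AWS_SECRET_ACCESS_KEY='SECRET_KEY' \
  --from-literal=RESTIC_PASSWORD='repo-password' \
  --from-literal=RESTIC_REPOSITORY='s3:https://s3.example.com/my-bucket'
```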
Maybe this can help to work around the missing feature for others too.
Description
When restoring from a repository in a namespace with multiple PVCs, it can happen that the latest snapshot does not actually contain the necessary data. This happens because with multiple PVCs, multiple snapshots are taken in short succession during a backup operation. Since restoring from the latest snapshot seems to be the default behavior of the `Restore` operation, imho the correct latest snapshot should be selected instead. The `restoreFilter` should therefore also set the `--path` option of restic, to let restic automatically select the correct snapshot (see the sketch below). Alternatively, this could also be done based on the `claimName` in the folder restore method.
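For illustration, this is roughly what the proposal amounts to on the restic side; the host and path are placeholders:

```bash
# With --path (and --host), restic resolves "latest" only among snapshots
# that contain that path, so the most recent snapshot of the right PVC is
# chosen even when several PVCs were backed up back-to-back.
restic restore latest \
  --host my-namespace \
  --path /data/my-pvc \
  --target /restore
```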
Additional Context
No response
Logs
No response
Expected Behavior
k8up selects the correct snapshot to restore previously saved data to a PVC, instead of simply the latest snapshot, which might not contain the necessary data when multiple PVCs are backed up within a single namespace.
Steps To Reproduce
No response
Version of K8up
v2.7.1
Version of Kubernetes
1.27.1
Distribution of Kubernetes
k3s