k8up-io / k8up

Kubernetes and OpenShift Backup Operator
https://k8up.io/
Apache License 2.0

RestoreSpec restoreFilter should also set --path filter #867

Open dereulenspiegel opened 1 year ago

dereulenspiegel commented 1 year ago

Description

When restoring from a repository in a namespace with multiple PVCs, it can happen that the latest snapshot doesn't actually contain the necessary data. This is because, with multiple PVCs, several snapshots are taken in short succession during a backup operation. Since restoring from the latest snapshot seems to be the default behavior of the Restore operation, imho the correct latest snapshot should be selected instead. The restoreFilter should therefore also set restic's --path option, so that restic automatically selects the correct snapshot, as sketched below. Alternatively, this could be done based on the claimName in the folder restore method.
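For illustration, this is roughly how such a filter works on the restic CLI itself (namespace and path below are placeholders): with --host and --path set, latest resolves to the newest snapshot that actually contains the PVC's data.

restic restore latest \
  --host my-namespace \
  --path /data/my-pvc \
  --target /restore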

Additional Context

No response

Logs

No response

Expected Behavior

k8up selects the correct snapshot to restore previously saved data to a PVC, instead of simply the latest snapshot, which might not contain the necessary data when multiple PVCs are backed up within a single namespace.

Steps To Reproduce

No response

Version of K8up

v2.7.1

Version of Kubernetes

1.27.1

Distribution of Kubernetes

k3s

RomanRomanenkov commented 1 year ago

I have the same problem. I backed up multiple PVCs in a namespace, and when I want to restore only one particular PVC, it restores from the latest snapshot, which is the wrong one.

nikolai5slo commented 10 months ago

While setting up and testing recovery pipelines I have encountered the same issue. Is there any other workaround besides manually specifying the snapshot ID?

johbo commented 5 months ago

I bumped into this as well; I think the relevant code is here: https://github.com/k8up-io/k8up/blob/master/restic/cli/restore.go#L121

It looks to me as if there is currently no other way than specifying the snapshot ID to ensure that the right data will be restored, as sketched below.
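A minimal sketch of that workaround in a Restore resource, assuming the spec.snapshot and restoreMethod.folder.claimName fields of the k8up Restore CRD (snapshot ID, claim name and backend are placeholders; backend credentials elided):

apiVersion: k8up.io/v1
kind: Restore
metadata:
  name: restore-my-pvc
spec:
  # Pin the snapshot that actually contains this PVC's data.
  snapshot: 1a2b3c4d
  restoreMethod:
    folder:
      claimName: my-pvc
  backend:
    s3:
      endpoint: https://s3.example.com
      bucket: backups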

johbo commented 5 months ago

I found out that for my case (an initial restore on a re-deployment of the whole cluster) I can work around this by creating a plain Job and calling restic directly for the restore. The results are just fine.

I am using the following small script which I inject via a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: restore-scripts
  annotations:
    # Keep Flux from substituting the ${...} variables inside the script.
    kustomize.toolkit.fluxcd.io/substitute: disabled
data:
  restore-command.sh: |
    #!/bin/bash

    set -eux -o pipefail

    # Default to the latest matching snapshot unless SNAPSHOT is set.
    : ${SNAPSHOT:=latest}

    # --host, --path and --tag narrow the selection so that "latest"
    # resolves to the snapshot containing this PVC's data.
    restic restore \
      --host ${VOLUME_NAMESPACE} \
      --path /data/${VOLUME_NAME} \
      --tag ${CLUSTER_REVISION} \
      --target / \
      ${SNAPSHOT}

Note that I am also using --tag to separate cluster revisions, so that I can say "restore from revision 1 and back up into revision 2" within the same repository.
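To double-check which snapshot latest would resolve to with those filters, the same flags work with restic's snapshots command (values below are placeholders):

restic snapshots latest \
  --host my-namespace \
  --path /data/my-pvc \
  --tag rev1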

I've then used the k8up image in the Job itself to run everything. I think the image should be kept in sync with the k8up deployment.

apiVersion: batch/v1
kind: Job
metadata:
  name: restore
  annotations:
    # Let Flux replace the Job on changes (Job specs are largely immutable).
    kustomize.toolkit.fluxcd.io/force: enabled
spec:
  completions: 1
  parallelism: 1
  # Leave the finished Job in place; with ttlSecondsAfterFinished set,
  # Flux would just re-create it regularly.
  # ttlSecondsAfterFinished:
  template:
    spec:
      containers:
        - command:
            - /scripts/restore-command.sh
          image: ghcr.io/k8up-io/k8up:v2.7.1@sha256:77114c20de9c33661fd088670465714d58a1e3df4ffc5968b446704363fb369c
          imagePullPolicy: IfNotPresent
          name: restore
          securityContext:
            runAsUser: 0
          env:
            - name: VOLUME_NAMESPACE
              value: PATCH_VOLUME_NAMESPACE
            - name: VOLUME_NAME
              value: PATCH_VOLUME_NAME
            - name: CLUSTER_REVISION
              value: "rev${cluster_bootstrap_revision}"
            - name: SNAPSHOT
              value: latest
          envFrom:
            - secretRef:
                name: k8up-restic-restore
          volumeMounts:
            - name: scripts
              mountPath: "/scripts"
            - name: data
              mountPath: "/data/PATCH_VOLUME_NAME"
      volumes:
        - name: scripts
          configMap:
            name: restore-scripts
            defaultMode: 0555
        - name: data
          persistentVolumeClaim:
            claimName: PATCH_VOLUME_NAME
      restartPolicy: OnFailure

Everything that starts with PATCH_ has to be replaced, and the ${...} placeholders also need to be replaced with reasonable values. In my setup, Flux takes care of this for me, as sketched below.
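For the ${...} part, a sketch of how Flux post-build variable substitution can fill in the value (Kustomization name, path and source are just examples from an assumed setup):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: restore
  namespace: flux-system
spec:
  interval: 10m
  path: ./restore
  prune: true
  sourceRef:
    kind: GitRepository
    name: cluster-config
  postBuild:
    substitute:
      # Fills ${cluster_bootstrap_revision} in the Job manifest above.
      cluster_bootstrap_revision: "2"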

The secret provides the following variables to make restic happy:

AWS_ACCESS_KEY_ID:   
AWS_SECRET_ACCESS_KEY:  
RESTIC_PASSWORD:        
RESTIC_REPOSITORY:      
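For reference, a matching Secret could look like this (all values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: k8up-restic-restore
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "<access-key-id>"
  AWS_SECRET_ACCESS_KEY: "<secret-access-key>"
  RESTIC_PASSWORD: "<repository-password>"
  RESTIC_REPOSITORY: "s3:https://s3.example.com/backups"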

Maybe this can help to work around the missing feature for others too.