openshift / openshift-velero-plugin

General Velero plugin for backup and restore of openshift workloads.
Apache License 2.0

OADP-145: Two-phase DC restore for restic #141

Closed sseago closed 2 years ago

sseago commented 2 years ago

When the backup sends volumes to restic by default, disconnect DeploymentConfig pods from their ReplicationController and label them for later cleanup. For DeploymentConfigs, set replicas to zero (and paused to false), and annotate and label them for later cleanup.

After restore completes, a post-restore bash script must be run to clean up restic restore pods and restore DC to prior configuration.

Note that this is currently done only when the backup sets defaultVolumesToRestic=true. In the restic annotation case, the DeploymentConfig plugin has no way of knowing that the DC had its pods disconnected. To enable that case, we may need to add another backup annotation that communicates the user's desire to disconnect all DC pods and scale down DCs.

Also note that this does not currently handle post-restore hooks on DC pods. For restore-level hooks, support could be added in the same way as for the defaultVolumesToRestic config. For pod annotations, the concerns are similar to those for restic annotations.

After a successful restore, the script below must be run to clean up the restore pods and restore the DC to its prior configuration (script location in GitHub still pending):

#!/bin/bash
set -e

OADP_NAMESPACE=${OADP_NAMESPACE:=openshift-adp}

if [[ $# -ne 1 ]]; then
  echo "usage: ${BASH_SOURCE} restore-name"
  exit 1
fi

echo "using OADP Namespace $OADP_NAMESPACE"
echo "restore $1"

# Delete the pods that were disconnected from their DC during restore.
# Without -n or --all-namespaces, this only acts on the current project.
echo "Deleting disconnected restore pods"
oc delete pods -l "openshift.io/disconnected-from-dc=$1"

# Emit one "namespace,name,original-replicas,original-paused" line per modified DC.
for dc in $(oc get dc --all-namespaces -l "openshift.io/replicas-modified=$1" -o jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{","}{.metadata.annotations.openshift\.io/original-replicas}{","}{.metadata.annotations.openshift\.io/original-paused}{"\n"}{end}')
do
    IFS=',' read -ra dc_arr <<< "$dc"
    echo "Found deployment ${dc_arr[0]}/${dc_arr[1]}, setting replicas: ${dc_arr[2]}, paused: ${dc_arr[3]}"

    # Patch the DC back to its pre-restore replica count and paused state.
    cat <<EOF | oc patch dc -n "${dc_arr[0]}" "${dc_arr[1]}" --patch-file /dev/stdin
spec:
  replicas: ${dc_arr[2]}
  paused: ${dc_arr[3]}
EOF

done

sseago commented 2 years ago

One additional documentation note: The post-restore script will fail for Velero restores whose names exceed 63 characters, since those names are not valid label values.
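The limit comes from Kubernetes: a label value may be at most 63 characters, may contain only alphanumerics, `-`, `_` and `.`, and must start and end with an alphanumeric. A minimal sketch of a pre-flight check that could run before any `oc` call follows. The function name `is_valid_label_value` and the sample names are hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of the PR): succeeds only if its argument
# could be used as a Kubernetes label value.
is_valid_label_value() {
    value=$1
    # An empty label value is allowed.
    [ -z "$value" ] && return 0
    # At most 63 characters.
    [ "${#value}" -le 63 ] || return 1
    case $value in
        *[!A-Za-z0-9_.-]*) return 1 ;;  # contains a disallowed character
        [!A-Za-z0-9]*)     return 1 ;;  # must start with an alphanumeric
        *[!A-Za-z0-9])     return 1 ;;  # must end with an alphanumeric
    esac
    return 0
}

# Try it on a short name and on a 64-character name (one over the limit).
long_name=$(printf '%064d' 0)
for name in "nightly-restore" "$long_name"; do
    if is_valid_label_value "$name"; then
        echo "${#name} characters: usable as a label value"
    else
        echo "${#name} characters: not usable as a label value"
    fi
done
```

Checking the restore name up front would let a script stop with a clear message instead of failing inside the label selector.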

kaovilai commented 2 years ago

This is cool

sseago commented 2 years ago

Note: the bash script above has been added to doc/scripts in a PR to oadp-operator.
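For anyone adapting the script, the record handling in its loop can be exercised without a cluster. The sketch below takes one `namespace,name,replicas,paused` record, in the shape the script's jsonpath query produces, and prints the `oc patch` invocation and patch body that would result. The function name `preview_dc_patch` and the sample record are hypothetical. It only prints and never calls `oc`.

```shell
# Hypothetical dry-run helper (not part of the PR): show what would be
# patched for one "namespace,name,replicas,paused" record.
preview_dc_patch() {
    record=$1
    # Split the record on commas into the positional parameters $1..$4.
    old_ifs=$IFS
    IFS=','
    set -f              # no glob expansion while splitting
    set -- $record
    set +f
    IFS=$old_ifs
    # Reject records with a missing field, e.g. a DC lacking an annotation.
    if [ "$#" -ne 4 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]; then
        echo "malformed record: $record" >&2
        return 1
    fi
    printf 'oc patch dc -n %s %s --patch-file /dev/stdin\n' "$1" "$2"
    printf 'spec:\n  replicas: %s\n  paused: %s\n' "$3" "$4"
}

# Sample record with made-up names.
preview_dc_patch "demo-project,demo-app,3,false"
```

Rejecting incomplete records is a design choice of this sketch: a DeploymentConfig that carries the label but lacks one of the annotations would otherwise yield a patch with an empty value.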

sseago commented 2 years ago

@dymurray I've updated the PR to reflect the proposed label name changes, as well as updating it to work with the new GetBackup method signature.