kubernetes-sigs / azuredisk-csi-driver

Azure Disk CSI Driver

Store snapshot at different Azure regions #1694

Closed slawekww closed 1 year ago

slawekww commented 1 year ago

Is your feature request related to a problem?/Why is this needed

This is a new feature request for backing up and restoring snapshots in a different Azure region. Currently it is possible to create snapshots in a different Azure resource group, but the snapshot location is always the same as the AKS location (the disk location).

For disaster recovery in a paired or different Azure region, an Azure snapshot can be copied between regions via a manual procedure, but it is not possible to use the copied snapshot to restore persistent volumes in AKS using the CSI driver.
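
For reference, the manual cross-region copy can be done with the Azure CLI; a minimal sketch (the resource group, snapshot names, and subID below are placeholders):

# Copy an existing incremental snapshot into eastus2; --copy-start
# performs the cross-region copy asynchronously.
az snapshot create \
  --resource-group my-dr-resource-group \
  --name my-snapshot-eastus2 \
  --location eastus2 \
  --source "/subscriptions/subID/resourceGroups/sourceRG/providers/Microsoft.Compute/snapshots/my-snapshot" \
  --incremental \
  --copy-start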

Describe the solution you'd like in detail

  1. When a different Azure resource group is specified, the VolumeSnapshotLocation CR should respect that resource group's default location.
  2. During restore, the VolumeSnapshotLocation CR should point to the Azure resource group with the copied snapshots, and the PV should be created from the copied snapshot rather than referring to the initial disk location (see the sketch after this list).
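
For context, this is roughly how a Velero VolumeSnapshotLocation for Azure is declared today; a minimal sketch, assuming the Velero Azure plugin is installed (the name and resource group are placeholders). The request is effectively that the snapshot location follow this resource group's region:

cat <<EOF | kubectl apply -f -
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: azure-dr
  namespace: velero
spec:
  provider: azure
  config:
    # Snapshots are created in this resource group; today their region
    # still follows the source disk, not the resource group.
    resourceGroup: my-dr-resource-group
    incremental: "true"
EOF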

Currently the restore fails with an error saying the disk cannot be mounted, because the PV refers to a disk that is already mounted:

Warning  FailedAttachVolume  106s (x12 over 10m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-f4579593-9b42-4188-b5a3-bb8e24ce027b" : rpc error: code = Internal desc = Attach volume /subscriptions/subID/resourceGroups/backupedClusterResourceGroup/providers/Microsoft.Compute/disks/pvc-f4579593-9b42-4188-b5a3-bb8e24ce027b to instance aks-default-vmss000001 failed with disk(/subscriptions/subID/resourceGroupsbackupedClusterResourceGroup/providers/Microsoft.Compute/disks/pvc-f4579593-9b42-4188-b5a3-bb8e24ce027b) already attached to node(/subscriptions/subID/resourceGroups/backupedClusterResourceGroup/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-backupId-vmss/virtualMachines/aks-default-18145209-vmss_51), could not be attached to node(aks-default-newId-vmsNodeAtNewCluster)

Describe alternatives you've considered

I tried to use Velero with and without the EnableCSI plugin, without success. It may be possible to use a storage service with geo-redundancy instead of Azure snapshots for disaster recovery in a different region.

edreed commented 1 year ago

@slawekww It is possible to use the cross-region copied snapshot in the destination region by treating it as a pre-provisioned snapshot. The script below shows how this can be orchestrated: it deploys a stateful workload to a K8s cluster in the West Europe region, takes a snapshot, and copies it to East US 2. It then creates a pre-provisioned snapshot in a K8s cluster in East US 2 and deploys a stateful workload by cloning the snapshot into a PersistentVolume.

IMO, adding this functionality to the CSI driver doesn't make a great deal of sense because the copied snapshot is not usable in the original cluster.

#!/usr/bin/env bash

set -x -euo pipefail

# Connect to the source K8s cluster.
export SOURCE_PVC_NAME="pvc-azuredisk"
export SOURCE_NAMESPACE="default"
export SOURCE_AZURE_REGION="westeurope" # matches the source cluster's region (West Europe)
export SOURCE_CLUSTER_CONTEXT="edreed-aks-weu"
export SOURCE_VS_NAME="snapshot-azuredisk"

kubectl config use-context "$SOURCE_CLUSTER_CONTEXT"

# Deploy a stateful workload.
cat <<EOF | kubectl apply -f -
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: $SOURCE_PVC_NAME
  namespace: $SOURCE_NAMESPACE
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-csi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: deployment-azuredisk
  namespace: $SOURCE_NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      name: deployment-azuredisk
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
        - name: deployment-azuredisk
          image: mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine
          command:
            - "/bin/sh"
            - "-c"
            # Escape \$(date) so it is evaluated in the container, not by
            # the outer shell when the heredoc is expanded.
            - while true; do echo \$(date) >> /mnt/azuredisk/outfile; sleep 1; done
          volumeMounts:
            - name: azuredisk
              mountPath: "/mnt/azuredisk"
              readOnly: false
      volumes:
        - name: azuredisk
          persistentVolumeClaim:
            claimName: $SOURCE_PVC_NAME
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
EOF

kubectl rollout status deployment deployment-azuredisk --watch

# Create a snapshot of the workload's PersistentVolume and wait for it to be ready to use.
cat <<EOF | kubectl apply -f -
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $SOURCE_VS_NAME
  namespace: $SOURCE_NAMESPACE
spec:
  volumeSnapshotClassName: csi-azuredisk-vsc
  source:
    persistentVolumeClaimName: $SOURCE_PVC_NAME
EOF

export SOURCE_VSC_NAME
SOURCE_VSC_NAME="$(kubectl get vs "$SOURCE_VS_NAME" --output jsonpath='{.status.boundVolumeSnapshotContentName}')"

while [[ "$(kubectl get vsc "$SOURCE_VSC_NAME" --output jsonpath='{.status.readyToUse}')" != "true" ]]; do
  sleep 1
done

# Copy the snapshot to the target region and resource group.
export SOURCE_AZURE_SNAPSHOT_HANDLE
SOURCE_AZURE_SNAPSHOT_HANDLE="$(kubectl get vsc "$SOURCE_VSC_NAME" --output jsonpath='{.status.snapshotHandle}')"

# Extract the snapshot name (the last path segment of the ARM resource ID).
SOURCE_AZURE_SNAPSHOT_INFO=(${SOURCE_AZURE_SNAPSHOT_HANDLE//\// })
export SOURCE_AZURE_SNAPSHOT_NAME="${SOURCE_AZURE_SNAPSHOT_INFO[-1]}"
unset SOURCE_AZURE_SNAPSHOT_INFO

export TARGET_AZURE_REGION="eastus2"
export TARGET_CLUSTER_CONTEXT="edreed-aks-eus2"
export TARGET_AZURE_RESOURCE_GROUP="edreed-aks-eus2-nodepool-rg"
export TARGET_AZURE_SNAPSHOT_NAME="$SOURCE_AZURE_SNAPSHOT_NAME-$TARGET_AZURE_REGION"

export TARGET_AZURE_SNAPSHOT_HANDLE
TARGET_AZURE_SNAPSHOT_HANDLE="$(az snapshot create \
  --resource-group "$TARGET_AZURE_RESOURCE_GROUP" \
  --name "$TARGET_AZURE_SNAPSHOT_NAME" \
  --location "$TARGET_AZURE_REGION" \
  --source "$SOURCE_AZURE_SNAPSHOT_HANDLE" \
  --incremental \
  --copy-start \
  --query "[id]" \
  --output tsv)"

# Wait for the cross-region copy to complete.
while [[ "$(az snapshot show --resource-group "$TARGET_AZURE_RESOURCE_GROUP" --name "$TARGET_AZURE_SNAPSHOT_NAME" --query completionPercent --output tsv)" != "100.0" ]]; do
  sleep 30
done

# Connect to the target cluster.
export TARGET_NAMESPACE="default"
export TARGET_PVC_NAME="$SOURCE_PVC_NAME-$TARGET_AZURE_REGION"
export TARGET_VS_NAME="$SOURCE_VS_NAME-$TARGET_AZURE_REGION"
export TARGET_VSC_NAME="$SOURCE_VSC_NAME-$TARGET_AZURE_REGION"

kubectl config use-context "$TARGET_CLUSTER_CONTEXT"

# Create the pre-provisioned VolumeSnapshotContent and VolumeSnapshot objects for the target snapshot.
cat <<EOF | kubectl apply -f -
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  # VolumeSnapshotContent is cluster-scoped, so it takes no namespace.
  name: $TARGET_VSC_NAME
spec:
  deletionPolicy: Delete
  driver: disk.csi.azure.com
  source:
    snapshotHandle: $TARGET_AZURE_SNAPSHOT_HANDLE
  volumeSnapshotRef:
    name: $TARGET_VS_NAME
    namespace: $TARGET_NAMESPACE
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $TARGET_VS_NAME
  namespace: $TARGET_NAMESPACE
spec:
  volumeSnapshotClassName: csi-azuredisk-vsc
  source:
    volumeSnapshotContentName: $TARGET_VSC_NAME
EOF
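
One caveat: before cloning, it may be prudent to wait for the pre-provisioned snapshot to become ready, mirroring the wait on the source side:

# Wait for the pre-provisioned VolumeSnapshot to be ready to use.
while [[ "$(kubectl get vs "$TARGET_VS_NAME" --namespace "$TARGET_NAMESPACE" --output jsonpath='{.status.readyToUse}')" != "true" ]]; do
  sleep 1
done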

# Deploy a stateful workload using a volume cloned from the target snapshot.
cat <<EOF | kubectl apply -f -
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: $TARGET_PVC_NAME
  namespace: $TARGET_NAMESPACE
spec:
  accessModes:
    - ReadWriteOnce
  dataSource:
    name: $TARGET_VS_NAME
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-csi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: deployment-azuredisk
  namespace: $TARGET_NAMESPACE
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      name: deployment-azuredisk
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
        - name: deployment-azuredisk
          image: mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine
          command:
            - "/bin/sh"
            - "-c"
            # Escape \$(date) so it is evaluated in the container, not by
            # the outer shell when the heredoc is expanded.
            - while true; do echo \$(date) >> /mnt/azuredisk/outfile; sleep 1; done
          volumeMounts:
            - name: azuredisk
              mountPath: "/mnt/azuredisk"
              readOnly: false
      volumes:
        - name: azuredisk
          persistentVolumeClaim:
            claimName: $TARGET_PVC_NAME
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
EOF

kubectl rollout status deployment deployment-azuredisk --watch

slawekww commented 1 year ago

Hi @edreed, thank you for the example and the script!

As my cluster has over 500 PVCs to back up every night, I am looking for an automated tool that can create backups for all of them. Currently I use Velero with the Kopia uploader, which can restore PVCs on a different cluster (or even a re-created cluster with the same name) for many storage classes, including Azure Disk CSI. Velero requires compute resources to create backups (it runs an agent on each node), whereas Azure snapshots are triggered through the Azure API and consume no in-cluster compute. Note: the backup is for a disaster recovery scenario in which workloads from the initial cluster must be restored on another cluster (most likely in a different region).
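
For the record, the nightly workaround looks roughly like this; a sketch assuming Velero >= 1.10 with the node agent installed (the namespace is a placeholder):

# File-system backup of pod volumes via the Kopia uploader,
# instead of Azure disk snapshots.
velero backup create nightly-backup \
  --include-namespaces my-app-namespace \
  --default-volumes-to-fs-backup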

andyzhangx commented 1 year ago

we do have a WIP PR (https://github.com/kubernetes-sigs/azuredisk-csi-driver/pull/1791) to support taking a snapshot in region B of a disk in region A. The snapshot can only be used to restore a disk in region B and cannot be attached to the current AKS cluster nodes in region A. Is that what you wanted? @slawekww Mainly such a cross-region snapshot is for disaster recovery.
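
If that PR lands in its current shape, usage might look like the sketch below; this is hypothetical, and the location parameter name is taken from the WIP PR and could change before release:

cat <<EOF | kubectl apply -f -
# Hypothetical VolumeSnapshotClass placing snapshots in a different region.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-azuredisk-vsc-cross-region
driver: disk.csi.azure.com
deletionPolicy: Delete
parameters:
  incremental: "true"
  location: eastus2
EOF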

slawekww commented 1 year ago

@andyzhangx Yes, that is exactly the scenario I am looking for. The goal is to restore a snapshot on a disaster recovery cluster in a different region (by default the paired region). The perfect solution would be for Velero to use this functionality and enable creating snapshots in two regions (the original and the paired region).