vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0
8.63k stars 1.39k forks

Azure Plugin sets wrong subscription ID when restoring from different subscription #4191

Open chencivalue opened 3 years ago

chencivalue commented 3 years ago

What steps did you take and what happened:

  1. Installed Velero with default backup and snapshot locations in subscription AAA
  2. Created new backup and snapshot locations with read-only permissions in subscription BBB
  3. Tried to restore from the new, non-default location (subscription BBB) and got the error below, because the snapshot resource ID is built with subscription AAA instead of BBB:

error executing PVAction for persistentvolumes/pvc-398a2563-60e2-4663-9ca9-8eae681569f5: rpc error: code = Unknown desc = compute.SnapshotsClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'xxx' with object id 'xxx' does not have authorization to perform action 'Microsoft.Compute/snapshots/read' over scope '/subscriptions/AAA/resourceGroups/shared-validation/providers/Microsoft.Compute/snapshots/kubernetes-dynamic-pvc-398a2563-60e2-4663-9-8c00c66d-845b-4640-872b-c8efea0a87c8' or the scope is invalid. If access was recently granted, please refresh your credentials."

What did you expect to happen: The snapshot object ID should be created with the correct subscription ID rather than the default one.
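
To make the mismatch concrete, here is a minimal Go sketch of how the snapshot scope in the error message is composed and how its subscription segment can be read back. The path layout is copied from the error above; the helper names `snapshotID` and `subscriptionOf` and the snapshot name `snap-1` are hypothetical and are not part of Velero or the Azure plugin.

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotID builds a snapshot resource ID using the path layout that
// appears in the error message. Hypothetical helper, for illustration only.
func snapshotID(subscriptionID, resourceGroup, name string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/snapshots/%s",
		subscriptionID, resourceGroup, name,
	)
}

// subscriptionOf returns the path segment that follows "subscriptions"
// in a resource ID, or false if no such segment exists.
func subscriptionOf(resourceID string) (string, bool) {
	parts := strings.Split(strings.Trim(resourceID, "/"), "/")
	for i := 0; i+1 < len(parts); i++ {
		if strings.EqualFold(parts[i], "subscriptions") {
			return parts[i+1], true
		}
	}
	return "", false
}

func main() {
	// AAA and BBB are the placeholders used in this issue; "snap-1" is made up.
	lookedUp := snapshotID("AAA", "shared-validation", "snap-1")
	expected := snapshotID("BBB", "shared-validation", "snap-1")

	sub, _ := subscriptionOf(lookedUp)
	fmt.Println("looked up:", lookedUp)
	fmt.Println("expected: ", expected)
	fmt.Println("subscription in looked-up ID:", sub)
}
```

The only difference between the two IDs is the subscription segment, which is why credentials scoped to BBB are rejected with a 403 on the AAA path.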

The output of the following commands will help us better understand what's going on:

> velero restore describe zzzz
Name:        zzzz
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  PartiallyFailed (run 'velero restore logs zzzz-20210923125636' for more information)

Errors:
  Velero:     <none>
  Cluster:  error executing PVAction for persistentvolumes/pvc-398a2563-60e2-4663-9ca9-8eae681569f5: rpc error: code = Unknown desc = compute.SnapshotsClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'xxx' with object id 'xxx' does not have authorization to perform action 'Microsoft.Compute/snapshots/read' over scope '/subscriptions/AAA/resourceGroups/shared-validation/providers/Microsoft.Compute/snapshots/kubernetes-dynamic-pvc-398a2563-60e2-4663-9-8c00c66d-845b-4640-872b-c8efea0a87c8' or the scope is invalid. If access was recently granted, please refresh your credentials."
  Namespaces: <none>

Backup:  zzzz

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  demo-validation=demo-testing

Label selector:  app.kubernetes.io/instance=demo-validation,app.kubernetes.io/name=onlinedb

Restore PVs:  auto

Anything else you would like to add: After manually changing the subscription ID of the default snapshot location from AAA to BBB, I was able to restore from subscription BBB. I then changed it back to AAA.

I took a quick look at the plugin code, and it seems that there is only one snapshot client, which is initialized with the subscription ID provided in the default location. The problem occurs when this client is used to restore a snapshot that lives in a different subscription.

The snapshot client init (https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/velero-plugin-for-microsoft-azure/volume_snapshotter.go#L159)

The failed lookup due to wrong subscription id (https://github.com/vmware-tanzu/velero-plugin-for-microsoft-azure/blob/main/velero-plugin-for-microsoft-azure/volume_snapshotter.go#L186)
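
One possible direction, sketched in Go under stated assumptions, is to keep one client per subscription ID instead of a single client bound to the default location. This is not the plugin's actual code: `snapshotClient` is a hypothetical stand-in for the Azure SDK client, and `clientCache` and `forSubscription` are names invented for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshotClient is a hypothetical stand-in for the Azure SDK snapshots
// client; it only records which subscription it was created for.
type snapshotClient struct {
	subscriptionID string
}

// clientCache hands out one client per subscription ID, rather than a
// single client tied to the default location's subscription.
type clientCache struct {
	mu      sync.Mutex
	clients map[string]*snapshotClient
}

func newClientCache() *clientCache {
	return &clientCache{clients: make(map[string]*snapshotClient)}
}

// forSubscription returns the cached client for subscriptionID and
// creates it on first use.
func (c *clientCache) forSubscription(subscriptionID string) *snapshotClient {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[subscriptionID]; ok {
		return cl
	}
	cl := &snapshotClient{subscriptionID: subscriptionID}
	c.clients[subscriptionID] = cl
	return cl
}

func main() {
	cache := newClientCache()
	a := cache.forSubscription("AAA")
	b := cache.forSubscription("BBB")
	// Each lookup uses a client bound to the subscription of the snapshot
	// being read, and repeated requests reuse the same client.
	fmt.Println(a.subscriptionID, b.subscriptionID, a == cache.forSubscription("AAA"))
}
```

With a lookup like this, the subscription would be chosen per snapshot (for example from the snapshot location in use) rather than fixed at plugin initialization. Whether that fits the plugin's design is an open question for the maintainers.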

Environment:


reasonerjt commented 3 years ago

This is probably a limitation right now, and we should clarify whether we can cover it in itemsnapshotter.

We should also check whether the CSI snapshotter can support multiple subscription IDs.

Multiple subscription IDs have probably never worked in the Azure plugin.

ywk253100 commented 2 years ago

@chencivalue As I understand it, you are trying to restore across subscriptions, and your use case is as follows:

  1. You have a k8s cluster B in subscription BBB. Velero is installed on this cluster and takes a backup named backup-on-bbb.
  2. Another k8s cluster A is in subscription AAA. Velero is also installed on this cluster, and you want to restore backup-on-bbb to cluster A.
  3. You create a new BSL and VSL pointing to subscription BBB on cluster A. As Velero syncs backups from the BSL automatically, you can see backup-on-bbb after the first round of sync.
  4. You create a restore with velero restore create restore-name --from-backup backup-on-bbb and get the failure.

When restoring, Velero always uses the VSL with the same name as the one specified during the backup.

If you don't specify a VSL during the backup on subscription BBB, Velero uses the default VSL as the snapshot target.

So during the restore on subscription AAA, Velero still uses the VSL named default, but the default VSL on AAA points to subscription AAA rather than BBB. That is why you get the failure.

So when restoring across subscriptions, the VSL on the restore-target cluster must have the same name and configuration as the one specified during the backup.

If Velero supported specifying the VSL at restore time, your case would be easier, but Velero doesn't support that at the moment.
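
The name-based lookup described above can be modeled with a small Go sketch. It is a simplified illustration, not Velero's implementation: the `vsl` type, the `resolveByName` function, and the VSL name `bbb` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// vsl is a simplified, hypothetical model of a VolumeSnapshotLocation:
// just its name and the subscription it points to.
type vsl struct {
	name           string
	subscriptionID string
}

// resolveByName mimics the behavior described above: the restore looks up,
// on the restore cluster, the VSL whose name was recorded at backup time.
func resolveByName(locations []vsl, name string) (vsl, error) {
	for _, l := range locations {
		if l.name == name {
			return l, nil
		}
	}
	return vsl{}, errors.New("no VSL named " + name)
}

func main() {
	// The backup on cluster B did not specify a VSL, so "default" was recorded.
	recordedName := "default"

	// VSLs on cluster A: "default" points to AAA; "bbb" is the newly
	// created location for subscription BBB (hypothetical name).
	clusterA := []vsl{
		{name: "default", subscriptionID: "AAA"},
		{name: "bbb", subscriptionID: "BBB"},
	}

	l, err := resolveByName(clusterA, recordedName)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	// The lookup lands on AAA even though the snapshot lives in BBB.
	fmt.Printf("restore uses VSL %q with subscription %s\n", l.name, l.subscriptionID)
}
```

In this model the new VSL pointing to BBB is never consulted, because only the name recorded in the backup is used for the lookup.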