vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

AuthorizationFailure - This request is not authorized to perform this operation #4802

Closed murech closed 1 year ago

murech commented 2 years ago

What steps did you take and what happened: I ran the command velero backup create --include-namespaces k8s-example. The backup shows status PartiallyFailed, and the command velero backup logs returns the message "AuthorizationFailure - This request is not authorized to perform this operation."

What did you expect to happen: Velero backup can be performed for Azure Disk CSI Drivers (disk.csi.azure.com) and velero logs are shown.

The following information will help us better understand what's going on:

$ kubectl logs deployment/velero -n velero | grep error
time="2022-03-31T15:30:02Z" level=info msg="1 errors encountered backup up item" backup=velero/k8s-example logSource="pkg/backup/backup.go:413" name=k8s-example-665dd8bc46-9pfnx
time="2022-03-31T15:30:02Z" level=error msg="Error backing up item" backup=velero/k8s-example error="error executing custom action (groupResource=persistentvolumeclaims, namespace=k8s-example, name=k8s-example-disk-storage): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass managed-csi: failed to get volumesnapshotclass for provisioner disk.csi.azure.com, ensure that the desired volumesnapshot class has the velero.io/csi-volumesnapshot-class label" logSource="pkg/backup/backup.go:417" name=k8s-example-665dd8bc46-9pfnx
$ velero backup describe k8s-example --details
Name:         k8s-example
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.7
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21

Phase:  PartiallyFailed (run `velero backup logs k8s-example` for more information)

Errors:    1
Warnings:  0

Namespaces:
  Included:  k8s-example
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-03-31 17:29:48 +0200 CEST
Completed:  2022-03-31 17:30:03 +0200 CEST

Expiration:  2022-04-30 17:29:48 +0200 CEST

Total items to be backed up:  38
Items backed up:              38

Resource List:  <error getting backup resource list: request failed: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:7b983d39-201e-0053-7915-45dcf4000000
Time:2022-03-31T15:42:31.0244604Z</Message></Error>>

Velero-Native Snapshots:  <error getting snapshot info: request failed: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:6300e225-801e-005a-5615-459927000000
Time:2022-03-31T15:42:31.7912797Z</Message></Error>>

CSI Volume Snapshots: <none included>
$ velero plugin get
NAME                                            KIND
velero.io/crd-remap-version                     BackupItemAction
velero.io/csi-pvc-backupper                     BackupItemAction
velero.io/csi-volumesnapshot-backupper          BackupItemAction
velero.io/csi-volumesnapshotclass-backupper     BackupItemAction
velero.io/csi-volumesnapshotcontent-backupper   BackupItemAction
velero.io/pod                                   BackupItemAction
velero.io/pv                                    BackupItemAction
velero.io/service-account                       BackupItemAction
velero.io/csi-volumesnapshot-delete             DeleteItemAction
velero.io/csi-volumesnapshotcontent-delete      DeleteItemAction
velero.io/azure                                 ObjectStore
velero.io/add-pv-from-pvc                       RestoreItemAction
velero.io/add-pvc-from-pod                      RestoreItemAction
velero.io/admission-webhook-configuration       RestoreItemAction
velero.io/apiservice                            RestoreItemAction
velero.io/change-pvc-node-selector              RestoreItemAction
velero.io/change-storage-class                  RestoreItemAction
velero.io/cluster-role-bindings                 RestoreItemAction
velero.io/crd-preserve-fields                   RestoreItemAction
velero.io/csi-pvc-restorer                      RestoreItemAction
velero.io/csi-volumesnapshot-restorer           RestoreItemAction
velero.io/csi-volumesnapshotclass-restorer      RestoreItemAction
velero.io/csi-volumesnapshotcontent-restorer    RestoreItemAction
velero.io/init-restore-hook                     RestoreItemAction
velero.io/job                                   RestoreItemAction
velero.io/pod                                   RestoreItemAction
velero.io/restic                                RestoreItemAction
velero.io/role-bindings                         RestoreItemAction
velero.io/service                               RestoreItemAction
velero.io/service-account                       RestoreItemAction
velero.io/azure                                 VolumeSnapshotter
$ velero get backup-location
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                   ACCESS MODE   DEFAULT
default   azure      velero          Available   2022-03-31 17:44:59 +0200 CEST   ReadWrite     true
$ velero get snapshot-location
NAME                    PROVIDER
azure-volume-snapshot   azure

Anything else you would like to add:

Cloud Credentials: secret is using Azure Service Principal (SPN) credentials and SPN has role 'Contributor' for resource group and Storage Account.
Azure Storage Account Replication: Geo-redundant storage (GRS)
Azure Storage Account Kind: StorageV2 (general purpose v2)
Azure Storage Account Containers: velero

Environment:

reasonerjt commented 2 years ago

Hi,

As for the volumesnapshot class issue, you need to create the volumesnapshotclass on the cluster and explicitly set the label to tell velero to use it: https://velero.io/docs/v1.8/csi/#installing-velero-with-csi-support

As for the authorization error, velero backup describe xx --details works by downloading additional information from the Azure object store to populate the output. This looks like an issue with the Azure settings: check whether you can download the resource via the URL generated by the Azure plugin.

murech commented 2 years ago

Hi @reasonerjt, thanks for the information about having to create a VolumeSnapshotClass. I was able to perform the snapshot by creating the following VolumeSnapshotClass:

apiVersion: snapshot.storage.k8s.io/v1 
kind: VolumeSnapshotClass 
metadata: 
    name: csi-azure-disk-vsc
    labels:
        velero.io/csi-volumesnapshot-class: "true"
driver: disk.csi.azure.com 
deletionPolicy: Retain
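
The error in the first log excerpt suggests that a snapshot class is picked by matching its driver against the storage class provisioner and by requiring the velero.io/csi-volumesnapshot-class label. The following is a minimal sketch of that selection rule, not the CSI plugin's actual code. The function name `find_snapshot_class`, the dictionary layout, and the unlabeled class `some-default-vsc` are assumptions made for illustration, as is the idea that only the presence of the label matters.

```python
# Label named in the Velero error message for this backup.
VELERO_VSC_LABEL = "velero.io/csi-volumesnapshot-class"


def find_snapshot_class(snapshot_classes, provisioner):
    """Return the first class whose driver matches and that carries the label.

    Illustrative only (hypothetical helper): assumes that the presence of
    the label is what matters, regardless of its value.
    """
    for vsc in snapshot_classes:
        labels = vsc.get("metadata", {}).get("labels") or {}
        if vsc.get("driver") == provisioner and VELERO_VSC_LABEL in labels:
            return vsc
    return None


# Hypothetical class without the label: right driver, but not selected.
unlabeled = {
    "metadata": {"name": "some-default-vsc"},
    "driver": "disk.csi.azure.com",
}

# The class from the manifest above, expressed as a dictionary.
labeled = {
    "metadata": {
        "name": "csi-azure-disk-vsc",
        "labels": {VELERO_VSC_LABEL: "true"},
    },
    "driver": "disk.csi.azure.com",
    "deletionPolicy": "Retain",
}

print(find_snapshot_class([unlabeled], "disk.csi.azure.com"))
match = find_snapshot_class([unlabeled, labeled], "disk.csi.azure.com")
print(match["metadata"]["name"])
```

Under these assumptions, the first call finds nothing, which corresponds to the "failed to get volumesnapshotclass for provisioner" error, and the second call selects csi-azure-disk-vsc.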

murech commented 2 years ago

As for the authorization error, I did some further investigations in order to reproduce the error.

1) request velero logs:

$ velero backup logs k8s-example
An error occurred: request failed: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationFailure</Code><Message>This request is not authorized to perform this operation.
RequestId:eb005a0d-701e-0061-66f7-48dc83000000
Time:2022-04-05T14:18:08.4342515Z</Message></Error>

2) verify downloadrequests in velero namespace:

$ kubectl get downloadrequests -n velero
NAME                                                AGE
k8s-example-72928509-0ea0-4841-9ed6-0c2eb08de176   26s

3) show downloadrequests:

Name:         k8s-example-72928509-0ea0-4841-9ed6-0c2eb08de176
Namespace:    velero
Labels:       <none>
Annotations:  <none>
API Version:  velero.io/v1
Kind:         DownloadRequest
Metadata:
  Creation Timestamp:  2022-04-05T14:18:07Z
  Generation:          1
  Managed Fields:
    API Version:  velero.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:downloadURL:
        f:expiration:
        f:phase:
    Manager:      velero-server
    Operation:    Update
    Time:         2022-04-05T14:18:07Z
    API Version:  velero.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        .:
        f:target:
          .:
          f:kind:
          f:name:
    Manager:         velero.exe
    Operation:       Update
    Time:            2022-04-05T14:18:07Z
  Resource Version:  99480986
  UID:               3dd68257-294e-4601-93ee-97aa3e2f553c
Spec:
  Target:
    Kind:  BackupLog
    Name:  k8s-example
Status:
  Download URL:  https://somestorageaccount.blob.core.windows.net/velero/backups/k8s-example/k8s-example-logs.gz?se=2022-04-05T14%3A28%3A07Z&sig=2Sp8Kk%2BDSUh2OzN%2BSc7dMj0jsSr74vTcGoIcdK7Ldy8%3D&sp=r&sr=b&sv=2018-03-28
  Expiration:    2022-04-05T14:28:07Z
  Phase:         Processed
Events:          <none>

4) open download url with browser:

<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:5db452de-301e-0002-13fa-484178000000 Time:2022-04-05T14:32:50.1038740Z</Message>
<AuthenticationErrorDetail>Signature fields not well formed.</AuthenticationErrorDetail>
</Error>
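
The download URL in step 3 is a shared access signature (SAS) URL, so its query string can be decoded to see what it grants and when it expires. The sketch below uses only the Python standard library. The helper names `sas_fields` and `parse_utc` are made up for illustration, and the timestamps are copied from steps 1 and 4 and truncated to whole seconds.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Download URL copied from the DownloadRequest status in step 3.
download_url = (
    "https://somestorageaccount.blob.core.windows.net/velero/backups/"
    "k8s-example/k8s-example-logs.gz"
    "?se=2022-04-05T14%3A28%3A07Z"
    "&sig=2Sp8Kk%2BDSUh2OzN%2BSc7dMj0jsSr74vTcGoIcdK7Ldy8%3D"
    "&sp=r&sr=b&sv=2018-03-28"
)


def sas_fields(url):
    """Return the URL-decoded SAS query parameters as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


def parse_utc(timestamp):
    """Parse a timestamp such as 2022-04-05T14:28:07Z as UTC."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


fields = sas_fields(download_url)
expiry = parse_utc(fields["se"])                      # se = signed expiry
cli_request = parse_utc("2022-04-05T14:18:08Z")       # time of the error in step 1
browser_request = parse_utc("2022-04-05T14:32:50Z")   # time of the error in step 4

# sp = permissions (r = read), sr = resource (b = blob), sv = service version
print(fields["sp"], fields["sr"], fields["sv"])
print(cli_request < expiry)
print(browser_request - expiry)
```

With these values, the CLI request in step 1 was made before the URL expired, while the browser request in step 4 was made 4 minutes and 43 seconds after the expiry time. The browser error may therefore have a different cause than the CLI error, although the response shown does not say so explicitly.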

I had a look at the Azure storage account configuration and noticed that Networking (Firewalls and virtual networks) configuration has the virtual network of the AKS cluster specified. This means that the storage account can only be accessed from within the VNET of the AKS cluster.

@reasonerjt:

blackpiglet commented 2 years ago

I'm not familiar with Azure concepts, so I cannot confirm the first question. For the second question, I think the logs of the specified backup are already present in the Velero server pod. After the backup completes, the Velero server uses the backup name as a filter to find the related information and uploads it to S3.

ywk253100 commented 1 year ago

I'm removing the 1.11-candidate label and closing this issue, as this works as expected and nothing needs to be done on the Velero side.

davidkarlsen commented 1 year ago

Indeed the traffic seems to be initiated from the cli. You can verify this by tcpdump'ing:

sudo tcpdump -i any host yourstorageaccount.blob.core.windows.net

and you will see a bunch of packets. This means you need to open the access filter for those client IPs, which is a somewhat questionable security design IMHO.

It seems that a signed URL is returned (using an access token), and the client fetches it directly.
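
The behaviour described here can be sketched as a plain HTTP GET issued from the machine that runs the client, followed by decompressing the .gz payload. This is an illustration, not Velero's implementation: the helper names `decode_backup_logs` and `fetch_backup_logs` and the 30 second timeout are assumptions.

```python
import gzip
import urllib.error
import urllib.request


def decode_backup_logs(payload):
    """Gunzip a downloaded log archive such as k8s-example-logs.gz."""
    return gzip.decompress(payload).decode("utf-8")


def fetch_backup_logs(download_url, timeout=30):
    """GET the signed URL from the machine running this code and decode it.

    Hypothetical helper: the request originates from the client's network,
    so the storage account firewall sees the client's address, not the
    cluster's.
    """
    try:
        with urllib.request.urlopen(download_url, timeout=timeout) as response:
            return decode_backup_logs(response.read())
    except urllib.error.HTTPError as err:
        # A storage account that blocks this client rejects the request here.
        raise RuntimeError(f"download failed with HTTP {err.code}") from err
```

Calling `fetch_backup_logs` with the download URL of a DownloadRequest from a workstation outside the allowed network would be expected to fail in the same way as the CLI, while the same call from inside the cluster network should succeed.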

andreyolv commented 1 year ago

Any solution for this?

I'm using a private endpoint in Azure with public network access disabled. For testing, if I enable public access, this error disappears, but that is not what I want.

If I telnet to the private address in the cluster, it resolves to the static IP correctly.

I think what @murech said makes sense. It seems that the CLI tries to access the download URL from the DownloadRequest directly, and since it is outside the network, it gets this error.

andreyolv commented 1 year ago

If I create a pod with an Ubuntu image linked to the velero service account, and install the velero cli within that container, I can execute the commands that return the logs without problems. I think this reinforces the above hypothesis.