vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Helm install using temporary credentials with AWS plugin results in api error PermanentRedirect: The bucket you are attempting to access must be addressed using the specified endpoint #8295

Closed dfry closed 1 month ago

dfry commented 1 month ago

What steps did you take and what happened:

I configure temporary credentials by assuming a role that has been granted the appropriate permissions for Velero.

The credentials are saved to a secret in the velero namespace with the following format:

[default]
aws_access_key_id = ******************************
aws_secret_access_key = *********************************************
aws_session_token = FwoGZXIvYXdzEI3//////////wEaDJZjxIDMq2N4EicuPyLkATVHojezSvr1zUDdKyMUFoGD8e5Ge5Oa2QFFtRRLW1p/2Z+7294v+t5fj9AM4p0NxKrdqgvVjBFmqwzhcO6rMOhronkHDqyj7vVzDBQ6e4LCMFziI93EtkZcDKyGQp0zx+3p0+22q/tx+U0Rw3NuqHLsewW4fYA44sZv5xJAdjPTNBo2QR6INwKzP5LzAAA01Ln7983o4LRF8KgEXSqpEwBjfqOFECj5cyYk45/x1nwXqrlr25hkDu9Mt/Y5wQFjPcjTPFl9Wayv3X5knkUg4N2dE10xAAIPOAaA/VPljPAfzvGbkiiegLS4BjItey+0i76Zp/jeF+FeQRUNKHZ/K8pAVfaWrfvraT8Rw6k51Lb+nI+JoHO6JJ0Z
region = eu-west-2

apiVersion: v1
kind: Secret
data:
  cloud: >-
    <<content snipped>>
type: Opaque
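As a quick sanity check on the credentials file format above, the following is a minimal sketch (not part of Velero or the AWS plugin) that parses the decoded content of the secret's `cloud` key and reports which keys needed for temporary credentials are missing. The function name `missing_keys` and the list of required keys are assumptions made for illustration.

```python
import configparser

# Keys that temporary (STS) credentials are assumed to need in the profile.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def missing_keys(credentials_text: str, profile: str = "default") -> list[str]:
    """Return the required keys that are absent or empty in the given profile.

    `credentials_text` is the decoded (not base64) content of the secret key.
    """
    # Disable interpolation so that '%' characters in a token are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]
```

An empty list means the profile carries all three values. It says nothing about whether the credentials are valid or which buckets they can reach.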

I configure the helm chart values as follows:

configuration:
  logLevel: debug
  # Parameters for the BackupStorageLocation(s). Configure multiple by adding other element(s) to the backupStorageLocation slice.
  # See https://velero.io/docs/v1.6/api-types/backupstoragelocation/
  backupStorageLocation:
    # name is the name of the backup storage location where backups should be stored. If a name is not provided,
    # a backup storage location will be created with the name "default". Optional.
    - name: cloudprovider-objectstorage
      # provider is the name for the backup storage location provider.
      provider: velero.io/aws
      # bucket is the name of the bucket to store backups in. Required.
      bucket: ${ARGOCD_ENV_object_storage_bucket}
      prefix: "backups"
      # default indicates this location is the default backup storage location. Optional.
      default: true
      accessMode: ReadWrite
      credential:
        key: ${ARGOCD_ENV_object_storage_secret_key}
        name: ${ARGOCD_ENV_object_storage_secret_name}
      # Additional provider-specific configuration. See link above
      # for details of required/optional fields for your provider.
      config:
        profile: default
        region: ${ARGOCD_ENV_object_storage_region}

  # Parameters for the VolumeSnapshotLocation(s). Configure multiple by adding other element(s) to the volumeSnapshotLocation slice.
  # See https://velero.io/docs/v1.6/api-types/volumesnapshotlocation/
  volumeSnapshotLocation:
    # name is the name of the volume snapshot location where snapshots are being taken. Required.
    - name: cloudprovider-objectstorage
      # provider is the name for the volume snapshot provider.
      provider: velero.io/aws
      credential:
        key: ${ARGOCD_ENV_object_storage_secret_key}
        name: ${ARGOCD_ENV_object_storage_secret_name}
      config:
        profile: default
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:${ARGOCD_ENV_velero_plugin_version}
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
      - name: cloud-credentials
        mountPath: /credentials

credentials:
  useSecret: true
  existingSecret: ${ARGOCD_ENV_object_storage_secret_name}

The environment variables referenced above are substituted before deployment with the following values:

- name: velero_helm_version
  value: 7.2.1
- name: velero_namespace
  value: velero
- name: velero_plugin_version
  value: v1.9.2 (also tried the latest, v1.10.1, with the same behaviour)
- name: object_storage_region
  value: eu-west-2
- name: object_storage_secret_name
  value: velero-cloud-api-secret
- name: object_storage_secret_key
  value: cloud
- name: object_storage_bucket
  value: velero
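To see what the values file resolves to after substitution, here is a minimal sketch that renders two of the placeholders with Python's `string.Template`. It assumes each variable name is exposed with an `ARGOCD_ENV_` prefix, as the placeholders in the values file suggest. The `render` helper and the two-line fragment are hypothetical and only illustrate the substitution, not how Argo CD performs it.

```python
from string import Template

# Hypothetical two-line fragment of the values file, using the same placeholders.
VALUES_FRAGMENT = Template(
    "bucket: ${ARGOCD_ENV_object_storage_bucket}\n"
    "region: ${ARGOCD_ENV_object_storage_region}\n"
)


def render(fragment: Template, env: dict[str, str]) -> str:
    """Substitute placeholders, assuming each name gets an ARGOCD_ENV_ prefix.

    Template.substitute raises KeyError if a placeholder has no value,
    which surfaces a missing variable instead of leaving it unrendered.
    """
    prefixed = {f"ARGOCD_ENV_{name}": value for name, value in env.items()}
    return fragment.substitute(prefixed)


if __name__ == "__main__":
    print(render(VALUES_FRAGMENT, {
        "object_storage_bucket": "velero",
        "object_storage_region": "eu-west-2",
    }))
```

Rendering the values this way before deploying makes it easy to spot a bucket name that is a generic placeholder rather than the bucket actually owned by the account.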

I am getting the following error message in the logs:

time="2024-10-14T12:35:34Z" level=debug msg="plugin started" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" path=/plugins/velero-plugin-for-aws pid=1191
time="2024-10-14T12:35:34Z" level=debug msg="waiting for RPC address" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" plugin=/plugins/velero-plugin-for-aws
time="2024-10-14T12:35:34Z" level=debug msg="enqueueing resources ..." logSource="pkg/util/kube/periodical_enqueue_source.go:71" resource="*v1.RestoreList"
time="2024-10-14T12:35:34Z" level=debug msg="enqueueing resources ..." logSource="pkg/util/kube/periodical_enqueue_source.go:71" resource="*v1.BackupList"
time="2024-10-14T12:35:34Z" level=debug msg="no resources, skip" logSource="pkg/util/kube/periodical_enqueue_source.go:82" resource="*v1.RestoreList"
time="2024-10-14T12:35:34Z" level=debug msg="no resources, skip" logSource="pkg/util/kube/periodical_enqueue_source.go:82" resource="*v1.BackupList"
time="2024-10-14T12:35:34Z" level=debug msg="Setting log level to DEBUG" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="/go/pkg/mod/github.com/vmware-tanzu/velero@v1.13.0/pkg/plugin/framework/server.go:242" pluginName=velero-plugin-for-aws
time="2024-10-14T12:35:34Z" level=debug msg="plugin address" address=/tmp/plugin1638687025 backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" network=unix pluginName=velero-plugin-for-aws
time="2024-10-14T12:35:34Z" level=debug msg="using plugin" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" version=2
time="2024-10-14T12:35:34Z" level=debug msg="waiting for stdio data" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" pluginName=stdio
time="2024-10-14T12:35:34Z" level=error msg="Error listing backups in backup store" backupLocation=velero/cloudprovider-objectstorage controller=backup-sync error="rpc error: code = Unknown desc = operation error S3: ListObjectsV2, https response error StatusCode: 301, RequestID: X4G2CN5NRT0YG904, HostID: jKhfW3WO2sYeWyGbVfLmec6iWWtlZjRKWBVtGDtgxGSQW0NOO6YA+Zl6b7QV8tLngs7cuVMEd/R7xicYwxaueA==, api error PermanentRedirect: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint." error.file="/go/src/velero-plugin-for-aws/velero-plugin-for-aws/object_store.go:338" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:109"
time="2024-10-14T12:35:34Z" level=debug msg="received EOF, stopping recv loop" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync err="rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75" pluginName=stdio
time="2024-10-14T12:35:34Z" level=info msg="plugin process exited" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync id=1191 logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws
time="2024-10-14T12:35:34Z" level=debug msg="plugin exited" backupLocation=velero/cloudprovider-objectstorage cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/process/logrus_adapter.go:75"

What did you expect to happen:

I expect the S3 calls to succeed.


Anything else you would like to add: I also tested the same credentials with AWS CLI v2 and with github.com/aws/aws-sdk-go-v2/aws, with no issues.


blackpiglet commented 1 month ago

I'm not an expert on the Velero Helm chart, so I may not be able to guide you through setting up the environment with Helm, but I found some issues in your Helm values.yaml configuration. They may help you debug the error further.

First, the provider should be aws, not velero.io/aws.

      # provider is the name for the backup storage location provider.
      provider: aws

You can find an example of setting up the AWS environment in this document: https://github.com/vmware-tanzu/velero-plugin-for-aws?tab=readme-ov-file#install-and-start-velero

Second, please consider whether the prefix is needed.

prefix: "backups"

The prefix lets multiple Velero object stores share the same bucket. A random ID works well as a prefix because it will not conflict with others, but "backups" does not seem like a good choice.

dfry commented 1 month ago

Thanks for the suggestions, @blackpiglet. I had changed the provider name and the prefix for troubleshooting purposes. I have now changed the provider to aws and removed the prefix, but I get the same error:

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: '2024-10-11T13:55:09Z'
  generation: 5058
  labels:
    app.kubernetes.io/instance: velero
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: velero
    helm.sh/chart: velero-7.2.1
    k8slens-edit-resource-version: v1
  name: cloudprovider-objectstorage
  namespace: velero
  resourceVersion: '4360464'
  uid: d269e2b4-2347-42c6-8419-a242ade95808
spec:
  accessMode: ReadWrite
  config:
    profile: default
    region: eu-west-2
  credential:
    key: cloud
    name: velero-cloud-api-secret
  default: true
  objectStorage:
    bucket: velero
  provider: aws
status:
  lastValidationTime: '2024-10-15T08:25:39Z'
  message: >-
    BackupStorageLocation "cloudprovider-objectstorage" is unavailable: rpc
    error: code = Unknown desc = operation error S3: ListObjectsV2, https
    response error StatusCode: 301, RequestID: MD7ZCESMM2ZYZR44, HostID:
    iAClEl8eUfcfXOhBhiLz/zHLTmk65k53lCaqh5CBE89exEs27+ez3UN+54W+U5iydXp6g2koAs0=,
    api error PermanentRedirect: The bucket you are attempting to access must be
    addressed using the specified endpoint. Please send all future requests to
    this endpoint.
  phase: Unavailable

I suspect the underlying cause is that the bucket was created in a non-default region. I am going to run some more tests on my side, but the logs indicate that the endpoint the object store code uses results in a redirect.

dfry commented 1 month ago

I found the issue, and it is related to the bucket. My CI/CD was misconfigured and was using "velero" as the bucket name, which obviously wouldn't work, since that name is not unique and/or the bucket is not owned by my account.

In any case, I will close this issue and leave the feedback that the error message is misleading. The credentials I am using don't have an IAM role that can see the "velero" bucket, so I would have expected a 403 or something similar.
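For anyone hitting the same message, the sketch below pulls the HTTP status and S3 error code out of an error string like the ones in the logs above and attaches a troubleshooting hint. The `diagnose` function and the hint texts are hypothetical and not part of Velero or the AWS plugin. The `PermanentRedirect` hint reflects what happened in this thread: S3 bucket names are global, so a name that belongs to someone else's bucket in another region can produce a 301 rather than a 403.

```python
import re

# Hypothetical hint table; the wording is illustrative, not Velero output.
HINTS = {
    "PermanentRedirect": (
        "The bucket exists but is served from a different endpoint or region. "
        "Check that the bucket name is really yours and that the region matches."
    ),
    "AccessDenied": "The credentials reached the bucket but lack permission.",
    "NoSuchBucket": "No bucket with this name exists.",
}

# Matches 'StatusCode: 301 ... api error PermanentRedirect:' in the plugin error text.
_ERROR_PATTERN = re.compile(
    r"StatusCode:\s*(?P<status>\d{3}).*?api\s+error\s+(?P<code>\w+):",
    re.DOTALL,
)


def diagnose(message: str):
    """Return (http_status, s3_error_code, hint), or None if nothing matches."""
    match = _ERROR_PATTERN.search(message)
    if match is None:
        return None
    code = match.group("code")
    hint = HINTS.get(code, "No hint available for this error code.")
    return int(match.group("status")), code, hint
```

Passing the `error=` field from the log line, or the `status.message` of the BackupStorageLocation, should yield the status and code. The hint is only a starting point for checking the bucket name and region.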