vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

AWS plugin for velero fails exactly at 2.37 GB on DigitalOcean S3 Object Storage. #8041

Open busyboy77 opened 1 month ago

busyboy77 commented 1 month ago

What steps did you take and what happened: with Velero 1.14.0

I'm running a VM on Contabo with RKE2 (Kubernetes v1.27.12+rke2r1), and I'm trying to push Velero backups to DigitalOcean's S3 Object Storage.

Velero is connected to DigitalOcean's S3 Object Storage using the command below:

velero install \
    --secret-file=./credentials-velero \
    --provider=aws \
    --bucket=efvelero \
    --backup-location-config region=fra1,s3ForcePathStyle=true,s3Url=https://fra1.digitaloceanspaces.com,publicUrl=https://fra1.digitaloceanspaces.com region=fra1 \
    --plugins=velero/velero-plugin-for-aws:v1.10.0 \
    --use-volume-snapshots=true \
    --use-node-agent=true \
    --features=EnableCSI \
    --snapshot-location-config region=fra1\
    --wait

My backup-location status:

# velero get backup-locations
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                   ACCESS MODE   DEFAULT
default   aws        efvelero        Available   2024-07-23 16:31:10 +0200 CEST   ReadWrite     true

List of Velero plugins:

# velero plugin get
NAME                                            KIND
velero.io/crd-remap-version                     BackupItemAction
velero.io/crd-remap-version                     BackupItemAction
velero.io/pod                                   BackupItemAction
velero.io/pod                                   BackupItemAction
velero.io/pv                                    BackupItemAction
velero.io/pv                                    BackupItemAction
velero.io/service-account                       BackupItemAction
velero.io/service-account                       BackupItemAction
velero.io/csi-pvc-backupper                     BackupItemActionV2
velero.io/csi-volumesnapshot-backupper          BackupItemActionV2
velero.io/csi-volumesnapshotclass-backupper     BackupItemActionV2
velero.io/csi-volumesnapshotcontent-backupper   BackupItemActionV2
velero.io/csi-volumesnapshot-delete             DeleteItemAction
velero.io/csi-volumesnapshotcontent-delete      DeleteItemAction
velero.io/dataupload-delete                     DeleteItemAction
velero.io/aws                                   ObjectStore
velero.io/add-pv-from-pvc                       RestoreItemAction
velero.io/add-pv-from-pvc                       RestoreItemAction
velero.io/add-pvc-from-pod                      RestoreItemAction
velero.io/add-pvc-from-pod                      RestoreItemAction
velero.io/admission-webhook-configuration       RestoreItemAction
velero.io/admission-webhook-configuration       RestoreItemAction
velero.io/apiservice                            RestoreItemAction
velero.io/apiservice                            RestoreItemAction
velero.io/change-image-name                     RestoreItemAction
velero.io/change-image-name                     RestoreItemAction
velero.io/change-pvc-node-selector              RestoreItemAction
velero.io/change-pvc-node-selector              RestoreItemAction
velero.io/change-storage-class                  RestoreItemAction
velero.io/change-storage-class                  RestoreItemAction
velero.io/cluster-role-bindings                 RestoreItemAction
velero.io/cluster-role-bindings                 RestoreItemAction
velero.io/crd-preserve-fields                   RestoreItemAction
velero.io/crd-preserve-fields                   RestoreItemAction
velero.io/dataupload                            RestoreItemAction
velero.io/dataupload                            RestoreItemAction
velero.io/init-restore-hook                     RestoreItemAction
velero.io/init-restore-hook                     RestoreItemAction
velero.io/job                                   RestoreItemAction
velero.io/job                                   RestoreItemAction
velero.io/pod                                   RestoreItemAction
velero.io/pod                                   RestoreItemAction
velero.io/pod-volume-restore                    RestoreItemAction
velero.io/pod-volume-restore                    RestoreItemAction
velero.io/role-bindings                         RestoreItemAction
velero.io/role-bindings                         RestoreItemAction
velero.io/secret                                RestoreItemAction
velero.io/secret                                RestoreItemAction
velero.io/service                               RestoreItemAction
velero.io/service                               RestoreItemAction
velero.io/service-account                       RestoreItemAction
velero.io/service-account                       RestoreItemAction
velero.io/csi-pvc-restorer                      RestoreItemActionV2
velero.io/csi-volumesnapshot-restorer           RestoreItemActionV2
velero.io/csi-volumesnapshotclass-restorer      RestoreItemActionV2
velero.io/csi-volumesnapshotcontent-restorer    RestoreItemActionV2
velero.io/aws 

Logs so far: kubectl logs deploy velero .txt

I tried to take a backup of the MongoDB pod (deployed using a Helm chart).

I annotated the pod using:

kubectl -n ef-external  annotate pod/mongo-mongodb-0 backup.velero.io/backup-volumes=datadir

I created the backup for MongoDB using:

# velero backup create ef-mongo  --include-namespaces ef-external --include-resources pods,persistentvolumeclaims,persistentvolumes   --selector app.kubernetes.io/name=mongodb 

Backup request "ef-mongo" submitted successfully.
Run `velero backup describe ef-mongo` or `velero backup logs ef-mongo` for more details.

Logs so far, with the backup still in progress: velero-backup-describe-ef-mongo--details.txt

The backup uploads until it reaches 2.37 GB:

[image]

After 2 minutes:

[image]

Then the backup fails:

# velero backup describe ef-mongo --details
Name:         ef-mongo
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.27.12+rke2r1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=27

Phase:  Failed (run `velero backup logs ef-mongo` for more information)

Namespaces:
  Included:  ef-external
  Excluded:  <none>

Resources:
  Included:        pods, persistentvolumeclaims, persistentvolumes
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app.kubernetes.io/name=mongodb

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-07-23 16:37:06 +0200 CEST
Completed:  <n/a>

Expiration:  2024-08-22 16:37:06 +0200 CEST

Total items to be backed up:  3
Items backed up:              3

Resource List:  <backup resource list not found>

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots: <none included or not detectable>

  Pod Volume Backups - kopia:
    Completed:
      ef-external/mongo-mongodb-0: datadir

HooksAttempted:  0
HooksFailed:     0

The logs are in kubectl-logs-deployment-velero-n-velero.txt.

The backup always fails at exactly 2.37 GB with 131 items.

[image]

Just to note:

1. The same backup works with a MinIO deployment without any errors.
2. I uploaded a 10 GB file to the DO S3 bucket and can confirm there is no resource limit on the DO side; it works for larger files.

What did you expect to happen:

The backup fails after a certain period of time

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.

Attached as bundle-2024-07-23-16-48-25.tar.gz

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

Phase:  Failed (run `velero backup logs ef-mongo` for more information)

Namespaces:
  Included:  ef-external
  Excluded:  <none>

Resources:
  Included:        pods, persistentvolumeclaims, persistentvolumes
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app.kubernetes.io/name=mongodb

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-07-23 16:37:06 +0200 CEST
Completed:  <n/a>

Expiration:  2024-08-22 16:37:06 +0200 CEST

Total items to be backed up:  3
Items backed up:              3

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots: <none included or not detectable>

  Pod Volume Backups - kopia (specify --details for more information):
    Completed:  1

HooksAttempted:  0
HooksFailed:     0


- `velero backup logs <backupname>`

# velero backup logs ef-mongo
An error occurred: file not found


- `velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml` N/A
- `velero restore logs <restorename>` N/A

**Anything else you would like to add:**

**Environment:**

- Velero version (use `velero version`):  1.14.0
- Velero features (use `velero client config get features`): 

velero client config get features

features:


- Kubernetes version (use `kubectl version`):

v1.27.12+rke2r1


- Kubernetes installer & version:
rke2 
- Cloud provider or hardware configuration:

VMware VM


busyboy77 commented 1 month ago

Update:

I also tried with

kubectl patch deployment velero -n velero --patch \
'{"spec":{"template":{"spec":{"containers":[{"name": "velero", "resources": {"limits":{"cpu": "1", "memory": "2Gi"}, "requests": {"cpu": "1", "memory": "2Gi"}}}]}}}}'

to check whether this was due to resource constraints, but the problem still persists.

blackpiglet commented 1 month ago

Please refer to this comment: https://github.com/vmware-tanzu/velero/issues/7543#issuecomment-2159961416

Applying this configuration should fix your issue:

checksumAlgorithm: ""

Duplicate of #7543
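
For readers applying the suggested setting, here is a minimal sketch of where it could go. It assumes `checksumAlgorithm` belongs under `spec.config` of the BackupStorageLocation, alongside the other keys passed via `--backup-location-config`. The name, namespace, bucket, region, and URLs are copied from this report. The overall manifest layout is an assumption and should be checked against the linked comment and the plugin documentation.

```yaml
# Sketch only: BackupStorageLocation for the setup described in this issue.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  default: true
  objectStorage:
    bucket: efvelero
  config:
    region: fra1
    s3ForcePathStyle: "true"
    s3Url: https://fra1.digitaloceanspaces.com
    publicUrl: https://fra1.digitaloceanspaces.com
    # Assumed placement of the suggested fix: an empty string asks the
    # plugin not to attach a checksum to uploaded objects.
    checksumAlgorithm: ""
```

Since the location already exists, editing it in place (for example with `kubectl -n velero edit backupstoragelocation default`) and adding only the `checksumAlgorithm` line should have the same effect. The empty value is presumably what avoids the upload failure on the DigitalOcean endpoint.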