openebs / velero-plugin

Velero plugin for backup/restore of OpenEBS cStor volumes
https://docs.openebs.io
Apache License 2.0

Backup of large volume fails #185

Open. miguelcostaUI opened this issue 1 year ago.

miguelcostaUI commented 1 year ago

I have a volume of about 150 GB, and backing it up fails.

What steps did you take and what happened:

I used this VolumeSnapshotLocation:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero
    prefix: cstor
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://10.0.1.221:9000
    restoreAllIncrementalSnapshots: "true"
    autoSetTargetIP: "true"

velero create backup backup-test-cstor-2

The backup is created, but the upload fails with: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000).
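A quick size check suggests why the limit is hit. This is a minimal sketch, assuming that when multiPartChunkSize is unset the upload falls back to the AWS SDK for Go's default 5 MiB part size (an assumption; the issue does not confirm which default the plugin applies). With at most 10,000 parts, the largest object a multipart upload can hold is part size × 10,000; `maxObjectSize` is a hypothetical helper for illustration.

```go
package main

import "fmt"

// maxUploadParts is the limit quoted in the error message.
const maxUploadParts = 10000

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// maxObjectSize returns the largest object a multipart upload can hold
// when every part has the given size.
func maxObjectSize(partSize int64) int64 {
	return partSize * maxUploadParts
}

func main() {
	// 5 MiB is the assumed SDK default; 64 MiB is the value tried later in this issue.
	for _, ps := range []int64{5 * MiB, 64 * MiB} {
		limit := maxObjectSize(ps)
		fmt.Printf("part size %d MiB -> max object %.2f GiB\n", ps/MiB, float64(limit)/float64(GiB))
	}
}
```

Under that assumption, the default tops out near 48.8 GiB, well below a ~150 GB snapshot, while 64 MiB parts would allow up to 625 GiB.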

velero backup logs backup-test-cstor-2
time="2023-05-19T05:02:27Z" level=warning msg="Failed to close file interface : blob (code=Unknown): MultipartUpload: upload multipart failed\n\tupload id: YjQ1ZWE0ODAtN2Q5MS00ZDkyLTg5NDgtMjU5MDZiY2YzMjE0LmJhMDkzODUxLWEzM2ItNDRjYi1hOTdjLWVlMDMxMGEyNTVhNQ\ncaused by: TotalPartsExceeded: exceeded total allowed configured MaxUploadParts (10000). Adjust PartSize to fit in this limit" backup=velero/backup-test-cstor-3 cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/conn.go:322" pluginName=velero-blockstore-openebs
time="2023-05-19T05:02:37Z" level=error msg="Error backing up item" backup=velero/backup-test-cstor-3 error="error taking snapshot of volume: rpc error: code = Unknown desc = Failed to upload snapshot, status:{Failed}" logSource="pkg/backup/backup.go:435" name=influxdb-influxdb2-0

This is unexpected, since I thought multiPartChunkSize was calculated from the file size.
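Sizing the chunks from the file size only works when the uploader knows the total size before it starts. The sketch below mimics the adjustment rule of the AWS SDK for Go's s3manager uploader (raise the part size to totalSize/10000 + 1 when the current size would need too many parts). `partSizeFor` is a hypothetical helper, and the idea that the plugin streams the snapshot with an unknown size is an assumption, not something these logs prove.

```go
package main

import "fmt"

const maxUploadParts = 10000
const minPartSize = int64(5) << 20 // 5 MiB, S3's minimum part size

// partSizeFor returns the part size a size-aware uploader would pick.
// totalSize < 0 means the size is unknown (e.g. a streamed snapshot),
// in which case no adjustment is possible.
func partSizeFor(configured, totalSize int64) int64 {
	ps := configured
	// Simplification: treat 0 (unset) or anything too small as the minimum.
	if ps < minPartSize {
		ps = minPartSize
	}
	// Only a known total size lets the uploader grow the parts to fit the limit.
	if totalSize >= 0 && totalSize/ps >= maxUploadParts {
		ps = totalSize/maxUploadParts + 1
	}
	return ps
}

func main() {
	vol := int64(150) << 30 // roughly the 150 GB volume from this issue
	fmt.Println("known size:  ", partSizeFor(0, vol))
	fmt.Println("unknown size:", partSizeFor(0, -1))
}
```

For a ~150 GiB volume with a known size this yields parts of about 15.4 MiB; with an unknown size the 5 MiB minimum stays, which would exceed the part limit for a volume this large.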

Then I tried setting multiPartChunkSize explicitly:

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: velero
    prefix: cstor
    provider: aws
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://10.0.1.221:9000
    multiPartChunkSize: 64Mi
    restoreAllIncrementalSnapshots: "true"
    autoSetTargetIP: "true"

velero create backup backup-test-cstor-3

With this setting the backup still fails, now with a different and less informative error:

velero backup logs backup-test-cstor-3
time="2023-05-18T09:20:03Z" level=info msg="1 errors encountered backup up item" backup=velero/backup-test-cstor-2 logSource="pkg/backup/backup.go:431" name=influxdb-influxdb2-0
time="2023-05-18T09:20:03Z" level=error msg="Error backing up item" backup=velero/backup-test-cstor-2 error="error taking snapshot of volume: rpc error: code = Unavailable desc = error reading from server: EOF" logSource="pkg/backup/backup.go:435" name=influxdb-influxdb2-0

Is there anything I'm missing to make backups of large volumes work?

What did you expect to happen: The backup to succeed and upload successfully.

Anything else you would like to add: I'm also getting many of these warnings, and I'm not sure what they mean or how to fix them:

time="2023-05-18T09:20:03Z" level=warning msg="Epoll wait failed : interrupted system call" backup=velero/backup-test-cstor-2 cmd=/plugins/velero-blockstore-openebs logSource="/go/src/github.com/openebs/velero-plugin/pkg/clouduploader/server.go:302" pluginName=velero-blockstore-openebs

Environment: