vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

VolumeSnapshotContents are retained after VolumeSnapshots have been deleted #7511

Closed (dmrub closed this issue 2 months ago)

dmrub commented 4 months ago

What steps did you take and what happened:

I have a Velero schedule that backs up the running application at regular intervals, including CSI snapshots. The VolumeSnapshotClass in use has deletionPolicy set to Delete. After the backup, there are no VolumeSnapshots left, but the VolumeSnapshotContents are still present.

What did you expect to happen: VolumeSnapshotContents are also deleted.

The following information will help us better understand what's going on:

bundle-2024-03-08-13-10-57.tar.gz

Anything else you would like to add:

I see errors in the output of the snapshot-controller:

$ kubectl logs -n kube-system snapshot-controller-c4f65cdf6-ghfhp
...
I0308 08:10:57.956643       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"velero-nginx-example-backup-every-two-hours-20240308081026qttc9", UID:"938944ac-9078-4bf0-bac1-5287e55ecd57", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1344536", FieldPath:""}): type: 'Normal' reason: 'CreatingSnapshot' Waiting for a snapshot velero/velero-nginx-example-backup-every-two-hours-20240308081026qttc9 to be created by the CSI driver.
I0308 08:11:00.575457       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"velero-nginx-example-backup-every-two-hours-20240308081026qttc9", UID:"938944ac-9078-4bf0-bac1-5287e55ecd57", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1344544", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot velero/velero-nginx-example-backup-every-two-hours-20240308081026qttc9 was successfully created by the CSI driver.
I0308 08:11:00.575513       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"velero-nginx-example-backup-every-two-hours-20240308081026qttc9", UID:"938944ac-9078-4bf0-bac1-5287e55ecd57", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1344544", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot velero/velero-nginx-example-backup-every-two-hours-20240308081026qttc9 is ready to use.
I0308 08:11:00.582243       1 snapshot_controller.go:1020] checkandRemovePVCFinalizer[velero-nginx-example-backup-every-two-hours-20240308081026qttc9]: Remove Finalizer for PVC nginx-example-backup-every-two-hours-20240308081026-n955z as it is not used by snapshots in creation
I0308 08:11:00.587981       1 snapshot_controller.go:1020] checkandRemovePVCFinalizer[velero-nginx-example-backup-every-two-hours-20240308081026qttc9]: Remove Finalizer for PVC nginx-example-backup-every-two-hours-20240308081026-n955z as it is not used by snapshots in creation
E0308 08:11:00.591900       1 snapshot_controller.go:1023] checkandRemovePVCFinalizer [velero-nginx-example-backup-every-two-hours-20240308081026qttc9]: removePVCFinalizer failed to remove finalizer snapshot controller failed to update nginx-example-backup-every-two-hours-20240308081026-n955z on API server: Operation cannot be fulfilled on persistentvolumeclaims "nginx-example-backup-every-two-hours-20240308081026-n955z": the object has been modified; please apply your changes to the latest version and try again
E0308 08:11:00.591933       1 snapshot_controller.go:191] error check and remove PVC finalizer for snapshot [velero-nginx-example-backup-every-two-hours-20240308081026qttc9]: snapshot controller failed to update nginx-example-backup-every-two-hours-20240308081026-n955z on API server: Operation cannot be fulfilled on persistentvolumeclaims "nginx-example-backup-every-two-hours-20240308081026-n955z": the object has been modified; please apply your changes to the latest version and try again
I0308 08:11:00.592021       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"velero-nginx-example-backup-every-two-hours-20240308081026qttc9", UID:"938944ac-9078-4bf0-bac1-5287e55ecd57", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1344641", FieldPath:""}): type: 'Warning' reason: 'ErrorPVCFinalizer' Error check and remove PVC Finalizer for VolumeSnapshot
E0308 08:11:04.746378       1 snapshot_controller_base.go:470] could not sync snapshot "velero/velero-nginx-example-backup-every-two-hours-20240308081026qttc9": snapshot controller failed to update velero-nginx-example-backup-every-two-hours-20240308081026qttc9 on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-nginx-example-backup-every-two-hours-20240308081026qttc9": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshots/velero/velero-nginx-example-backup-every-two-hours-20240308081026qttc9, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 938944ac-9078-4bf0-bac1-5287e55ecd57, UID in object meta: 
E0308 08:17:36.923486       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308081026-n955z
E0308 08:17:36.931809       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308081026-n955z
I0308 10:11:10.159559       1 snapshot_controller.go:660] createSnapshotContent: Creating content for snapshot nginx-example/velero-nginx-logs-lvqv7 through the plugin ...
I0308 10:11:10.166366       1 snapshot_controller.go:941] Added protection finalizer to persistent volume claim nginx-example/nginx-logs
I0308 10:11:10.174053       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-logs-lvqv7", UID:"09d5785a-4cd7-4be0-810a-22d3edbbd616", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401519", FieldPath:""}): type: 'Normal' reason: 'CreatingSnapshot' Waiting for a snapshot nginx-example/velero-nginx-logs-lvqv7 to be created by the CSI driver.
I0308 10:11:14.089532       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-logs-lvqv7", UID:"09d5785a-4cd7-4be0-810a-22d3edbbd616", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401528", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot nginx-example/velero-nginx-logs-lvqv7 was successfully created by the CSI driver.
I0308 10:11:14.089568       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-logs-lvqv7", UID:"09d5785a-4cd7-4be0-810a-22d3edbbd616", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401528", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot nginx-example/velero-nginx-logs-lvqv7 is ready to use.
I0308 10:11:14.095036       1 snapshot_controller.go:1020] checkandRemovePVCFinalizer[velero-nginx-logs-lvqv7]: Remove Finalizer for PVC nginx-logs as it is not used by snapshots in creation
I0308 10:11:15.210722       1 snapshot_controller.go:660] createSnapshotContent: Creating content for snapshot nginx-example/velero-nginx-html-nwfhb through the plugin ...
I0308 10:11:15.216736       1 snapshot_controller.go:941] Added protection finalizer to persistent volume claim nginx-example/nginx-html
I0308 10:11:15.224968       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-html-nwfhb", UID:"1c161693-1873-4fcd-a015-b7841c089421", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401622", FieldPath:""}): type: 'Normal' reason: 'CreatingSnapshot' Waiting for a snapshot nginx-example/velero-nginx-html-nwfhb to be created by the CSI driver.
E0308 10:11:15.227373       1 snapshot_controller_base.go:470] could not sync snapshot "nginx-example/velero-nginx-logs-lvqv7": snapshot controller failed to update velero-nginx-logs-lvqv7 on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-nginx-logs-lvqv7": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshots/nginx-example/velero-nginx-logs-lvqv7, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 09d5785a-4cd7-4be0-810a-22d3edbbd616, UID in object meta: 
I0308 10:11:17.799482       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-html-nwfhb", UID:"1c161693-1873-4fcd-a015-b7841c089421", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401633", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot nginx-example/velero-nginx-html-nwfhb was successfully created by the CSI driver.
I0308 10:11:17.799515       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"nginx-example", Name:"velero-nginx-html-nwfhb", UID:"1c161693-1873-4fcd-a015-b7841c089421", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401633", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot nginx-example/velero-nginx-html-nwfhb is ready to use.
I0308 10:11:17.804496       1 snapshot_controller.go:1020] checkandRemovePVCFinalizer[velero-nginx-html-nwfhb]: Remove Finalizer for PVC nginx-html as it is not used by snapshots in creation
I0308 10:11:19.272424       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"nginx-example-backup-every-two-hours-20240308101026-wk8ks", UID:"604097ad-3837-43c5-acd3-0088a3044185", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401711", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot velero/nginx-example-backup-every-two-hours-20240308101026-wk8ks was successfully created by the CSI driver.
I0308 10:11:19.272488       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"nginx-example-backup-every-two-hours-20240308101026-wk8ks", UID:"604097ad-3837-43c5-acd3-0088a3044185", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401711", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot velero/nginx-example-backup-every-two-hours-20240308101026-wk8ks is ready to use.
E0308 10:11:20.266638       1 snapshot_controller_base.go:470] could not sync snapshot "nginx-example/velero-nginx-html-nwfhb": snapshot controller failed to update velero-nginx-html-nwfhb on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-nginx-html-nwfhb": StorageError: invalid object, Code: 4, Key: /registry/snapshot.storage.k8s.io/volumesnapshots/nginx-example/velero-nginx-html-nwfhb, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 1c161693-1873-4fcd-a015-b7841c089421, UID in object meta: 
I0308 10:11:24.315547       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"nginx-example-backup-every-two-hours-20240308101026-4p6kr", UID:"9a0c821b-1bea-43e0-85e5-42d5638a9b3a", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401853", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot velero/nginx-example-backup-every-two-hours-20240308101026-4p6kr was successfully created by the CSI driver.
I0308 10:11:24.315583       1 event.go:364] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"velero", Name:"nginx-example-backup-every-two-hours-20240308101026-4p6kr", UID:"9a0c821b-1bea-43e0-85e5-42d5638a9b3a", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"1401853", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot velero/nginx-example-backup-every-two-hours-20240308101026-4p6kr is ready to use.
E0308 10:11:30.361268       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308101026-wk8ks
E0308 10:11:30.368665       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308101026-wk8ks
E0308 10:16:51.417502       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308101026-4p6kr
E0308 10:16:51.425915       1 snapshot_controller.go:1369] getSnapshotDriverName: failed to get snapshotContent: nginx-example-backup-every-two-hours-20240308101026-4p6kr
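
The snapshot-controller lines above share a fixed klog-style header (severity letter I/W/E/F, MMDD date, time, thread ID, `file:line]`), so the recurring errors can be tallied mechanically. A minimal Python sketch, assuming exactly the header layout seen here; `parse_klog` and `missing_contents` are hypothetical helper names, not part of any Kubernetes tool:

```python
import re
from collections import Counter

# Header layout assumed from the log above: severity, MMDD, time, thread id, file:line, "] ", message.
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)

# Error text emitted when the controller cannot find a VolumeSnapshotContent.
MISSING_CONTENT = re.compile(
    r"getSnapshotDriverName: failed to get snapshotContent: (?P<content>\S+)"
)


def parse_klog(line):
    """Split one log line into its header fields and message, or return None."""
    m = KLOG_LINE.match(line.strip())
    return m.groupdict() if m else None


def missing_contents(lines):
    """Count how often each VolumeSnapshotContent name was reported missing (errors only)."""
    counts = Counter()
    for line in lines:
        rec = parse_klog(line)
        if rec is None or rec["severity"] != "E":
            continue
        m = MISSING_CONTENT.search(rec["message"])
        if m:
            counts[m.group("content")] += 1
    return counts
```

Fed the full log, it counts each content name that `getSnapshotDriverName` could not find; in the excerpt above, each such name appears twice.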

Additional command line output

$ velero get schedule nginx-example-backup-every-two-hours
NAME                                   STATUS    CREATED                         SCHEDULE              BACKUP TTL   LAST BACKUP   SELECTOR   PAUSED
nginx-example-backup-every-two-hours   Enabled   2024-03-06 15:13:05 +0100 CET   10 0,8-20/2 * * 1-6   10h0m0s      1h ago        <none>     false
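
The SCHEDULE column uses standard five-field cron syntax. A small Python sketch that expands only the subset used here (lists, ranges, steps and `*`), with no input validation; it is an illustration, not the parser Velero uses:

```python
DOW = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def expand_field(spec, lo, hi):
    """Expand one cron field (lists, ranges, steps, '*') into sorted matching values."""
    values = set()
    for part in spec.split(","):
        rng, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # This sketch treats "N/step" as N..max; a bare "N" matches only N.
            end = hi if step_str else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def describe(expr):
    """Break a five-field cron expression into the minutes, hours and weekdays it fires on."""
    minute, hour, _dom, _month, dow = expr.split()
    return {
        "minutes": expand_field(minute, 0, 59),
        "hours": expand_field(hour, 0, 23),
        "weekdays": [DOW[d] for d in expand_field(dow, 0, 6)],
    }
```

For `10 0,8-20/2 * * 1-6` it yields minute 10 of hours 0, 8, 10, ..., 20, Monday through Saturday, which matches backup names in this thread such as `...-20240307181025` (18:10).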

$ kubectl get volumesnapshots.snapshot.storage.k8s.io -A
No resources found

$ kubectl get volumesnapshotcontents.snapshot.storage.k8s.io -A
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                   VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT                              VOLUMESNAPSHOTNAMESPACE                   AGE
snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1   true         10737418240   Retain           linstor.csi.linbit.com   linstor               name-7745dfc1-7ca9-45e6-b71d-bb65a32d0a37   ns-7745dfc1-7ca9-45e6-b71d-bb65a32d0a37   15h
snapcontent-8a82c317-fed4-44a3-9c26-248681cb11a9   true         10737418240   Retain           linstor.csi.linbit.com   linstor               name-522798f5-2388-4384-96a8-f2ef88a844e9   ns-522798f5-2388-4384-96a8-f2ef88a844e9   17h
snapcontent-938944ac-9078-4bf0-bac1-5287e55ecd57   true         10737418240   Retain           linstor.csi.linbit.com   linstor               name-d2738fe9-85c2-43d2-b886-4b7ff0fee405   ns-d2738fe9-85c2-43d2-b886-4b7ff0fee405   63m
snapcontent-b2b2e36e-90ee-4984-8628-6f02b99b1463   true         10737418240   Retain           linstor.csi.linbit.com   linstor               name-a1ab0000-307c-4842-8439-edb6c81f227b   ns-a1ab0000-307c-4842-8439-edb6c81f227b   9h

$ kubectl describe volumesnapshotcontents.snapshot.storage.k8s.io snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1 
Name:         snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1
Namespace:    
Labels:       velero.io/backup-name=velero-backup-every-two-hours-20240307181025
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshotContent
Metadata:
  Creation Timestamp:  2024-03-07T18:11:26Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
  Generation:        1
  Resource Version:  947440
  UID:               af8bfd2f-40ad-4705-add5-a807ec0fa023
Spec:
  Deletion Policy:  Retain
  Driver:           linstor.csi.linbit.com
  Source:
    Snapshot Handle:           snapshot-24a68189-c19b-4a1b-968c-ceaa3e2876a1
  Volume Snapshot Class Name:  linstor
  Volume Snapshot Ref:
    API Version:  snapshot.storage.k8s.io/v1
    Kind:         VolumeSnapshot
    Name:         name-7745dfc1-7ca9-45e6-b71d-bb65a32d0a37
    Namespace:    ns-7745dfc1-7ca9-45e6-b71d-bb65a32d0a37
Status:
  Creation Time:    1709835080862000000
  Ready To Use:     true
  Restore Size:     10737418240
  Snapshot Handle:  snapshot-24a68189-c19b-4a1b-968c-ceaa3e2876a1
Events:             <none>
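
The listing and the describe output show the pattern in question: each VolumeSnapshotContent has `Deletion Policy: Retain`, and its `Volume Snapshot Ref` names a VolumeSnapshot that does not exist. A minimal Python sketch that finds such contents from `kubectl ... -o json` output and reports the `velero.io/backup-name` label, so each one can be traced back to the backup that created it. It assumes `kubectl` is on the PATH with cluster access; `load_items` and `orphaned_contents` are hypothetical helper names:

```python
import json
import subprocess


def load_items(resource):
    """Fetch all objects of one resource type as dicts (needs kubectl and cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", resource, "-A", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["items"]


def orphaned_contents(contents, snapshots):
    """List contents whose spec.volumeSnapshotRef points at a VolumeSnapshot that is gone."""
    existing = {(s["metadata"]["namespace"], s["metadata"]["name"]) for s in snapshots}
    orphans = []
    for c in contents:
        ref = c["spec"].get("volumeSnapshotRef") or {}
        if (ref.get("namespace"), ref.get("name")) in existing:
            continue  # still bound to a live VolumeSnapshot
        labels = c["metadata"].get("labels") or {}
        orphans.append({
            "name": c["metadata"]["name"],
            "deletionPolicy": c["spec"].get("deletionPolicy"),
            "backup": labels.get("velero.io/backup-name"),  # set on the content by Velero
        })
    return orphans
```

On this cluster, `orphaned_contents(load_items("volumesnapshotcontents"), load_items("volumesnapshots"))` would list all four contents above, since `kubectl get volumesnapshots -A` returned nothing; the label on `snapcontent-24a68189-...` points at `velero-backup-every-two-hours-20240307181025`, not at an nginx-example backup.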

Environment:

kaovilai commented 4 months ago

This is expected.

The VolumeSnapshotContent is first patched to Retain. After the backup, the namespaced VolumeSnapshot is removed, so the namespace can be deleted without the deletion cascading to the (now Retained) VolumeSnapshotContent, which is needed to restore from the backup.

The VolumeSnapshot objects will be removed from the cluster after the backup is uploaded to the object storage, so that the namespace that is backed up can be deleted without removing the snapshot in the storage provider if the DeletionPolicy is Delete.

The only case in which Velero removes VolumeSnapshotContent objects is when the backup expires or is deleted.

When the Velero backup expires, the VolumeSnapshot objects will be deleted and the VolumeSnapshotContent objects will be updated to have a DeletionPolicy of Delete, to free space on the storage system.
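
The lifecycle described above can be condensed into a toy model. The Python sketch below is an illustration under simplifying assumptions (one content per snapshot, the content name doubling as the backend snapshot handle, finalizers and the snapshot controller's asynchronous reconciliation ignored); `Cluster`, `back_up` and `expire_backup` are hypothetical names, not Velero code:

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    contents: dict = field(default_factory=dict)  # VolumeSnapshotContent name -> deletionPolicy
    bindings: dict = field(default_factory=dict)  # VolumeSnapshot name -> bound content name
    storage: set = field(default_factory=set)     # snapshots held by the storage backend

    def delete_content(self, vsc):
        # Deleting a content with Delete also removes the backend snapshot; Retain keeps it.
        if self.contents.pop(vsc) == "Delete":
            self.storage.discard(vsc)

    def delete_snapshot(self, vs):
        vsc = self.bindings.pop(vs)
        # Deleting a VolumeSnapshot cascades to its content only when the policy is Delete.
        if self.contents.get(vsc) == "Delete":
            self.delete_content(vsc)


def back_up(cluster, vs, vsc, class_policy="Delete"):
    """Take a snapshot, then detach it from the namespace as described above."""
    cluster.bindings[vs] = vsc
    cluster.contents[vsc] = class_policy  # policy inherited from the VolumeSnapshotClass
    cluster.storage.add(vsc)
    cluster.contents[vsc] = "Retain"      # step 1: protect the content from cascading deletion
    cluster.delete_snapshot(vs)           # step 2: drop the namespaced VolumeSnapshot


def expire_backup(cluster, vsc):
    """On backup expiry or deletion: switch back to Delete so the backend space is freed."""
    cluster.contents[vsc] = "Delete"
    cluster.delete_content(vsc)
```

After `back_up`, only a `Retain` content remains, which is exactly the state reported in this issue; `expire_backup` then releases the backend snapshot. Without the Retain patch, deleting the VolumeSnapshot would take the backend snapshot with it, leaving nothing to restore from.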

kaovilai commented 4 months ago

After the backup, I see that there are no VolumeSnapshots, but there are still VolumeSnapshotContents.

If the VolumeSnapshotContents were removed as well, Velero would not be able to restore your data.

Alternatively, you can look into https://velero.io/docs/v1.13/csi-snapshot-data-movement/, which removes the need to retain snapshots in the cluster by moving the data to the object store.

dmrub commented 4 months ago

I use snapshot data movement, so I expect "VolumeSnapshotContent" objects to always be deleted after the backup and data movement are complete. Here is my Velero schedule:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nginx-example-backup-every-two-hours
  namespace: velero
  annotations:
    velero.io/csi-volumesnapshot-class_disk.csi.cloud.com: "linstor"
spec:
  schedule: "10 0,8-20/2 * * 1-6"
  template:
    csiSnapshotTimeout: 20m
    snapshotVolumes: true
    snapshotMoveData: true
    includedNamespaces:
      - "nginx-example"
    includedResources:
      - "*"
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 168h0m0s

weshayutin commented 4 months ago

Hmm... I'm no expert, but wouldn't you want to set ONLY snapshotMoveData: true, and not snapshotVolumes?

kaovilai commented 4 months ago

Related PRs: https://github.com/vmware-tanzu/velero/pull/6827
Already fixed issue: https://github.com/vmware-tanzu/velero/issues/6786 (also cherry-picked to 1.12)

1.13.0 specifically contains this fix.

kaovilai commented 4 months ago

Other relevant CSI issues:

kaovilai commented 4 months ago

OK, based on the bundle file provided (kubecapture/velero.io_v1/velero/backups-202403081310.2979.json), you have in fact not enabled snapshotMoveData on the backup associated with the "leftover" VolumeSnapshotContent, so this falls back to the expected behavior for CSI snapshots taken without snapshotMoveData:

        {
            "apiVersion": "velero.io/v1",
            "kind": "Backup",
            "metadata": {
                "annotations": {
                    "velero.io/resource-timeout": "10m0s",
                    "velero.io/source-cluster-k8s-gitversion": "v1.28.7",
                    "velero.io/source-cluster-k8s-major-version": "1",
                    "velero.io/source-cluster-k8s-minor-version": "28"
                },
                "creationTimestamp": "2024-03-07T18:10:25Z",
                "generation": 6,
                "labels": {
                    "kustomize.toolkit.fluxcd.io/name": "stage07",
                    "kustomize.toolkit.fluxcd.io/namespace": "flux-system",
                    "velero.io/schedule-name": "velero-backup-every-two-hours",
                    "velero.io/storage-location": "default"
                },
                "name": "velero-backup-every-two-hours-20240307181025",
                "namespace": "velero",
                "resourceVersion": "947441",
                "uid": "e714b1df-dea1-445b-bbf5-4d83a50afe01"
            },
            "spec": {
                "csiSnapshotTimeout": "10m0s",
                "defaultVolumesToFsBackup": false,
                "hooks": {},
                "includedNamespaces": [
                    "velero"
                ],
                "includedResources": [
                    "*"
                ],
                "itemOperationTimeout": "4h0m0s",
                "metadata": {},
                "snapshotMoveData": false,
                "storageLocation": "default",
                "ttl": "168h0m0s",
                "volumeSnapshotLocations": [
                    "default"
                ]
            },
            "status": {
                "backupItemOperationsAttempted": 2,
                "backupItemOperationsCompleted": 2,
                "completionTimestamp": "2024-03-07T18:11:26Z",
                "csiVolumeSnapshotsAttempted": 1,
                "csiVolumeSnapshotsCompleted": 1,
                "expiration": "2024-03-14T18:11:14Z",
                "formatVersion": "1.1.0",
                "hookStatus": {},
                "phase": "Completed",
                "progress": {
                    "itemsBackedUp": 152,
                    "totalItems": 152
                },
                "startTimestamp": "2024-03-07T18:11:14Z",
                "version": 1
            }
        },
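
The same check can be applied to every Backup in the bundle: a backup whose CSI snapshots completed but whose `snapshotMoveData` is false is expected to leave its VolumeSnapshotContents in the cluster until it expires. A minimal Python sketch using the field names visible in the excerpt (`spec.snapshotMoveData`, `status.csiVolumeSnapshotsCompleted`); it assumes the bundle file holds either a JSON array of Backups or a List object with `items`, and the helper names are hypothetical:

```python
import json


def leaves_contents_behind(backup):
    """True if a Backup took CSI snapshots without moving their data to object storage."""
    spec = backup.get("spec", {})
    status = backup.get("status", {})
    took_csi_snapshots = status.get("csiVolumeSnapshotsCompleted", 0) > 0
    moved_data = spec.get("snapshotMoveData") is True
    return took_csi_snapshots and not moved_data


def backups_leaving_contents(bundle_text):
    """Names of Backups whose VolumeSnapshotContents are expected to stay until expiry."""
    data = json.loads(bundle_text)
    # Accept either a bare JSON array or a Kubernetes List object with "items".
    items = data.get("items", []) if isinstance(data, dict) else data
    return [b["metadata"]["name"] for b in items if leaves_contents_behind(b)]
```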

kaovilai commented 4 months ago

The schedule for this backup in the provided bundle (now paused) still does not set snapshotMoveData, which means backups generated from this schedule will not use snapshotMoveData:

        {
            "apiVersion": "velero.io/v1",
            "kind": "Schedule",
            "metadata": {
                "creationTimestamp": "2024-03-06T10:09:31Z",
                "generation": 20,
                "labels": {
                    "kustomize.toolkit.fluxcd.io/name": "stage07",
                    "kustomize.toolkit.fluxcd.io/namespace": "flux-system"
                },
                "name": "velero-backup-every-two-hours",
                "namespace": "velero",
                "resourceVersion": "1453983",
                "uid": "36d3badf-b6a4-43ce-b5d6-78a6a4c109cf"
            },
            "spec": {
                "paused": true,
                "schedule": "10 0,8-20/2 * * 1-6",
                "template": {
                    "csiSnapshotTimeout": "0s",
                    "hooks": {},
                    "includedNamespaces": [
                        "velero"
                    ],
                    "includedResources": [
                        "*"
                    ],
                    "itemOperationTimeout": "0s",
                    "metadata": {},
                    "storageLocation": "default",
                    "ttl": "168h0m0s",
                    "volumeSnapshotLocations": [
                        "default"
                    ]
                }
            },
            "status": {
                "lastBackup": "2024-03-08T10:10:26Z",
                "phase": "Enabled"
            }
        }
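
Because the schedule template omits `snapshotMoveData`, the generated Backup carries `"snapshotMoveData": false`, as in the Backup JSON above. A minimal Python sketch that makes this explicit for schedules already parsed into dicts (for example with PyYAML's `yaml.safe_load`, which is an assumption and not used in the code itself); `effective_move_data` is a hypothetical helper:

```python
def effective_move_data(schedules):
    """Map schedule name -> snapshotMoveData value its generated backups will carry."""
    result = {}
    for s in schedules:
        template = s.get("spec", {}).get("template", {})
        # A field missing from the template is missing from the generated Backup spec,
        # and a missing snapshotMoveData behaves as false (see the Backup JSON above).
        result[s["metadata"]["name"]] = bool(template.get("snapshotMoveData", False))
    return result
```

For the two schedules quoted in this thread, it returns `False` for `velero-backup-every-two-hours` and `True` for `nginx-example-backup-every-two-hours`.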

dmrub commented 4 months ago

@kaovilai, could this issue be caused by some odd misconfiguration? I have multiple schedules, and only nginx-example should create volume snapshots and move them to S3 storage. However, there is also a velero-backup schedule that should only back up the configuration of the velero namespace:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: velero-backup-every-two-hours
  namespace: velero
spec:
  schedule: '10 0,8-20/2 * * 1-6'
  template:
    includedNamespaces:
      - 'velero'
    includedResources:
      - '*'
    storageLocation: default
    volumeSnapshotLocations:
      - default
    ttl: 168h0m0s

But many operations take place in the velero namespace (e.g. data uploads). After you pointed me to the VolumeSnapshotContent without a VolumeSnapshot (I had missed that its label is velero.io/backup-name=velero-backup-every-two-hours-20240307181025 and not nginx-example...), I looked at that backup and saw that velero-backup contains CSI snapshots belonging to the nginx-example backup (see below). In the velero-backup-every-two-hours schedule, neither snapshotMoveData nor snapshotVolumes is set. Why does velero-backup pick up this snapshot from nginx-example? It looks like a temporary snapshot object created as part of an nginx-example backup!
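
The resource list in the describe output that follows includes objects named after an nginx-example backup run (a PVC, a pod, VolumeSnapshots and a VolumeSnapshotContent), even though this backup only covers the velero namespace. A heuristic Python sketch that flags items whose names embed another backup's name; the function name and the choice to skip `velero.io/v1/Backup` entries are assumptions for illustration:

```python
def foreign_items(resource_list, other_backup_names,
                  skip_kinds=("velero.io/v1/Backup",)):
    """Map each kind to the items whose names embed another backup's name (heuristic)."""
    hits = {}
    for kind, items in resource_list.items():
        if kind in skip_kinds:
            continue  # Backup CRs legitimately carry those names
        for item in items:
            name = item.split("/", 1)[-1]  # drop the "namespace/" prefix when present
            if any(b in name for b in other_backup_names):
                hits.setdefault(kind, []).append(item)
    return hits
```

The leftover `snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1` is not matched: dynamically created contents are named `snapcontent-<VolumeSnapshot UID>` (compare `snapcontent-938944ac-...` with the VolumeSnapshot UID `938944ac-...` in the controller log), so they have to be traced through their VolumeSnapshot or the `velero.io/backup-name` label instead.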

$ velero describe backup velero-backup-every-two-hours-20240307181025  --details
Name:         velero-backup-every-two-hours-20240307181025
Namespace:    velero
Labels:       kustomize.toolkit.fluxcd.io/name=stage07
              kustomize.toolkit.fluxcd.io/namespace=flux-system
              velero.io/schedule-name=velero-backup-every-two-hours
              velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.28.7
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=28

Phase:  Completed

Namespaces:
  Included:  velero
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  168h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-03-07 19:11:14 +0100 CET
Completed:  2024-03-07 19:11:26 +0100 CET

Expiration:  2024-03-14 19:11:14 +0100 CET

Total items to be backed up:  152
Items backed up:              152

Backup Item Operations:
  Operation for volumesnapshots.snapshot.storage.k8s.io velero/velero-nginx-example-backup-every-two-hours-20240307181025w7lhg:
    Backup Item Action Plugin:  velero.io/csi-volumesnapshot-backupper
    Operation ID:               velero/velero-nginx-example-backup-every-two-hours-20240307181025w7lhg/2024-03-07T18:11:24Z
    Items to Update:
              volumesnapshots.snapshot.storage.k8s.io velero/velero-nginx-example-backup-every-two-hours-20240307181025w7lhg
    Phase:    Completed
    Created:  2024-03-07 19:11:24 +0100 CET
    Started:  2024-03-07 19:11:24 +0100 CET
  Operation for volumesnapshotcontents.snapshot.storage.k8s.io /snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1:
    Backup Item Action Plugin:  velero.io/csi-volumesnapshotcontent-backupper
    Operation ID:               snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1/2024-03-07T18:11:24Z
    Items to Update:
              volumesnapshotcontents.snapshot.storage.k8s.io /snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1
    Phase:    Completed
    Created:  2024-03-07 19:11:24 +0100 CET
    Started:  2024-03-07 19:11:24 +0100 CET
Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - backuprepositories.velero.io
    - backups.velero.io
    - backupstoragelocations.velero.io
    - datauploads.velero.io
    - helmreleases.helm.toolkit.fluxcd.io
    - prometheusrules.monitoring.coreos.com
    - schedules.velero.io
    - sealedsecrets.bitnami.com
    - servicemonitors.monitoring.coreos.com
    - volumesnapshotlocations.velero.io
    - volumesnapshots.snapshot.storage.k8s.io
  apps/v1/ControllerRevision:
    - velero/node-agent-58788bcc87
  apps/v1/DaemonSet:
    - velero/node-agent
  apps/v1/Deployment:
    - velero/velero
  apps/v1/ReplicaSet:
    - velero/velero-db67f5587
  bitnami.com/v1alpha1/SealedSecret:
    - velero/credentials-velero
  discovery.k8s.io/v1/EndpointSlice:
    - velero/velero-gn2t9
  helm.toolkit.fluxcd.io/v2beta2/HelmRelease:
    - velero/velero
  monitoring.coreos.com/v1/PrometheusRule:
    - velero/velero
  monitoring.coreos.com/v1/ServiceMonitor:
    - velero/velero
  rbac.authorization.k8s.io/v1/ClusterRole:
    - cluster-admin
  rbac.authorization.k8s.io/v1/ClusterRoleBinding:
    - velero-server
  rbac.authorization.k8s.io/v1/Role:
    - velero/velero-server
  rbac.authorization.k8s.io/v1/RoleBinding:
    - velero/velero-server
  snapshot.storage.k8s.io/v1/VolumeSnapshot:
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt
    - velero/velero-nginx-example-backup-every-two-hours-20240307181025w7lhg
  snapshot.storage.k8s.io/v1/VolumeSnapshotClass:
    - linstor
  snapshot.storage.k8s.io/v1/VolumeSnapshotContent:
    - nginx-example-backup-every-two-hours-20240307181025-hhlnt
    - snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1
  v1/ConfigMap:
    - velero/kube-root-ca.crt
  v1/Endpoints:
    - velero/velero
  v1/Event:
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee766f31c9
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee777619cc
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee7790cd87
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee77e2ccda
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee77e3174d
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8dee77e85938
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8deef03eb978
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8def2a542157
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt.17ba8def4e25116b
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded4ac7f71e
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded4b79ec7e
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded4b7a4872
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded4bef8378
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded4c0d9978
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8ded8aef45bd
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8dedc31f02c3
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8dee50854c35
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8defa82f1086
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8defa9066cdd
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8defac35bcc2
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9.17ba8df03daa309e
  v1/Namespace:
    - velero
  v1/PersistentVolume:
    - pvc-71956ca5-daa7-424a-8ffb-78684b7c2ab7
  v1/PersistentVolumeClaim:
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt
  v1/Pod:
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt
    - velero/node-agent-bxdzk
    - velero/node-agent-qm9jm
    - velero/velero-db67f5587-gbk6q
  v1/Secret:
    - velero/credentials-velero
    - velero/sh.helm.release.v1.velero.v1
    - velero/sh.helm.release.v1.velero.v2
    - velero/velero
    - velero/velero-repo-credentials
  v1/Service:
    - velero/velero
  v1/ServiceAccount:
    - velero/default
    - velero/velero-server
  velero.io/v1/Backup:
    - velero/cert-manager-backup-every-two-hours-20240306201024
    - velero/cert-manager-backup-every-two-hours-20240307001024
    - velero/cert-manager-backup-every-two-hours-20240307081024
    - velero/cert-manager-backup-every-two-hours-20240307101025
    - velero/cert-manager-backup-every-two-hours-20240307121025
    - velero/cert-manager-backup-every-two-hours-20240307141025
    - velero/cert-manager-backup-every-two-hours-20240307161025
    - velero/cert-manager-backup-every-two-hours-20240307181025
    - velero/flux-system-backup-every-two-hours-20240306201024
    - velero/flux-system-backup-every-two-hours-20240307001024
    - velero/flux-system-backup-every-two-hours-20240307081024
    - velero/flux-system-backup-every-two-hours-20240307101024
    - velero/flux-system-backup-every-two-hours-20240307121025
    - velero/flux-system-backup-every-two-hours-20240307141025
    - velero/flux-system-backup-every-two-hours-20240307161025
    - velero/flux-system-backup-every-two-hours-20240307181025
    - velero/minio-operator-backup-every-two-hours-20240306201024
    - velero/minio-operator-backup-every-two-hours-20240307001024
    - velero/minio-operator-backup-every-two-hours-20240307081024
    - velero/minio-operator-backup-every-two-hours-20240307101024
    - velero/minio-operator-backup-every-two-hours-20240307121025
    - velero/minio-operator-backup-every-two-hours-20240307141025
    - velero/minio-operator-backup-every-two-hours-20240307161025
    - velero/minio-operator-backup-every-two-hours-20240307181025
    - velero/nginx-example-backup-every-two-hours-20240307101024
    - velero/nginx-example-backup-every-two-hours-20240307121025
    - velero/nginx-example-backup-every-two-hours-20240307135802
    - velero/nginx-example-backup-every-two-hours-20240307141025
    - velero/nginx-example-backup-every-two-hours-20240307161025
    - velero/nginx-example-backup-every-two-hours-20240307181025
    - velero/nginx-linstor-1
    - velero/nginx-linstor-12
    - velero/nginx-linstor-13
    - velero/piraeus-datastore-backup-every-two-hours-20240306201024
    - velero/piraeus-datastore-backup-every-two-hours-20240307001024
    - velero/piraeus-datastore-backup-every-two-hours-20240307081024
    - velero/piraeus-datastore-backup-every-two-hours-20240307101025
    - velero/piraeus-datastore-backup-every-two-hours-20240307121025
    - velero/piraeus-datastore-backup-every-two-hours-20240307141025
    - velero/piraeus-datastore-backup-every-two-hours-20240307161025
    - velero/piraeus-datastore-backup-every-two-hours-20240307181025
    - velero/traefik-backup-every-two-hours-20240306201024
    - velero/traefik-backup-every-two-hours-20240307001024
    - velero/traefik-backup-every-two-hours-20240307081024
    - velero/traefik-backup-every-two-hours-20240307101025
    - velero/traefik-backup-every-two-hours-20240307121025
    - velero/traefik-backup-every-two-hours-20240307141025
    - velero/traefik-backup-every-two-hours-20240307161025
    - velero/traefik-backup-every-two-hours-20240307181025
    - velero/velero-backup-every-two-hours-20240306201024
    - velero/velero-backup-every-two-hours-20240307001024
    - velero/velero-backup-every-two-hours-20240307081024
    - velero/velero-backup-every-two-hours-20240307101025
    - velero/velero-backup-every-two-hours-20240307121025
    - velero/velero-backup-every-two-hours-20240307141025
    - velero/velero-backup-every-two-hours-20240307161025
    - velero/velero-backup-every-two-hours-20240307181025
  velero.io/v1/BackupRepository:
    - velero/nginx-example-default-kopia-29q4v
  velero.io/v1/BackupStorageLocation:
    - velero/default
  velero.io/v1/Schedule:
    - velero/cert-manager-backup-every-two-hours
    - velero/flux-system-backup-every-two-hours
    - velero/minio-operator-backup-every-two-hours
    - velero/nginx-example-backup-every-two-hours
    - velero/piraeus-datastore-backup-every-two-hours
    - velero/traefik-backup-every-two-hours
    - velero/velero-backup-every-two-hours
  velero.io/v1/VolumeSnapshotLocation:
    - velero/default
  velero.io/v2alpha1/DataUpload:
    - velero/nginx-example-backup-every-two-hours-20240307101024-5n5ft
    - velero/nginx-example-backup-every-two-hours-20240307101024-kdvqn
    - velero/nginx-example-backup-every-two-hours-20240307121025-7s77p
    - velero/nginx-example-backup-every-two-hours-20240307121025-df8jq
    - velero/nginx-example-backup-every-two-hours-20240307135802-r6h4s
    - velero/nginx-example-backup-every-two-hours-20240307135802-xhbmn
    - velero/nginx-example-backup-every-two-hours-20240307141025-87fwr
    - velero/nginx-example-backup-every-two-hours-20240307141025-j6p89
    - velero/nginx-example-backup-every-two-hours-20240307161025-j474w
    - velero/nginx-example-backup-every-two-hours-20240307161025-pnllz
    - velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt
    - velero/nginx-example-backup-every-two-hours-20240307181025-xrqg9
    - velero/nginx-linstor-1-5kxg8
    - velero/nginx-linstor-1-c94md
    - velero/nginx-linstor-12-f9j9x
    - velero/nginx-linstor-12-wl8m2
    - velero/nginx-linstor-13-2pgmb
    - velero/nginx-linstor-13-pxfj2

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots:
    velero/nginx-example-backup-every-two-hours-20240307181025-hhlnt:
      Snapshot:
        Operation ID: velero/velero-nginx-example-backup-every-two-hours-20240307181025w7lhg/2024-03-07T18:11:24Z
        Snapshot Content Name: snapcontent-24a68189-c19b-4a1b-968c-ceaa3e2876a1
        Storage Snapshot ID: snapshot-24a68189-c19b-4a1b-968c-ceaa3e2876a1
        Snapshot Size (bytes): 10737418240
        CSI Driver: linstor.csi.linbit.com

  Pod Volume Backups: <none included>

HooksAttempted:  0
HooksFailed:     0
kaovilai commented 3 months ago

"It looks like a temporary snapshot object as part of an nginx-example backup process!"

I don't think backing up the velero namespace was ever recommended.

Though I have not found dedicated documentation saying so, there have been discussions about it in the past.

In the meantime, you can add --include-cluster-resources=false to this schedule to avoid said issue.

For syncing schedules, there have been examples using Argo CD; we can also reopen https://github.com/vmware-tanzu/velero/issues/2876

kaovilai commented 3 months ago

"Why does velero-backup get this snapshot from nginx-example? It looks like a temporary snapshot object as part of an nginx-example backup process!"

Velero simply does what it is told to do; it currently has no logic that says "this is the velero namespace, treat it differently."

kaovilai commented 3 months ago

Nothing is actually broken; the code is simply not living up to your expectations, but it is not malfunctioning.

sseago commented 3 months ago

"In the meantime, you can add --include-cluster-resources=false to this schedule to avoid said issue."

If you're backing up volume information, you probably don't want this. You probably want it left unset (nil, the default value), since that pulls in only the relevant cluster resources, whereas setting it to false pulls none in: no VSCs, no PVs, etc.
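To make the three possible values concrete, here is a simplified Python model of the behaviour described in this comment. It is a sketch, not Velero's implementation: the function include_cluster_scoped and its parameters are hypothetical, and the "all namespaces" branch is an assumption based on Velero's documented default rather than something stated in this thread.

```python
from typing import Optional


def include_cluster_scoped(
    include_cluster_resources: Optional[bool],
    all_namespaces: bool,
    related_to_backed_up_item: bool,
) -> bool:
    """Decide whether a cluster-scoped item (e.g. a PV or a VolumeSnapshotContent)
    ends up in the backup. Hypothetical, simplified model for illustration only."""
    if include_cluster_resources is not None:
        # An explicit value wins: True includes cluster-scoped resources even if
        # unrelated, False includes none of them (no VSCs, no PVs, ...).
        return include_cluster_resources
    if all_namespaces:
        # Assumption: with no namespace filter, the nil default includes them all.
        return True
    # nil default plus a namespace filter: only items related to backed-up
    # namespaced resources (e.g. the VSC behind a backed-up PVC).
    return related_to_backed_up_item


# A VolumeSnapshotContent behind a backed-up PVC, in a namespace-filtered backup:
print(include_cluster_scoped(None, all_namespaces=False, related_to_backed_up_item=True))   # True
print(include_cluster_scoped(False, all_namespaces=False, related_to_backed_up_item=True))  # False
```

The second call is the situation the comment warns about: with false, the VSC needed to restore the CSI snapshot is left out.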

sseago commented 3 months ago

But to answer the original question: the VolumeSnapshotContents are not deleted because, if you're not using the data mover and Velero deleted the VSCs after the backup, it would not be able to restore, since the snapshot data would be removed. With the data mover, the VSC contents are copied into the BackupStorageLocation, so the VSCs can be deleted after the backup; without the data mover, the VolumeSnapshotContents are not temporary data; they are required for restores to work.
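To see which VolumeSnapshotContents have outlived their VolumeSnapshots, and what would happen to the storage snapshot if they were removed, a small Python sketch like the following can help. It assumes kubectl is on the PATH and pointed at the cluster; the helper names kubectl_items and find_unbound_contents are hypothetical. Per the explanation above, treat the result as a report, not a delete list: without the data mover these objects are what a restore needs.

```python
import json
import subprocess
from typing import Any


def kubectl_items(*args: str) -> list[dict[str, Any]]:
    """Run 'kubectl get <args> -o json' and return the parsed 'items' list."""
    result = subprocess.run(
        ["kubectl", "get", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def find_unbound_contents(
    contents: list[dict[str, Any]], snapshots: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return VolumeSnapshotContents whose spec.volumeSnapshotRef points at a
    VolumeSnapshot that no longer exists (matched by namespace and name)."""
    existing = {(s["metadata"]["namespace"], s["metadata"]["name"]) for s in snapshots}
    unbound = []
    for content in contents:
        spec = content.get("spec", {})
        ref = spec.get("volumeSnapshotRef", {})
        key = (ref.get("namespace"), ref.get("name"))
        if key not in existing:
            unbound.append({
                "name": content["metadata"]["name"],
                # Delete: removing this VSC also removes the storage snapshot.
                # Retain: the storage snapshot is kept.
                "deletionPolicy": spec.get("deletionPolicy"),
                "snapshotRef": f"{key[0]}/{key[1]}",
                "snapshotHandle": content.get("status", {}).get("snapshotHandle"),
            })
    return unbound
```

Against a live cluster it would be called as find_unbound_contents(kubectl_items("volumesnapshotcontents"), kubectl_items("volumesnapshots", "--all-namespaces")); VolumeSnapshotContents are cluster-scoped, so they need no namespace flag.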

kaovilai commented 3 months ago

"In the meantime you can add --include-cluster-resources=false to this schedule to avoid said issue." -- if you're backing up volume information, you probably don't want this, you probably want it set to nil (the default value), since that will pull in only relevant cluster resources, but setting it to false will pull none in -- no VSCs, no PVs, etc.

@sseago The OP wants to back up the velero namespace. The only PVC in that namespace is a temporary data mover PVC, which they don't care about. They probably only want the velero.io resources.

dmrub commented 3 months ago

@kaovilai @sseago The last comment describes exactly the situation: we just want to back up all configuration in the velero namespace, such as schedules and backup locations, and of course avoid errors like this.

sseago commented 3 months ago

You probably want a specific short list of included resources, then, excluding everything else. You'd probably want BackupStorageLocations, Secrets, and Schedules. I don't think you'd want Backups/Restores/etc. since those aren't really useful without the related BSL resources, and once you add a BSL, any backups in that BSL are synced to the cluster for you.
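A minimal Python sketch of such a schedule, assuming the Velero v1 Schedule API (spec.schedule plus a spec.template that takes the usual backup fields). The helper build_config_schedule, the schedule name, the cron expression and the TTL are hypothetical examples; the resource list is just the short list suggested in this comment. The output is JSON, which kubectl apply -f - accepts.

```python
import json


def build_config_schedule(name: str, cron: str, namespace: str = "velero") -> dict:
    """Build a Velero Schedule that backs up only configuration objects from
    the given namespace (hypothetical helper; values are examples)."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": [namespace],
                # Short allow-list from the thread; every other resource is excluded.
                "includedResources": [
                    "backupstoragelocations.velero.io",
                    "schedules.velero.io",
                    "secrets",
                ],
                # PVCs are not in the list anyway; disable volume snapshots explicitly too.
                "snapshotVolumes": False,
                "ttl": "720h0m0s",
            },
        },
    }


manifest = build_config_schedule("velero-config-backup", "0 */2 * * *")
print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

The same filter can also be given on the command line with velero schedule create and its --include-namespaces and --include-resources flags.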

reasonerjt commented 2 months ago

I think, based on the latest comment by @sseago, a solution has been provided.

Closing this issue.

headyj commented 1 month ago

@reasonerjt I am personally facing the same issue with the latest version of Velero (v1.13.2). It seems to happen sporadically with different volumes (sometimes our daily backup succeeds, but it fails the day after). So I'm not quite sure why this issue was closed, as I don't see any solution in the comments (other than excluding the volumes, which is obviously not a good solution).

Is it possible that it has something to do with Argo CD autosync?