Closed chrislinan closed 2 years ago
Hi @ashish-amarnath Maybe this issue has some relations with this one: https://github.com/vmware-tanzu/velero/issues/3465
@chrislinan After reading the code, I think the S3 file is the metadata of backup's volume snapshot. The file's existence doesn't mean the related snapshots are still kept. Could you please check whether these snapshots still exist in AWS console?
Yes, these snapshots still exist in the AWS console, and I expect them to be deleted.
NAME KIND
velero.io/crd-remap-version BackupItemAction
velero.io/csi-pvc-backupper BackupItemAction
velero.io/csi-volumesnapshot-backupper BackupItemAction
velero.io/csi-volumesnapshotclass-backupper BackupItemAction
velero.io/csi-volumesnapshotcontent-backupper BackupItemAction
velero.io/pod BackupItemAction
velero.io/pv BackupItemAction
velero.io/service-account BackupItemAction
velero.io/csi-volumesnapshot-delete DeleteItemAction
velero.io/csi-volumesnapshotcontent-delete DeleteItemAction
velero.io/aws ObjectStore
velero.io/add-pv-from-pvc RestoreItemAction
velero.io/add-pvc-from-pod RestoreItemAction
velero.io/apiservice RestoreItemAction
velero.io/change-pvc-node-selector RestoreItemAction
velero.io/change-storage-class RestoreItemAction
velero.io/cluster-role-bindings RestoreItemAction
velero.io/crd-preserve-fields RestoreItemAction
velero.io/csi-pvc-restorer RestoreItemAction
velero.io/csi-volumesnapshot-restorer RestoreItemAction
velero.io/csi-volumesnapshotclass-restorer RestoreItemAction
velero.io/csi-volumesnapshotcontent-restorer RestoreItemAction
velero.io/init-restore-hook RestoreItemAction
velero.io/job RestoreItemAction
velero.io/pod RestoreItemAction
velero.io/restic RestoreItemAction
velero.io/role-bindings RestoreItemAction
velero.io/service RestoreItemAction
velero.io/service-account RestoreItemAction
velero.io/aws VolumeSnapshotter
velero backup create --from-schedule <my-schedule>
apiVersion: velero.io/v1
kind: Backup
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"velero.io/v1","kind":"Schedule","metadata":{"annotations":{},"labels":{"app.kubernetes.io/name":"velero","helm.sh/chart":"hc-disaster-recovery-2.9.15"},"name":"velero-backup-pvs","namespace":"default"},"spec":{"schedule":"0
0 0 0
0","template":{"excludedNamespaces":["kube-system","kube-public","kube-node-lease","velero-restore-test-alertmanager","velero-restore-test-loki","velero-restore-test-loki-longterm","velero-restore-test-prometheus"],"hooks":{},"includedNamespaces":["*"],"includedResources":["pv","pvc"],"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["alertmanager","logstash","loki","loki-longterm","prometheus"]}]},"ttl":"48h0m0s"}}}
velero.io/source-cluster-k8s-gitversion: v1.20.9
velero.io/source-cluster-k8s-major-version: '1'
velero.io/source-cluster-k8s-minor-version: '20'
creationTimestamp: '2021-11-25T10:16:47Z'
generation: 7
labels:
app.kubernetes.io/name: velero
helm.sh/chart: hc-disaster-recovery-2.9.15
velero.io/schedule-name: velero-backup-pvs
velero.io/storage-location: default
managedFields:
- apiVersion: velero.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
.: {}
'f:kubectl.kubernetes.io/last-applied-configuration': {}
'f:labels':
.: {}
'f:app.kubernetes.io/name': {}
'f:helm.sh/chart': {}
'f:velero.io/schedule-name': {}
'f:spec':
.: {}
'f:excludedNamespaces': {}
'f:hooks': {}
'f:includedNamespaces': {}
'f:includedResources': {}
'f:labelSelector':
.: {}
'f:matchExpressions': {}
'f:ttl': {}
'f:status': {}
manager: velero
operation: Update
time: '2021-11-25T10:16:47Z'
- apiVersion: velero.io/v1
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
'f:velero.io/source-cluster-k8s-gitversion': {}
'f:velero.io/source-cluster-k8s-major-version': {}
'f:velero.io/source-cluster-k8s-minor-version': {}
'f:labels':
'f:velero.io/storage-location': {}
'f:spec':
'f:defaultVolumesToRestic': {}
'f:storageLocation': {}
'f:volumeSnapshotLocations': {}
'f:status':
'f:completionTimestamp': {}
'f:expiration': {}
'f:formatVersion': {}
'f:phase': {}
'f:progress':
.: {}
'f:itemsBackedUp': {}
'f:totalItems': {}
'f:startTimestamp': {}
'f:version': {}
'f:warnings': {}
manager: velero-server
operation: Update
time: '2021-11-25T10:17:03Z'
name: velero-backup-pvs-20211125101647
namespace: default
resourceVersion: '149812'
uid: 588e3e98-d849-4f8b-9717-8641a32a93f1
selfLink: >-
/apis/velero.io/v1/namespaces/default/backups/velero-backup-pvs-20211125101647
status:
completionTimestamp: '2021-11-25T10:17:02Z'
expiration: '2021-11-27T10:16:47Z'
formatVersion: 1.1.0
phase: Completed
progress:
itemsBackedUp: 10
totalItems: 10
startTimestamp: '2021-11-25T10:16:47Z'
version: 1
warnings: 5
spec:
defaultVolumesToRestic: false
excludedNamespaces:
- kube-system
- kube-public
- kube-node-lease
- velero-restore-test-alertmanager
- velero-restore-test-loki
- velero-restore-test-loki-longterm
- velero-restore-test-prometheus
hooks: {}
includedNamespaces:
- '*'
includedResources:
- pv
- pvc
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- alertmanager
- logstash
- loki
- loki-longterm
- prometheus
storageLocation: default
ttl: 48h0m0s
volumeSnapshotLocations:
- default
The snapshots are created successfully on AWS.
2. Delete the backup with this command: `velero backup delete <my-backup>`
3. The snapshots still exist on AWS, and the volumesnapshots are still in my cluster.
So the question is: why are the snapshots not deleted?
## logs
time="2021-11-25T10:02:31Z" level=info msg="Setting up backup log" backup=default/velero-backup-pvs-20211125100231 controller=backup logSource="pkg/controller/backup_controller.go:534"
time="2021-11-25T10:02:31Z" level=info msg="Setting up backup temp file" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/controller/backup_controller.go:556"
time="2021-11-25T10:02:31Z" level=info msg="Setting up plugin manager" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/controller/backup_controller.go:563"
time="2021-11-25T10:02:31Z" level=info msg="Getting backup item actions" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/controller/backup_controller.go:567"
time="2021-11-25T10:02:33Z" level=info msg="Setting up backup store to check for backup existence" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/controller/backup_controller.go:573"
time="2021-11-25T10:02:33Z" level=info msg="Writing backup version file" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/backup/backup.go:215"
time="2021-11-25T10:02:33Z" level=info msg="Including namespaces: *" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/backup/backup.go:221"
time="2021-11-25T10:02:33Z" level=info msg="Excluding namespaces: kube-node-lease, kube-public, kube-system, velero-restore-test-alertmanager, velero-restore-test-loki, velero-restore-test-loki-longterm, velero-restore-test-prometheus" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/backup/backup.go:222"
time="2021-11-25T10:02:33Z" level=info msg="Including resources: persistentvolumeclaims, persistentvolumes" backup=default/velero-backup-pvs-20211125100231 logSource="pkg/backup/backup.go:225"
time="2021-11-25T10:02:33Z" level=info msg="Excluding resources:
I'm not familiar with the CSI plugin. After reading the related code, this seems related to the CSI plugin's delete action. On backup deletion, the CSI plugin iterates over the VolumeSnapshot and VolumeSnapshotContent objects that were created during the backup process. If those objects have already been deleted, the related snapshot cannot be deleted. I don't think CSI support is an ideal solution yet; it is still in beta.
Yes, this issue is related to the CSI plugin. When I checked the code, I found that Velero tries to get the volume snapshots with this method call:
if snapshots, err := backupStore.GetBackupVolumeSnapshots(backup.Name); err != nil {
When I debugged this code, I found that the `snapshots` slice is empty.
I also found that `backupStore` has another method for CSI volume snapshots, named `GetCSIVolumeSnapshots`.
I think we need to add a condition check before this line: if snapshots, err := backupStore.GetBackupVolumeSnapshots(backup.Name); err != nil {
The condition check would look like:
if features.IsEnabled(velerov1api.CSIFeatureFlag) {
	snapshots, err := backupStore.GetCSIVolumeSnapshots(backup.Name)
} else {
	snapshots, err := backupStore.GetBackupVolumeSnapshots(backup.Name)
}
When I tried to change the code, I found that the return types of these two methods differ, although their functionality is almost the same.
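One possible way around the differing return types is to flatten both results into a common form before the deletion loop. The following is only a minimal sketch of that idea: `nativeSnapshot`, `csiSnapshot`, `store`, `fakeStore`, and `snapshotRefs` are hypothetical stand-ins invented for illustration, not Velero's real types or signatures, and the feature-flag check is reduced to a plain boolean.

```go
package main

import "fmt"

// nativeSnapshot and csiSnapshot are hypothetical stand-ins for the two
// different element types returned by the two backupStore methods.
type nativeSnapshot struct{ ProviderSnapshotID string }
type csiSnapshot struct{ Namespace, Name string }

// store is a hypothetical, trimmed-down view of the backup store.
type store interface {
	GetBackupVolumeSnapshots(backup string) ([]nativeSnapshot, error)
	GetCSIVolumeSnapshots(backup string) ([]csiSnapshot, error)
}

// snapshotRefs picks the lookup based on csiEnabled and flattens both
// result types into plain strings, so the caller only handles one type.
func snapshotRefs(s store, backup string, csiEnabled bool) ([]string, error) {
	var refs []string
	if csiEnabled {
		snaps, err := s.GetCSIVolumeSnapshots(backup)
		if err != nil {
			return nil, err
		}
		for _, vs := range snaps {
			refs = append(refs, vs.Namespace+"/"+vs.Name)
		}
		return refs, nil
	}
	snaps, err := s.GetBackupVolumeSnapshots(backup)
	if err != nil {
		return nil, err
	}
	for _, vs := range snaps {
		refs = append(refs, vs.ProviderSnapshotID)
	}
	return refs, nil
}

// fakeStore returns canned data so the sketch runs without a cluster.
type fakeStore struct{}

func (fakeStore) GetBackupVolumeSnapshots(string) ([]nativeSnapshot, error) {
	return nil, nil // mirrors the empty slice seen while debugging
}

func (fakeStore) GetCSIVolumeSnapshots(string) ([]csiSnapshot, error) {
	return []csiSnapshot{{Namespace: "graylog", Name: "data-0"}}, nil
}

func main() {
	refs, err := snapshotRefs(fakeStore{}, "my-backup", true)
	fmt.Println(refs, err)
}
```

Declaring the result outside the branches also avoids the shadowing problem in the snippet above, where `snapshots` declared with `:=` inside each branch would not be visible after the `if`/`else`.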
@chrislinan Hi, thanks for the detailed explanation. If you are using an AWS cluster, could you try the new version of https://github.com/vmware-tanzu/velero-plugin-for-aws? Support for CSI snapshotting was added recently. I think this may be an alternative to modifying the CSI plugin code. https://github.com/vmware-tanzu/velero-plugin-for-aws/pull/93
Yes, I already use the new version of velero-plugin-for-aws as a workaround. It works fine.
I'm also using the following initContainers for Velero 1.7.1, and am seeing that the `volumeSnapshotContent` isn't getting patched to `DeletionPolicy: Delete` before the `volumeSnapshots` are removed by Velero.
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.4.0-rc1
  volumeMounts:
    - mountPath: /target
      name: plugins
- name: velero-plugin-for-csi
  image: velero/velero-plugin-for-csi:v0.2.0
  volumeMounts:
    - mountPath: /target
      name: plugins
Creating `volumeSnapshots` and the corresponding `volumeSnapshotContent` certainly doesn't appear to be an issue, although there are warnings about various APIs being deprecated as these operations occur.
We're running
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.11", GitCommit:"27522a29febbcc4badac257763044d0d90c11abd", GitTreeState:"clean", BuildDate:"2021-09-15T19:16:25Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Are these things that you're anticipating being fixed in v1.8.x ?
@aglees This is the v1.9 roadmap of Velero: https://github.com/vmware-tanzu/velero/wiki/1.9-Roadmap-(Work-in-Progress) The CSI plugin does not seem to be included, so I don't think CSI work is planned for the near future. If possible, I still suggest upgrading Velero to v1.8 and using v1.4 of your cloud provider's plugin, which already supports CSI.
@blackpiglet thanks for responding. Having tried out just the AWS plugin `v1.4.0-rc1` on its own, I can see that we're not getting any CSI operations. Maybe there's some additional setup required for that plugin above and beyond what's currently required for getting CSI going with `v1.3.0` of AWS and `v0.2.0` of VeleroCSI? Do you know of any?
@ashish-amarnath it looks like some of the code in `/internal/delete/volumesnapshot_action.go` of the CSI Plugin is being hit, however we're not seeing that those snapshot contents are being patched. Given that the Velero pod operates as `cluster-admin`, I doubt that's an RBAC issue. Any ideas for what's going wrong there? Could we put in additional debugging code to investigate?
As you can see below, we're almost certainly hitting the start of that function during the course of deleting a backup.
velero-66bdc79cd9-bvtkb velero time="2022-01-25T09:10:19Z" level=info msg="Deleting Volumesnapshot graylog/velero-data-graylog-elasticsearch-data-0-sdl24" backup=graylog-backup-pv cmd=/plugins/velero-plugin-for-csi controller=backup-deletion logSource="/go/src/velero-plugin-for-csi/internal/delete/volumesnapshot_action.go:49" name=graylog-backup-pv-vvrdm namespace=velero pluginName=velero-plugin-for-csi
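To make the concern about patch ordering concrete, here is a small in-memory model of why the `DeletionPolicy` patch has to land before the VolumeSnapshot is removed. It is a sketch under stated assumptions and not the CSI plugin's code: `content`, `cluster`, `removeVolumeSnapshot`, `patchThenRemove`, and `newCluster` are all hypothetical names, and the model only encodes the general Kubernetes rule that a content object with policy `Retain` leaves the provider snapshot behind while `Delete` removes it.

```go
package main

import (
	"errors"
	"fmt"
)

// content is a hypothetical stand-in for a VolumeSnapshotContent.
type content struct {
	Name           string
	DeletionPolicy string // "Retain" or "Delete"
}

// cluster is a hypothetical in-memory stand-in for the API server
// plus the cloud provider.
type cluster struct {
	bound map[string]*content // VolumeSnapshot name -> bound content
	cloud map[string]bool     // content name -> provider snapshot exists
}

// removeVolumeSnapshot deletes the VolumeSnapshot. The provider snapshot
// is only cleaned up when the bound content's policy is "Delete".
func (c *cluster) removeVolumeSnapshot(vs string) error {
	vsc, ok := c.bound[vs]
	if !ok {
		return errors.New("no content bound to " + vs)
	}
	delete(c.bound, vs)
	if vsc.DeletionPolicy == "Delete" {
		delete(c.cloud, vsc.Name)
	}
	return nil
}

// patchThenRemove sets the policy to "Delete" first, then removes the
// VolumeSnapshot, which is the ordering discussed in this thread.
func (c *cluster) patchThenRemove(vs string) error {
	vsc, ok := c.bound[vs]
	if !ok {
		return errors.New("no content bound to " + vs)
	}
	vsc.DeletionPolicy = "Delete"
	return c.removeVolumeSnapshot(vs)
}

// newCluster builds one snapshot whose content starts with "Retain".
func newCluster() *cluster {
	return &cluster{
		bound: map[string]*content{"vs-1": {Name: "vsc-1", DeletionPolicy: "Retain"}},
		cloud: map[string]bool{"vsc-1": true},
	}
}

func main() {
	a := newCluster()
	_ = a.removeVolumeSnapshot("vs-1")
	fmt.Println("without patch, provider snapshot left:", a.cloud["vsc-1"])

	b := newCluster()
	_ = b.patchThenRemove("vs-1")
	fmt.Println("with patch, provider snapshot left:", b.cloud["vsc-1"])
}
```

If the patch step is skipped or fails silently, the model ends up in the first case, which matches the symptom of snapshots lingering in AWS after the backup is gone.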
By saying no CSI operation, do you mean no VolumeContent and VolumeSnapshot is created? If yes, I think this is expected behavior, because the AWS plugin will call AWS API directly. You can check on AWS console to ensure whether the snapshot is created correctly.
By saying no CSI operation, do you mean no VolumeContent and VolumeSnapshot is created?
Correct.
If yes, I think this is expected behavior, because the AWS plugin will call AWS API directly. You can check on AWS console to ensure whether the snapshot is created correctly.
I'll review the IAM roles for Velero and will see if there's anything not being hit on that.
@blackpiglet thanks for the pointers on this. The key piece was the backup logs, which indicated that snapshots were being skipped because they were `cluster-scoped`, as shown below.
velero-79c9466668-9xqjt velero time="2022-01-25T11:57:52Z" level=info msg="Skipping resource because it's cluster-scoped" backup=velero/graylog-backup-pv group=snapshot.storage.k8s.io/v1 logSource="pkg/backup/item_collector.go:201" resource=volumesnapshotclasses
Once I'd changed the backups so that they included `includeClusterResources: true`, that got the ball rolling with AWS snapshots.
I'm also pleased to see that AWS snapshots were also deleted along with the backups.
Many thanks for your help.
@blackpiglet I've encountered an issue with restoring from snapshots, and have updated #4199 with some details on that. If you've got any experience / expertise on that it'd be great to hear it.
Can this issue be closed?
I'll take over this one since it looks like a dup of https://github.com/vmware-tanzu/velero/issues/4760
Since there are no additional comments, I'm closing this issue as a dup of #4760
Is there a special option that was added to make this work? I have Velero chart-7.1.3 with velero image v1.14.0. CSI plugin was removed and installed together with velero. Tried to remove backups like OP here, kopia still keeps the backups at the S3 side.
What steps did you take and what happened: I have a backup of my PVs, and I can see the snapshots in my AWS console. When I execute `velero backup delete <my-backup>`, the backup is deleted and the metadata on S3 is deleted, but the snapshots still exist in AWS. I can see the logs of this process; they show that Velero is removing PV snapshots.
What did you expect to happen: I expect my PV snapshots to be deleted.
Anything else you would like to add: When I checked the Velero code, I found this piece of code: `GetBackupVolumeSnapshots`. This function reads the metadata file from S3; the file name looks like `fmt.Sprintf("%s-volumesnapshots.json.gz", backup)`. When I checked my metadata on S3, I found that this file is empty, and that there is another file whose name looks like `fmt.Sprintf("%s-csi-volumesnapshots.json.gz", backup)`. The CSI file contains the metadata of the PV snapshots. I also found that there is a method to read this file, `getCSIVolumeSnapshotKey`, but it is not called anywhere. So I suspect that a condition check for CSI PV snapshots is needed so that the correct file content is read.
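The two file names quoted above can be compared with a small stand-alone check. This is a rough sketch only: `nativeKey`, `csiKey`, `countEntries`, and `gz` are hypothetical helpers, the in-memory `bucket` map stands in for S3, and it assumes each file is a gzipped JSON array, with an "empty" file modelled as `[]`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
)

// nativeKey and csiKey reproduce the two name patterns quoted in the issue.
func nativeKey(backup string) string {
	return fmt.Sprintf("%s-volumesnapshots.json.gz", backup)
}

func csiKey(backup string) string {
	return fmt.Sprintf("%s-csi-volumesnapshots.json.gz", backup)
}

// countEntries gunzips data and counts the elements of the JSON array inside.
func countEntries(data []byte) (int, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	var items []json.RawMessage
	if err := json.NewDecoder(zr).Decode(&items); err != nil {
		return 0, err
	}
	return len(items), nil
}

// gz compresses a JSON string so the sketch has sample objects to read.
func gz(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return buf.Bytes()
}

func main() {
	// bucket is an in-memory stand-in for the S3 backup location.
	bucket := map[string][]byte{
		nativeKey("my-backup"): gz(`[]`),
		csiKey("my-backup"):    gz(`[{"metadata":{"name":"snap-1"}}]`),
	}
	for _, key := range []string{nativeKey("my-backup"), csiKey("my-backup")} {
		n, err := countEntries(bucket[key])
		fmt.Println(key, n, err)
	}
}
```

Reading only the first key would report zero snapshots even though the CSI file lists one, which is the mismatch this issue describes.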
Environment:
- `velero version`: 1.7.0
- `velero client config get features`: EnableCSI
- `kubectl version`: 1.20.9
- `/etc/os-release`:
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.