vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

AWS snapshots are not deleted once the backup has expired. #5004

Closed shubhamyadav20 closed 2 years ago

shubhamyadav20 commented 2 years ago

What steps did you take and what happened:

I set up Velero using the Helm chart and set the TTL to 84 days, but I am facing a strange issue: Velero is not deleting the AWS snapshots.

What did you expect to happen:

Once a backup has expired, its AWS snapshots should be deleted automatically, which would save on snapshot costs.

This issue was raised earlier as well: https://github.com/openebs/velero-plugin/issues/178

blackpiglet commented 2 years ago

@shubhamyadav20 Could you give more information about your environment? For example, which version of Velero are you using? Which features have you enabled? Which plugins have you installed? If your Velero version is v1.7.0 or newer, could you collect a debug bundle with the command velero debug and upload it here?

blackpiglet commented 2 years ago

@shubhamyadav20 Hi, I found a similar issue: https://github.com/vmware-tanzu/velero/issues/1880. Could you check whether the backup had already been deleted when the snapshot was still found? If the backup still exists, it is normal that its snapshots are not deleted. All resources included in the backup are deleted during the backup deletion process.

blackpiglet commented 2 years ago

@shubhamyadav20 To me, the question is why the expired backups were not deleted. I noticed that the expired backups that were kept are all related to BackupStorageLocation aws-prd03. Could you check the BSL's status with the command velero backup-location get to see whether it is in the Available phase? If not, please check your BSL configuration.

I also suggest upgrading Velero to v1.8.1. v1.6.0 was released more than a year ago, so it is not easy to trace the code and debug further.

blackpiglet commented 2 years ago

I think the access mode of the BSL is the reason. aws-prd03's AccessMode is set to ReadOnly, and Velero skips deleting expired backups for this kind of BSL. Please set it to ReadWrite, or find out whether another cluster also uses the same BSL.

blackpiglet commented 2 years ago

@shubhamyadav20 I am not talking about permissions on AWS. This is about the AccessMode of the Velero BackupStorageLocation aws-prd03.

You need to change aws-prd03's spec.accessMode to ReadWrite so that the expired backups are deleted correctly.
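
The rule described in the comments above can be sketched as a small predicate. This is only a simplified illustration, not Velero's actual code: the function name should_garbage_collect and the dictionary shapes are hypothetical, and the sketch assumes that a BSL without an explicit accessMode behaves as ReadWrite.

```python
from datetime import datetime, timedelta, timezone


def should_garbage_collect(backup, locations, now):
    """Return True if a backup is expired AND its BSL allows deletion.

    backup:    dict with "storageLocation" (BSL name) and "expiration" (datetime)
    locations: dict mapping BSL name -> dict with an optional "accessMode"
    These shapes are hypothetical and exist only for this illustration.
    """
    expiration = backup.get("expiration")
    if expiration is None or expiration > now:
        return False  # not expired yet, nothing to delete
    bsl = locations.get(backup["storageLocation"])
    if bsl is None:
        return False  # unknown storage location, leave the backup alone
    # Assumption: a missing accessMode is treated as ReadWrite.
    return bsl.get("accessMode", "ReadWrite") == "ReadWrite"


# Example mirroring the thread: aws is writable, aws-prd03 is read-only.
now = datetime(2022, 6, 1, tzinfo=timezone.utc)
locations = {
    "aws": {"accessMode": "ReadWrite"},
    "aws-prd03": {"accessMode": "ReadOnly"},
}
old_backup = {"storageLocation": "aws-prd03", "expiration": now - timedelta(days=127)}
print(should_garbage_collect(old_backup, locations, now))
```

Under these assumptions, an expired backup in a ReadOnly BSL is never selected for deletion, so any snapshots tied to it also remain.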

shubhamyadav20 commented 2 years ago

Backups are going to the Velero storage location named aws; the associated bucket details are in the screenshot above.

No backups are going to the aws-prd03 storage location (it has been unused since 2021).

blackpiglet commented 2 years ago

@shubhamyadav20

Hi blackpiglet,

Yes, we have set the TTL to 84 days, after which backups expire automatically. You can see this in the screenshot below.

image

But the AWS snapshots are still there, which is costing us money. So far I have reviewed many pages but have not found a correct solution.

We can tell whether a backup has expired from the value in the Expires column. If it says 56d, the backup will be kept for 56 more days before it expires. If it says 127d ago, the backup expired 127 days ago.

From this picture, we can see that none of the backups from BSL aws have expired yet. The expired ones are related to BSL aws-prd03. If you need to keep the backups related to BSL aws-prd03, then to me there is nothing to do.

This does ring a bell for me: the Expires column name may be confusing. Do you think changing it to something like TTL would be better?
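
The reading of the Expires column described above can be illustrated with a short sketch. It is a simplified approximation rather than Velero's own formatting code: the helper name format_expires is hypothetical, and it only handles whole days.

```python
from datetime import datetime, timedelta, timezone


def format_expires(expiration, now):
    """Render an expiration time the way the thread reads the Expires column.

    A future expiration is shown as "<N>d" (days left); a past one as
    "<N>d ago". Simplified: anything shorter than a day is shown as 0d.
    """
    delta = expiration - now
    days = abs(delta) // timedelta(days=1)  # whole days, ignoring the sign
    if delta < timedelta(0):
        return f"{days}d ago"
    return f"{days}d"


now = datetime(2022, 6, 1, tzinfo=timezone.utc)
print(format_expires(now + timedelta(days=56), now))   # 56d
print(format_expires(now - timedelta(days=127), now))  # 127d ago
```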

sseago commented 2 years ago

Hmm. I think there may be some confusion here about what's stored where. The backup storage location (BSL) is used for storing the backup itself and for restic filesystem backups. The BSL is not used for native snapshots (so if you're using ebs/gp2 for your AWS PVs, those snapshots don't live in the bucket). Snapshots are accessed via the VolumeSnapshotLocation, which may or may not be in the same region (or even the same provider) as the BSL. For example, you could be using minio buckets for your BSL, but AWS snapshots must always live in AWS in the same region as your volumes.

What I don't know for sure off the top of my head is whether velero ties backup deletion to snapshot deletion, because doing this would be a more complicated operation. Velero would need to load the backup from the bucket before deleting it, parse the volume backups to see which volumes have snapshots associated, and then issue snapshot delete operations from there before deleting the backup. We'd need to confirm this, but I suspect that this is not being done.
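
The ordering described above (load the backup, find the volumes that have snapshots, delete those snapshots, then delete the backup) can be sketched as follows. This is purely illustrative and does not claim to show what Velero does: the function and the three callables passed to it are hypothetical stand-ins, as is the "snapshotID" key.

```python
def delete_backup_with_snapshots(backup_name, load_volume_backups,
                                 delete_snapshot, delete_backup):
    """Delete a backup's snapshots first, then the backup itself.

    All three callables are hypothetical stand-ins:
      load_volume_backups(name) -> list of dicts, each possibly with "snapshotID"
      delete_snapshot(snapshot_id) -> None
      delete_backup(name) -> None
    Returns the list of snapshot IDs that were deleted.
    """
    # The volume information must be read before the backup is removed,
    # otherwise the snapshot IDs are lost.
    volume_backups = load_volume_backups(backup_name)
    deleted = []
    for volume in volume_backups:
        snapshot_id = volume.get("snapshotID")
        if snapshot_id:  # skip volumes that have no native snapshot
            delete_snapshot(snapshot_id)
            deleted.append(snapshot_id)
    delete_backup(backup_name)
    return deleted
```

The point of the ordering is that the backup contents are the only record of which snapshots belong to it, so they have to be read before the backup is removed.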

shubhamyadav20 commented 2 years ago

I will need to sort things out on my own.

Thanks for your time & co-operation.

tobiasreischmann commented 1 year ago

Hey @sseago, did you check whether volumesnapshots should be deleted automatically after the backup is deleted? I assumed that this would be the case and was a bit confused to see hundreds of volumesnapshots still lying around in the cluster. (We have daily backups and a really short TTL of only 1 day.) Does anyone know a good way to automatically delete old volumesnapshots? Thanks