vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero backup deletion is not deleting objects in kopia repository, leading to increase in our s3 bucket size. #8282

Open dharanui opened 1 month ago

dharanui commented 1 month ago

What steps did you take and what happened: Set backup expireAfter to some value (3 days in our case). In the remote object store (an S3 bucket in our case), under the kopia/namespace folder (the kopia repository), I still see objects dating back to the day we started using CSI data movement (1 month ago).

This increases the size and number of objects in our S3 bucket, which increases our costs.

Velero version: 1.14.1; AWS plugin: 1.10

What did you expect to happen:

Older objects in the S3 bucket get deleted when the backup itself is deleted.


ywk253100 commented 1 month ago

The backup repository isn't cleaned up immediately when a backup is deleted; the cleanup is done by the maintenance job. See the explanation here: https://velero.io/docs/v1.14/file-system-backup/#backup-deletion

dharanui commented 1 month ago

But it should stabilise after a point, right? If I am not mistaken, the maintenance job runs every day (or are you referring to some other maintenance?). Here is the trend of our S3 bucket size:

[image: S3 bucket size trend]

ywk253100 commented 1 month ago

How many new backups do you create every day? What's the TTL of them? Does the size of the workload data you backed up increase every day? What's the frequency of running the maintenance job?

dharanui commented 1 month ago

One backup per day. TTL is 14 days in this case, but even with a TTL of 3 days in our other clusters it's the same trend. Workload data does increase every day, but certainly not at this rate, as the volumes themselves are small. The maintenance job frequency has been 24h for the last 2 weeks; previously it ran once every hour (default of kopia). Trend of the last 6 weeks:

[image: S3 bucket size trend, last 6 weeks]

sseago commented 1 month ago

Kopia backups are incremental, so with daily backups and ttl of 3 days, if the same volumes are backed up every day, kopia won't be able to delete anything since the snapshots from those deleted backups are needed for the later incremental backups for the same volume. However, if a PVC is eventually deleted, then once the last backup of this PVC is expired, then kopia maintenance should delete that content.
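To make the retention argument concrete, here is a minimal sketch in Python. It is a toy model, not Kopia's actual data structures: each snapshot is treated as a set of content IDs, and content becomes reclaimable only when no remaining snapshot references it. The function name `reclaimable` and the sample IDs are hypothetical.

```python
# Toy model (an assumption, not Kopia's implementation): a snapshot is a set
# of content IDs, and the repository holds everything uploaded so far.

def reclaimable(stored: set[str], live_snapshots: dict[str, set[str]]) -> set[str]:
    """Return the content IDs that no live snapshot references."""
    referenced = set().union(*live_snapshots.values()) if live_snapshots else set()
    return stored - referenced


stored = {"a", "b", "c", "d"}
snapshots = {
    "day1": {"a", "b"},
    "day2": {"a", "b", "c"},
    "day3": {"a", "c", "d"},
}

# Expire day1: "a" and "b" are still referenced by later snapshots.
del snapshots["day1"]
print(sorted(reclaimable(stored, snapshots)))  # []

# Expire day2: only "b" is no longer referenced by any snapshot.
del snapshots["day2"]
print(sorted(reclaimable(stored, snapshots)))  # ['b']
```

In this model, expiring a backup frees space only for content that later backups of the same volume no longer contain.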

Lyndon-Li commented 1 month ago

But it should stabilise after a point right?

If new data keeps being generated (created or modified) and previous backups are NOT deleted, storage usage will keep increasing without stabilising. If new data keeps being created (with NO deletes or modifications), storage usage will also keep increasing even though some previous backups ARE deleted, because no data can be garbage-collected during maintenance.
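The two cases can be illustrated with a small simulation. This is a sketch under simplifying assumptions: data is counted in abstract blocks, one backup runs per day, expired backups are garbage-collected immediately (real maintenance may delay deletion), and the function `simulate` and its parameters are hypothetical.

```python
def simulate(days: int, ttl: int, new_per_day: int, overwrite: bool) -> list[int]:
    """Return the number of stored blocks after each day's backup and GC.

    overwrite=False: the volume only grows (new blocks, nothing removed).
    overwrite=True:  new blocks replace the oldest ones, so the volume
                     keeps a constant size.
    """
    volume: list[int] = list(range(new_per_day * 5)) if overwrite else []
    next_id = len(volume)
    snapshots: list[set[int]] = []  # one per daily backup, oldest first
    stored: set[int] = set()
    usage: list[int] = []
    for _ in range(days):
        fresh = list(range(next_id, next_id + new_per_day))
        next_id += new_per_day
        volume = (volume[new_per_day:] if overwrite else volume) + fresh
        snapshots.append(set(volume))
        stored |= set(volume)              # upload blocks not stored yet
        snapshots = snapshots[-ttl:]       # backups older than the TTL expire
        stored &= set().union(*snapshots)  # GC keeps only referenced blocks
        usage.append(len(stored))
    return usage


print(simulate(6, ttl=3, new_per_day=10, overwrite=False))
# [10, 20, 30, 40, 50, 60]
print(simulate(6, ttl=3, new_per_day=10, overwrite=True))
# [50, 60, 70, 70, 70, 70]
```

With append-only data the stored size tracks the volume size and never drops, regardless of the TTL. With overwrites it levels off once the first backups start expiring.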

Lyndon-Li commented 1 month ago

Therefore, the growth may or may not be reasonable; we need more info to make a final judgement.

To troubleshoot further, please provide the following info:

  1. The full list of existing backups and the full list of existing DataUpload or PodVolumeBackup CRs
  2. Connect to the Kopia repo through the Kopia CLI (kopia repository connect), run the commands below, and share their outputs:
    kopia repo status
    kopia maintenance info --json
    kopia snapshot list --all
    kopia content stats
    kopia blob stats
    kopia index list --json
    kopia content list --deleted-only
    kopia index epoch list
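One way to gather the Kopia outputs in a single pass is a small helper script. This is only a sketch: it assumes the Kopia CLI is on the PATH and already connected to the repository, and the output directory name `kopia-diagnostics` and the helper names are hypothetical. The commands themselves are copied from the list above.

```python
import shutil
import subprocess
from pathlib import Path

# Commands copied from the list above. They assume the Kopia CLI has
# already been connected to the repository.
KOPIA_COMMANDS = [
    "kopia repo status",
    "kopia maintenance info --json",
    "kopia snapshot list --all",
    "kopia content stats",
    "kopia blob stats",
    "kopia index list --json",
    "kopia content list --deleted-only",
    "kopia index epoch list",
]


def output_name(command: str) -> str:
    """Turn 'kopia repo status' into 'kopia-repo-status.txt'."""
    return "-".join(part.lstrip("-") for part in command.split()) + ".txt"


def collect(out_dir: str = "kopia-diagnostics") -> list[Path]:
    """Run each command and save stdout and stderr to one file per command."""
    if shutil.which("kopia") is None:
        print("kopia CLI not found on PATH; nothing collected")
        return []
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for command in KOPIA_COMMANDS:
        result = subprocess.run(command.split(), capture_output=True, text=True)
        path = target / output_name(command)
        path.write_text(result.stdout + result.stderr)
        written.append(path)
    return written


if __name__ == "__main__":
    for path in collect():
        print(path)
```

Each command's output lands in its own text file, which makes it easier to attach the results to the issue or paste them into a gist.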