vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Enhancing Velero Schedule with RetainDays and RetainCount for Backup Retention Management. #7804

Open yuanqijing opened 1 month ago

yuanqijing commented 1 month ago

Abstract

This proposal adds two fields to the velerov1.Schedule resource in Velero: retainDays and retainCount. They let users specify, respectively, how many days backups should be retained and how many recent backups to keep. This gives finer control over backup retention directly through Velero schedules and makes it easier to comply with data retention policies.

Background

Velero currently schedules backups and allows users to define backup frequencies and TTL (time to live) for backups. However, managing backup retention based on count (i.e., keeping only a certain number of recent backups) or a specific number of days (i.e., keeping backups only for a defined period) requires additional scripting or manual intervention.

Goals

Enhance Backup Retention Flexibility: Users should be able to specify how long backups should be retained and how many recent backups to keep directly through the Velero Schedule API.

Design

API Changes: Extend the velerov1.ScheduleSpec with two optional fields: retainDays (type int64) and retainCount (type int64).

type ScheduleSpec struct {
    // Existing fields...
    Schedule string `json:"schedule"`

    // RetainDays specifies the number of days to retain backups created by this schedule.
    // Set to 0 for unlimited retention by days.
    RetainDays int64 `json:"retainDays,omitempty"`

    // RetainCount specifies the maximum number of backups to retain for this schedule.
    // Set to 0 for unlimited retention by count.
    RetainCount int64 `json:"retainCount,omitempty"`
    // Other existing fields...
}
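
To illustrate how the two optional fields would serialize, here is a minimal sketch. It assumes a trimmed-down stand-in for velerov1.ScheduleSpec that carries only the fields shown above; the `encode` helper and the cron expression are hypothetical and used for illustration only. Because both fields are tagged `omitempty`, leaving them at 0 (unlimited) keeps them out of the serialized spec, so existing schedules would look unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScheduleSpec is a trimmed-down stand-in for velerov1.ScheduleSpec.
// It is NOT the real Velero type; it only carries the fields from the proposal.
type ScheduleSpec struct {
	Schedule    string `json:"schedule"`
	RetainDays  int64  `json:"retainDays,omitempty"`
	RetainCount int64  `json:"retainCount,omitempty"`
}

// encode is a hypothetical helper that marshals a spec to a JSON string.
func encode(s ScheduleSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Both retention fields set: they appear in the output.
	fmt.Println(encode(ScheduleSpec{Schedule: "0 */6 * * *", RetainDays: 7, RetainCount: 5}))

	// Both left at 0 (unlimited): omitempty drops them from the output.
	fmt.Println(encode(ScheduleSpec{Schedule: "0 */6 * * *"}))
}
```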

Controller Changes: Update the GC controller's reconcile loop to check these fields and handle the deletion of backups.

RetainDays Logic: Compute the cut-off date by subtracting retainDays from the current date. Any backup created before this cut-off should be flagged for deletion, unless it is among the most recent backups that retainCount requires to be kept.

RetainCount Logic: For each schedule, list all associated backups, sorted by creation date in descending order. Mark for deletion any backups beyond the first retainCount entries.

Evesy commented 4 weeks ago

This would be beneficial in scenarios where backup schedules run at a high frequency with a relatively low TTL and are primarily used for data loss recovery.

Such backups are on a low TTL since they become redundant fairly quickly after several successive backups exist with more recent data; however, this has the side effect that if the scheduled backups begin failing for one reason or another, you can quickly end up in a scenario where you have 0 backups left as they've all expired.

If the scheduler had an option to retain n backups rather than relying on a static TTL, it would ensure you can't end up in a scenario where you have 0 backups.