pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0
1.23k stars 498 forks source link

MaxBackups behavior on BackupSchedule #3660

Open AkioLove opened 3 years ago

AkioLove commented 3 years ago

Feature Request

Is your feature request related to a problem? Please describe: https://asktug.com/t/topic/66829

When maxBackups calculating the number of backup copies, it doesn't consider whether the status of the existing Backup is "Complete", "Failed", or "Running".

Backups with the status "Failed" will also be counted in maxBackups. If a continuous failure is caused by some reason, it may result in a good backup being rotated out due to the setting of maxBackups.

Describe the feature you'd like: Check the Backup object status when counting maxBackups, only calculate the object whose status is completed.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

DanielZhangQD commented 3 years ago

@AkioLove The maxBackups is the max backup CRs we will maintain in the cluster, if we count the Complete ones only, what about the CRs in the other status, how can they be cleaned?

If a continuous failure is caused by some reason, it may result in a good backup being rotated out due to the setting of maxBackups.

For this issue, we should fix the failure by the first observation, and we can also increase the maxBackups, WDYT?

dragonly commented 3 years ago

@AkioLove you could set backup.spec.cleanPolicy to Retain so that deleting Backup CR won't delete the underlying data.

AkioLove commented 3 years ago

@AkioLove The maxBackups is the max backup CRs we will maintain in the cluster, if we count the Complete ones only, what about the CRs in the other status, how can they be cleaned? For this issue, we should fix the failure by the first observation, and we can also increase the maxBackups, WDYT?

Thank you :) So far increase the maxBackups and fix the failure by the first observation should be enough. But it will be great if there have an option that can guarantee to keep 1 closest "Complete" version of backups.

@AkioLove you could set backup.spec.cleanPolicy to Retain so that deleting Backup CR won't delete the underlying data.

Thank you, but our backup data is very large (each version > 500GB), so we want Backup CR can auto-delete the underlying data.

DanielZhangQD commented 3 years ago

OK, what about adding two more parameters to the BackupScheduleSpec to control this behavior:

MaxCompleteBackups *int32 `json:"maxCompleteBackups,omitempty"`
MaxFailedBackups *int32 `json:"maxFailedBackups,omitempty"`

For the CRs in the other status, e.g. scheduled, running, prepared, etc., no need to clean them because they should change to the above two status finally, WDYT? @AkioLove @dragonly @BinChenn

AkioLove commented 3 years ago

OK, what about adding two more parameters to the BackupScheduleSpec to control this behavior:

MaxCompleteBackups *int32 `json:"maxCompleteBackups,omitempty"`
MaxFailedBackups *int32 `json:"maxFailedBackups,omitempty"`

For the CRs in the other status, e.g. scheduled, running, prepared, etc., no need to clean them because they should change to the above two status finally, WDYT? @AkioLove @dragonly @BinChenn

@DanielZhangQD It was a good idea. By those options, I can let my tidb cluster keep the recent "Complete" version of backups :)