pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

BackupSchedule always deletes the backup files immediately after the backup job completes #5604

Closed: wuyudian1 closed this issue 3 months ago

wuyudian1 commented 3 months ago

Bug Report

What version of Kubernetes are you using? 1.24.6

What version of TiDB Operator are you using? 1.5.2

What's the status of the TiDB cluster pods?

NAME                                 READY   STATUS    RESTARTS   AGE    IP             NODE                       NOMINATED NODE   READINESS GATES
basicai-discovery-6f8785d8b6-mw2xw   1/1     Running   0          2d     10.101.0.52    cn-shanghai.192.168.3.81   <none>           <none>
basicai-monitor-0                    4/4     Running   0          61d    10.101.0.98    cn-shanghai.192.168.3.82   <none>           <none>
basicai-pd-0                         1/1     Running   0          2d1h   10.101.0.88    cn-shanghai.192.168.3.82   <none>           <none>
basicai-tidb-0                       2/2     Running   0          2d     10.101.0.35    cn-shanghai.192.168.3.81   <none>           <none>
basicai-tikv-0                       1/1     Running   0          2d1h   10.101.0.139   cn-shanghai.192.168.3.83   <none>           <none>
basicai-tikv-1                       1/1     Running   0          2d1h   10.101.2.42    cn-shanghai.192.168.3.77   <none>           <none>
basicai-tikv-2                       1/1     Running   0          2d1h   10.101.0.91    cn-shanghai.192.168.3.82   <none>           <none>

What did you do? We run multiple TiDB clusters for development, testing, and production. We plan to upgrade TiDB from 6.1 to 7.5.1 and have just completed the upgrade in our testing environment. Since the upgrade, the BackupSchedule behaves abnormally: it deletes the backup files immediately after the backup job completes (this did not happen before the upgrade). A summary of the operator logs follows:

I0402 14:21:29.624256       1 event.go:282] Event(v1.ObjectReference{Kind:"Backup", Namespace:"tidb-cluster", Name:"basicai-backup-schedule-minio2-2024-04-02t14-21-00", UID:"6d2d889e-9d79-443b-8547-dcbb9228173a", APIVersion:"pingcap.com/v1alpha1", ResourceVersion:"635369522", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Backup tidb-cluster/basicai-backup-schedule-minio2-2024-04-02t14-21-00 for backupSchedule/basicai-backup-schedule-minio2 successful
I0402 14:21:29.633334       1 backup_schedule_status_updater.go:61] BackupSchedule: [tidb-cluster/basicai-backup-schedule-minio2] updated successfully
I0402 14:21:29.633337       1 event.go:282] Event(v1.ObjectReference{Kind:"Backup", Namespace:"tidb-cluster", Name:"basicai-backup-schedule-minio2-2024-04-02t14-21-00", UID:"6d2d889e-9d79-443b-8547-dcbb9228173a", APIVersion:"pingcap.com/v1alpha1", ResourceVersion:"635369522", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create job tidb-cluster/backup-basicai-backup-schedule-minio2-2024-04-02t14-21-00 for cluster basicai-backup-schedule-minio2 backup successful
I0402 14:21:29.633504       1 backup_schedule_controller.go:105] BackupSchedule: tidb-cluster/basicai-backup-schedule-minio2, still need sync: backup schedule tidb-cluster/basicai-backup-schedule-minio2, the last backup basicai-backup-schedule-minio2-2024-04-02t14-21-00 is still running, requeuing
I0402 14:21:29.642125       1 backup_status_updater.go:128] Backup: [tidb-cluster/basicai-backup-schedule-minio2-2024-04-02t14-21-00] updated successfully
I0402 14:21:30.634724       1 backup_schedule_controller.go:105] BackupSchedule: tidb-cluster/basicai-backup-schedule-minio2, still need sync: backup schedule tidb-cluster/basicai-backup-schedule-minio2, the last backup basicai-backup-schedule-minio2-2024-04-02t14-21-00 is still running, requeuing
I0402 14:21:32.635727       1 backup_schedule_controller.go:105] BackupSchedule: tidb-cluster/basicai-backup-schedule-minio2, still need sync: backup schedule tidb-cluster/basicai-backup-schedule-minio2, the last backup basicai-backup-schedule-minio2-2024-04-02t14-21-00 is still running, requeuing
I0402 14:21:36.652154       1 backup_schedule_manager.go:388] backup schedule tidb-cluster/basicai-backup-schedule-minio2 gc backup basicai-backup-schedule-minio2-2024-04-02t14-21-00 success
I0402 14:21:36.652218       1 event.go:282] Event(v1.ObjectReference{Kind:"Backup", Namespace:"tidb-cluster", Name:"basicai-backup-schedule-minio2-2024-04-02t14-21-00", UID:"6d2d889e-9d79-443b-8547-dcbb9228173a", APIVersion:"pingcap.com/v1alpha1", ResourceVersion:"635369570", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' delete Backup tidb-cluster/basicai-backup-schedule-minio2-2024-04-02t14-21-00 for backupSchedule/basicai-backup-schedule-minio2 successful
I0402 14:21:36.662773       1 backup_schedule_status_updater.go:61] BackupSchedule: [tidb-cluster/basicai-backup-schedule-minio2] updated successfully

The definition of our BackupSchedule is as follows:

apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: basicai-backup-schedule-minio2
  namespace: tidb-cluster
spec:
  backupTemplate:
    backoffRetryPolicy:
      maxRetryTimes: 2
      minRetryDuration: 300s
      retryTimeout: 30m
    backupMode: snapshot
    backupType: full
    br:
      cluster: basicai
      clusterNamespace: tidb-cluster
    calcSizeLevel: all
    resources: {}
    s3:
      bucket: basicai-ops-backup
      endpoint: https://minio-end......ai.com
      prefix: tidb/alidev
      provider: aws
      region: oss-cn-beijing
      secretName: tidb-backup-to-minio
    volumeBackupInitJobMaxActiveSeconds: 600
  maxReservedTime: 84h
  schedule: 10 14 * * *

What did you expect to see? The BackupSchedule should NOT delete the backup files immediately after the backup job completes.

What did you see instead? The BackupSchedule behaves abnormally: it deletes the backup files immediately after the backup job completes (this did not happen before the upgrade).