pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

How to use Backup CR to backup tidb to aliyun OSS? #5611

Closed wuyudian1 closed 1 month ago

wuyudian1 commented 3 months ago

Question

We have been running TiDB 6.1 and the latest 7.5.1 across multiple environments, including production. We have spent considerable effort trying to back up to Alibaba Cloud OSS without success, which has forced us to fall back to backups on AWS. As our data volume has grown, that workaround no longer meets our backup needs, so we are asking the community for help.

We are using the following Backup CR configuration to back up TiDB to Alibaba Cloud OSS:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup-to-oss
  namespace: tidb-cluster
spec:
  backupMode: snapshot
  backupType: full
  br:
    logLevel: debug
    cluster: basicai
    clusterNamespace: tidb-cluster
  resources: {}
  s3:
    bucket: ba...up
    endpoint: https://oss-cn-beijing.aliyuncs.com
    prefix: tidb/alidev
    provider: alibaba
    region: oss-cn-beijing
    secretName: s3-secret

The secret s3-secret contains a valid AK/SK. We can run backups successfully by hand with the BR command using the same AK/SK, which confirms that the credentials are valid. However, executing the Backup CR consistently fails. The error logs show that Alibaba Cloud returned an error to BR, which is documented at https://api.aliyun.com/troubleshoot?q=0002-00000003. The page states:

Problem Description: You have used an STS type AccessKey ID, but have not adopted STS authentication method for your request.
Reason: An STS type AccessKey ID was used without including the SecurityToken field in the request to indicate the use of STS authentication method.
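
For reference, here is a minimal sketch of how a secret such as s3-secret is typically created for TiDB Operator. It assumes the `access_key` and `secret_key` key names described in the TiDB Operator backup documentation. The function name `create_s3_secret` and its arguments are hypothetical, and this thread does not state how the secret in question was actually created.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the kubectl call in a function so that nothing runs
# until it is invoked explicitly. The key names access_key / secret_key are
# an assumption taken from the TiDB Operator backup docs.
create_s3_secret() {
  local namespace="$1" access_key="$2" secret_key="$3"
  kubectl create secret generic s3-secret \
    --namespace="$namespace" \
    --from-literal=access_key="$access_key" \
    --from-literal=secret_key="$secret_key"
}

# Example invocation (placeholders, not real credentials):
#   create_s3_secret tidb-cluster "<access-key-id>" "<access-key-secret>"
```

If the secret was created with different key names, that may be worth ruling out before looking further.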

Next, following the BR command format, we explicitly appended the AK/SK to the prefix value (as shown below). With that change, the Backup CR was finally able to write the backup files to OSS. However, the Backup CR then tries to read the backup metadata and fails with an error. It appears that reading the metadata and writing the backup files go through two different code paths, and that the metadata read treats the AK/SK appended to the prefix as part of the path, so the metadata is not found:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup-to-oss
  namespace: tidb-cluster
spec:
  backupMode: snapshot
  backupType: full
  br:
    logLevel: debug
    cluster: basicai
    clusterNamespace: tidb-cluster
  resources: {}
  s3:
    bucket: ba..up
    endpoint: https://oss-cn-beijing.aliyuncs.com
    prefix: tidb_test/alidev?access-key=LT..N&secret-access-key=MD..K
    provider: alibaba
    region: oss-cn-beijing

The specific error message is as follows:

error: read backup meta from bucket basicai-ops-backup and prefix tidb_test/alidev?access-key=LTA…N&secret-access-key=MD…K: backupmeta not exist
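
The suspected mismatch can be sketched as follows. This only illustrates the hypothesis described above and is not taken from the tidb-backup-manager or BR source. `AK` and `SK` are placeholders, and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration; AK and SK are placeholders, not real credentials.
prefix='tidb_test/alidev?access-key=AK&secret-access-key=SK'

# Split at the first '?': the part before it is the object path,
# the part after it is the query string carrying the credentials.
path_part="${prefix%%\?*}"
query_part="${prefix#*\?}"

# Key used by a component that parses the query string out of the prefix.
parsed_key="${path_part}/backupmeta"
# Key looked up by a component that treats the whole prefix as a literal path.
literal_key="${prefix}/backupmeta"

printf 'parsed:  %s\n' "$parsed_key"
printf 'literal: %s\n' "$literal_key"
```

If the writer behaves like the first case and the reader like the second, the two keys differ. That would be consistent with the "backupmeta not exist" error even though the file was written.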

csuzhangxc commented 2 months ago

@wuyudian1 Hi, can you share the arguments you use when you run the backup manually with the BR command? Backup files are written to the bucket by TiKV, while the metadata is handled by BR.

wuyudian1 commented 2 months ago

@csuzhangxc

./br backup full \
    --pd "192.168.6.15:2379,192.168.6.16:2379,192.168.6.12:2379"   \
    --s3.endpoint "https://oss-cn-beijing.aliyuncs.com"   \
    --s3.provider "alibaba"   \
    --s3.region "oss-cn-beijing"   \
    --log-level debug   \
    --storage "s3://b...backup/tidb/alidev?access-key=LTA...N&secret-access-key=MD...K"

csuzhangxc commented 2 months ago

@wuyudian1 Did you check whether the backupmeta file exists? If it does, the problem may be caused by https://pkg.go.dev/gocloud.dev, which tidb-backup-manager uses and which does not work well with Alibaba OSS.

wuyudian1 commented 2 months ago

@csuzhangxc Yes, the backupmeta files exist.