
BR CRD based backup doesn't work with minio #2277

Open tfulcrand opened 4 years ago

tfulcrand commented 4 years ago

Bug Report

What version of Kubernetes are you using? Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.9", GitCommit:"2e808b7cb054ee242b68e62455323aa783991f03", GitTreeState:"clean", BuildDate:"2020-01-18T23:24:23Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using? TiDB Operator Version: version.Info{GitVersion:"v1.1.0-rc.2", GitCommit:"6f42736bd0f33aed65903058f43a9005e806680a", GitTreeState:"clean", BuildDate:"2020-04-16T01:23:10Z", GoVersion:"go1.13", Compiler:"gc", Platform:"linux/amd64"}

What's the status of the TiDB cluster pods?

NAME                                      READY   STATUS    
backup-tidb-backup-minio-j48md            0/1     Completed
tidb-cluster-discovery-59dbcc4b97-8c8hc   1/1     Running   
tidb-cluster-monitor-786989b9f-qf7gv      3/3     Running   
tidb-cluster-pd-0                         1/1     Running   
tidb-cluster-pd-1                         1/1     Running   
tidb-cluster-pd-2                         1/1     Running   
tidb-cluster-tidb-0                       2/2     Running   
tidb-cluster-tidb-1                       2/2     Running   
tidb-cluster-tikv-0                       1/1     Running   
tidb-cluster-tikv-1                       1/1     Running   
tidb-cluster-tikv-2                       1/1     Running   
tidb-controller-manager-b4c8ffccd-2jmwk   1/1     Running   
tidb-scheduler-bd5fcb5-h2f9x              2/2     Running

What did you do?

We define a Backup CRD object as follows:

apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: tidb-backup-minio
  namespace: tidb
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/storage
            operator: In
            values:
            - "true"
  backupType: full
  br:
    cluster: tidb-cluster
    clusterNamespace: tidb
    logLevel: debug
  from:
    host: tidb-cluster-tidb
    port: 4000
    secretName: tidb-backup
    user: root
  s3:
    provider: minio
    endpoint: https://****************************/minio
    secretName: tidb-minio-backup
    bucket: dev
    storageClass: STANDARD
  tolerations:
  - effect: NoSchedule
    key: dedicated
    operator: Equal
    value: storage

What did you expect to see? A backup present on our MinIO instance.

What did you see instead? The MinIO instance stays empty, and we can see this in the backup pod logs:

I0422 13:43:30.182523       1 backup.go:85] [2020/04/22 13:43:30.182 +00:00] [INFO] [manager.go:267] ["failed to campaign"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 033a0f7b-3cc7-4180-9022-c3e907511d7c"] [error="context canceled"]
I0422 13:43:30.182559       1 backup.go:85] [2020/04/22 13:43:30.182 +00:00] [INFO] [manager.go:239] ["etcd session is done, creates a new one"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 033a0f7b-3cc7-4180-9022-c3e907511d7c"][2020/04/22 13:43:30.182 +00:00] [INFO] [manager.go:243] ["break campaign loop, NewSession failed"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 033a0f7b-3cc7-4180-9022-c3e907511d7c"] [error="context canceled"] [errorVerbose="context canceled\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:15\ngithub.com/pingcap/tidb/owner.contextDone\n\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200223044457-aedea3ec5e1e/owner/manager.go:371\ngithub.com/pingcap/tidb/owner.NewSession\n\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200223044457-aedea3ec5e1e/owner/manager.go:142\ngithub.com/pingc
I0422 13:43:30.182572       1 backup.go:85] ap/tidb/owner.(*ownerManager).campaignLoop\n\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200223044457-aedea3ec5e1e/owner/manager.go:241\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
I0422 13:43:30.182626       1 backup.go:85] [2020/04/22 13:43:30.182 +00:00] [INFO] [domain.go:582] ["domain closed"] ["take time"=71.127211ms]
I0422 13:43:30.187621       1 backup.go:85] 
E0422 13:43:30.187724       1 manager.go:210] backup cluster tidb/tidb-backup-minio data failed, err: cluster tidb/tidb-backup-minio, wait pipe message failed, errMsg , err: exit status 1
I0422 13:43:30.210311       1 backup_status_updater.go:66] Backup: [tidb/tidb-backup-minio] updated successfully
DanielZhangQD commented 4 years ago

@tfulcrand Please upgrade TiDB Operator to v1.1.0-rc.2.p1, also set tidbBackupManagerImage: pingcap/tidb-backup-manager:v1.1.0-rc.2.p1, and try again. If it fails again, please post the pod logs and the yaml of the Backup, including the status section, here.
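
For reference, assuming the operator was installed with the official Helm chart, the upgrade could look roughly like this (the release name tidb-operator, the namespace tidb-admin, and the values file path are placeholders for your own setup):

# values.yaml (only the keys relevant here)
operatorImage: pingcap/tidb-operator:v1.1.0-rc.2.p1
tidbBackupManagerImage: pingcap/tidb-backup-manager:v1.1.0-rc.2.p1

helm upgrade tidb-operator pingcap/tidb-operator --version=v1.1.0-rc.2.p1 -f values.yaml --namespace=tidb-admin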

tfulcrand commented 4 years ago

@DanielZhangQD As you asked, we have upgraded the TiDB Operator and TiDB backup manager versions, and now we see more detail in the previous errors. We get this kind of error: x509: certificate signed by unknown authority. In fact, we use cert-manager and vault-pki to manage certificates automatically, so we have a private CA that is not trusted by the backup manager. To temporarily work around this problem we have added a Let's Encrypt wildcard certificate for all subdomains in kube.dm.gg, but this is not recommended for security reasons. (BTW, with this workaround, backups succeeded on MinIO.) Ideally we would like to be able to import our CA into the backup manager; do you plan to add this feature?

DanielZhangQD commented 4 years ago

@tfulcrand tidb-backup-manager already supports TLS. Please follow the doc to create ${cluster_name}-tidb-client-secret and follow the doc to create ${cluster_name}-cluster-client-secret in the target namespace of the TiDB cluster; tidb-backup-manager will mount those two secrets automatically to connect to TiDB, PD, and TiKV. You can also create a private secret with the TiDB client certs and configure the secret name in spec.from.tlsClient.tlsSecret, and backup-manager will use that secret instead of ${cluster_name}-tidb-client-secret. cc @weekface
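
For example, the two secrets can be created like this (the cert and key file names are placeholders for certs you have already issued):

kubectl create secret generic ${cluster_name}-tidb-client-secret --namespace=${namespace} --from-file=tls.crt=client.crt --from-file=tls.key=client.key --from-file=ca.crt=ca.crt
kubectl create secret generic ${cluster_name}-cluster-client-secret --namespace=${namespace} --from-file=tls.crt=client.crt --from-file=tls.key=client.key --from-file=ca.crt=ca.crt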

DanielZhangQD commented 4 years ago

@tfulcrand With v1.1.0 TiDB Operator, the separate TiDB client cert for the Backup CR has changed to spec.from.tlsClientSecretName; you can refer to https://pingcap.com/docs/tidb-in-kubernetes/stable/enable-tls-for-mysql-client/#using-cert-manager, step 5, for its creation.
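
Applied to your Backup object, that would look roughly like this (the secret name below is only an example):

spec:
  from:
    host: tidb-cluster-tidb
    port: 4000
    user: root
    secretName: tidb-backup
    tlsClientSecretName: tidb-backup-tls-client-secret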

tfulcrand commented 4 years ago

@DanielZhangQD Thanks for your replies, but the problem is still present... Probably the explanation was not clear. We set up a MinIO instance as backup storage. This MinIO instance stands behind an Ingress with TLS enabled. The certificate used by the Ingress is generated by cert-manager and vault-pki (for automatic management). With this configuration, when launching a Backup CRD object, the log of the associated pod shows this message: x509: certificate signed by unknown authority, and the backup task fails. Currently we use a wildcard Let's Encrypt certificate in the Ingress as a workaround, but we would like to use our generated certificate in the future.

DanielZhangQD commented 4 years ago

@tfulcrand Sorry for the late response. So basically, BR and TiKV should support loading customized certs and connecting to the MinIO instance with HTTPS enabled, right?

DanielZhangQD commented 4 years ago

@tfulcrand would you please explain in more detail what is needed to meet your requirement?

Smana commented 4 years ago

@DanielZhangQD I'll try to explain too: we already tried your documentation regarding cert-manager usage. In that documentation you make use of a self-signed certificate issued by cert-manager itself. That means the CA is created by cert-manager.

For our use case we use cert-manager with a Vault PKI issuer. In that case the CA is generated on the Vault side, and we use this CA for all our internal services within Kubernetes. That means that, at some point, this CA must be trusted by both services in order for them to communicate with each other over TLS.
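
For context, our issuer looks roughly like the following sketch (server, path, and secret names are illustrative, and this uses the cert-manager.io/v1alpha2 API current at the time of writing):

apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: vault-issuer
  namespace: tidb
spec:
  vault:
    server: https://vault.example.com:8200
    path: pki_int/sign/kubernetes
    caBundle: <base64-encoded Vault CA>
    auth:
      tokenSecretRef:
        name: vault-token
        key: token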

So here we'd like to configure the backup job to trust our CA for the MinIO communication, the same way you would pass curl --cacert cacert.pem https://minio_server/ on the command line :)

I hope this requirement is clear enough for you; don't hesitate to ask if you need any details.
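
To make the request concrete, something like the following would solve it for us (the caSecretName field below is purely hypothetical and does not exist in the current Backup CRD; it is only meant to illustrate the feature we are asking for):

spec:
  s3:
    provider: minio
    endpoint: https://minio.example.com/minio
    secretName: tidb-minio-backup
    bucket: dev
    # hypothetical: a secret holding the CA bundle the backup job should trust for this endpoint
    caSecretName: minio-ca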

DanielZhangQD commented 4 years ago

I think cert-manager could generate certs with a user-defined CA, could you please help confirm this? @weekface

DanielZhangQD commented 4 years ago

@Smana Have you tried the procedures in this doc? If the vault-issuer is created, it should be good to follow the rest of our docs to create certs for the components.
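
For example, a TiDB client Certificate issued through your vault-issuer could look like this (names and duration are examples mirroring our docs rather than exact required values):

apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: tidb-cluster-tidb-client-secret
  namespace: tidb
spec:
  secretName: tidb-cluster-tidb-client-secret
  duration: 8760h
  commonName: tidb-cluster-tidb-client
  usages:
    - client auth
  issuerRef:
    name: vault-issuer
    kind: Issuer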

Smana commented 4 years ago

This is more about how to handle a self-signed certificate on the client side, @DanielZhangQD. There is no way to establish a secure, trusted connection from the backup job (here, the client initiating the TLS connection) to the MinIO instance using our internal private CA without explicitly trusting that CA on the client side.

DanielZhangQD commented 4 years ago

So my understanding here https://github.com/pingcap/tidb-operator/issues/2277#issuecomment-647322439 is correct, right? The backup job and TiKV should support configuring certs that can be used to connect to third-party services such as MinIO. For now, they can only load the certs used internally among the TiDB cluster components.

Smana commented 4 years ago

Hi @DanielZhangQD, yes, your comment is right :) Being able to load a certificate / CA cert would be nice.

DanielZhangQD commented 4 years ago

OK, https://github.com/pingcap/br/issues/409 has been created for BR.