pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0
1.22k stars 489 forks source link

tidb-controller-manager creates http client when tls client cert secret is not found #3593

Open dragonly opened 3 years ago

dragonly commented 3 years ago

Bug Report

What version of Kubernetes are you using?

1.18.4

What version of TiDB Operator are you using?

master branch (commit #9305cf8d)

What's the status of the TiDB cluster pods?

NAME                                   READY   STATUS    RESTARTS   AGE
basic-tls-discovery-54f5bc5c75-6bwp9   1/1     Running   0          105m
basic-tls-pd-0                         1/1     Running   0          105m

What did you do?

Ref: https://docs.pingcap.com/tidb-in-kubernetes/stable/enable-tls-between-components

  1. generate PD/TiDB/TiKV/client certs according to doc above
  2. deploy a TidbCluster as follows

    apiVersion: pingcap.com/v1alpha1
    kind: TidbCluster
    metadata:
        name: basic-tls
    spec:
        tlsCluster:
            enabled: true
        version: v4.0.8
        timezone: UTC
        pvReclaimPolicy: Delete
        enableDynamicConfiguration: true
        configUpdateStrategy: RollingUpdate
        discovery: {}
    
        pd:
            baseImage: pingcap/pd
            replicas: 1
            requests:
                storage: "1Gi"
            config:
                security:
                    cert-allowed-cn:
                    - TiDB
    
        tikv:
            baseImage: pingcap/tikv
            replicas: 1
            requests:
                storage: "1Gi"
            config:
                security:
                    cert-allowed-cn:
                    - TiDB
                storage:
                    # In basic examples, we set this to avoid using too much storage.
                    reserve-space: "0MB"
                rocksdb:
                    # In basic examples, we set this to avoid the following error in some Kubernetes clusters:
                    # "the maximum number of open file descriptors is too small, got 1024, expect greater or equal to 82920"
                    max-open-files: 256
                raftdb:
                    max-open-files: 256
    
        tidb:
            baseImage: pingcap/tidb
            replicas: 1
            service:
                type: NodePort
            tlsClient:
                enabled: true
            config:
                security:
                    cert-verify-cn:
                    - TiDB

What did you expect to see? TiDB cluster up and running.

What did you see instead? Only pd is up, and tidb-controller-manager has the following log

E1210 06:53:30.167070       1 pd_control.go:93] Unable to get tls config for tidb cluster "basic-tls", pd client may not work: unable to load certificates from secret default/basic-tls-cluster-client-secret: secrets "basic-tls-cluster-client-secret" not found

It seems that the tidb-controller-manager found no secrets and just returns a plain http pd client, which absolutely cannot connect to pd, thus the tc creating progress is stuck.

dragonly commented 3 years ago

There may be a potential derived issue: when the user re-generate all certs, the tidb-controller-manager must be restarted to read the new certs.

DanielZhangQD commented 3 years ago

It seems that the tidb-controller-manager found no secrets and just returns a plain http pd client, which absolutely cannot connect to pd, thus the tc creating progress is stuck.

Since no secret found, no matter it return a HTTP client or just return error here, the creating of other components will be blocked, what do you expect to handle here?

There may be a potential derived issue: when the user re-generate all certs, the tidb-controller-manager must be restarted to read the new certs.

Please check the code carefully, a new TLS client is created each time after getting the secret from Kubernetes, so no need to restart tidb-controller-manager after certs rotated.

dragonly commented 3 years ago

Since no secret found, no matter it return a HTTP client or just return error here, the creating of other components will be blocked, what do you expect to handle here?

I expect an error is directly returned, but not an unusable client instance, which can make debugging easier.

Please check the code carefully, a new TLS client is created each time after getting the secret from Kubernetes, so no need to restart tidb-controller-manager after certs rotated.

Yes, I misunderstood this. I is actually OK. 😂

DanielZhangQD commented 3 years ago

Since no secret found, no matter it return a HTTP client or just return error here, the creating of other components will be blocked, what do you expect to handle here?

I expect an error is directly returned, but not an unusable client instance, which can make debugging easier.

This makes sense, we can optimize the logic here.