pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

fail to upgrade pd #609

Closed cwen0 closed 5 years ago

cwen0 commented 5 years ago

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:54:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-24T06:43:59Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{TiDBVersion:"2.1.0", GitVersion:"v1.0.0-beta.2.38+758c9888ea4245-dirty", GitCommit:"758c9888ea4245f4d651f8f3f95c31290131f2e4", GitTreeState:"dirty", BuildDate:"2019-06-03T11:39:25Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

[pingcap@172.16.4.4 dashboard]$ kubectl get pvc --namespace=ryan
NAME                    STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS        AGE
pd-ryan-test-pd-0       Bound    local-pv-d381798e   3667Gi     RWO            shared-nvme-disks   12d
pd-ryan-test-pd-1       Bound    local-pv-3c881820   3667Gi     RWO            shared-nvme-disks   8d
pd-ryan-test-pd-2       Bound    local-pv-458ec73a   3667Gi     RWO            shared-nvme-disks   8d
tikv-ryan-test-tikv-0   Bound    local-pv-dc52bd99   3667Gi     RWO            nvme-disks          12d
tikv-ryan-test-tikv-1   Bound    local-pv-bd6f7b9a   3667Gi     RWO            nvme-disks          12d
tikv-ryan-test-tikv-2   Bound    local-pv-6fc44b05   3667Gi     RWO            nvme-disks          12d
tikv-ryan-test-tikv-3   Bound    local-pv-968c86d4   3667Gi     RWO            nvme-disks          12d
tikv-ryan-test-tikv-4   Bound    local-pv-6bd5a4a2   3667Gi     RWO            nvme-disks          8d

What's the status of the TiDB cluster pods?

[pingcap@172.16.4.4 dashboard]$ kubectl get pod -n ryan -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP               NODE          NOMINATED NODE
ryan-test-discovery-6cd69db65f-gxpwl   1/1     Running   0          12d    10.233.73.50     172.16.4.94   <none>
ryan-test-monitor-75fc65dbd5-kfjgc     2/2     Running   0          12d    10.233.73.14     172.16.4.94   <none>
ryan-test-pd-0                         1/1     Running   0          74m    10.233.73.6      172.16.4.94   <none>
ryan-test-tidb-0                       1/1     Running   0          12d    10.233.111.206   172.16.4.87   <none>
ryan-test-tikv-0                       1/1     Running   0          6d3h   10.233.68.102    172.16.4.32   <none>
ryan-test-tikv-1                       1/1     Running   0          6d3h   10.233.73.75     172.16.4.91   <none>
ryan-test-tikv-2                       1/1     Running   0          6d3h   10.233.103.122   172.16.4.90   <none>
ryan-test-tikv-3                       1/1     Running   0          6d3h   10.233.101.76    172.16.4.35   <none>
ryan-test-tikv-4                       1/1     Running   0          6d3h   10.233.84.213    172.16.4.99   <none>

What did you do?

* upgrade pd to `hub.pingcap.net/pingcap/pd:rleungx-0b5e79e`
* update pd to `hub.pingcap.net/pingcap/pd:rleungx-368ee03`

What did you expect to see?

PD uses the image `hub.pingcap.net/pingcap/pd:rleungx-368ee03`.

What did you see instead?

```
Containers:
  pd:
    Container ID: docker://e26a211c49a80301d3f7c669522f32a31a8bccab690ba3c98fe2b8ebd7116b68
    Image: hub.pingcap.net/pingcap/pd:rleungx-0b5e79e
    Image ID: docker-pullable://hub.pingcap.net/pingcap/pd@sha256:7dc1c85026ebaaa4b8206027b40b2f8e37264fffdb8e9d07e4d9c2209cc0938d
```

DanielZhangQD commented 5 years ago

If any issue prevents PD from working, such as a wrong image tag, an incorrect node affinity config, or an incorrect PD config, the existing logic no longer lets us upgrade the TiDB cluster. We plan to add a ForceUpgrade option to tidb-operator so that we can force an upgrade of the TiDB cluster in some error cases.
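
To make the behavior concrete, here is a toy model of the gate described above. It is only a sketch and not tidb-operator code: `ClusterState`, `may_upgrade`, and the `force_upgrade` field are hypothetical names, with `force_upgrade` standing in for the proposed ForceUpgrade option.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical, simplified view of a cluster; not a tidb-operator type."""
    pd_healthy: bool             # whether PD is currently working
    force_upgrade: bool = False  # stand-in for the proposed ForceUpgrade option


def may_upgrade(state: ClusterState) -> bool:
    """Return True if an upgrade would be allowed to proceed in this toy model."""
    if state.pd_healthy:
        return True
    # PD is not working: the upgrade is blocked unless the override is set.
    return state.force_upgrade


if __name__ == "__main__":
    print(may_upgrade(ClusterState(pd_healthy=True)))
    print(may_upgrade(ClusterState(pd_healthy=False)))
    print(may_upgrade(ClusterState(pd_healthy=False, force_upgrade=True)))
```

In this model, the case reported here is the second call: PD is broken by a bad image, so the upgrade to a fixed image never starts unless an override like the third call exists.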