pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

e2e test failure, need more PV #702

Closed: shonge closed this issue 5 years ago

shonge commented 5 years ago

Bug Report

What version of Kubernetes are you using?

v1.12.5

What version of TiDB Operator are you using?

v1.0.0-rc.1

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

What's the status of the TiDB cluster pods?

What did you do?

kubectl apply -f e2e.yaml

What did you expect to see?

Pass e2e test

What did you see instead?

[root@host]# kubectl -n tidb-operator-e2e logs -f tidb-operator-e2e
I0729 18:01:19.107217       1 actions.go:1095] statefulset: e2e-cluster2/e2e-pd-replicas-1-tikv .status.ReadyReplicas(4) != 5
I0729 18:01:24.797111       1 actions.go:1095] statefulset: e2e-cluster2/e2e-pd-replicas-1-tikv .status.ReadyReplicas(4) != 5
I0729 18:01:29.103265       1 actions.go:1095] statefulset: e2e-cluster2/e2e-pd-replicas-1-tikv .status.ReadyReplicas(4) != 5
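
A quick way to cross-check this gap from the StatefulSet itself (namespace and name taken from the log lines above; just a sketch of the kind of check used here):

kubectl -n e2e-cluster2 get statefulset e2e-pd-replicas-1-tikv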

[root@host]# kubectl -n e2e-cluster2 describe po e2e-pd-replicas-1-tikv-4
Name:           e2e-pd-replicas-1-tikv-4
Namespace:      e2e-cluster2
Priority:       0
Node:           <none>
Labels:         app.kubernetes.io/component=tikv
                app.kubernetes.io/instance=e2e-pd-replicas-1
                app.kubernetes.io/managed-by=tidb-operator
                app.kubernetes.io/name=tidb-cluster
                controller-revision-hash=e2e-pd-replicas-1-tikv-cc9b86c8c
                statefulset.kubernetes.io/pod-name=e2e-pd-replicas-1-tikv-4
                tidb.pingcap.com/cluster-id=6719113174899182982
Annotations:    pingcap.com/last-applied-configuration:
                  {"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name...
                prometheus.io/path: /metrics
                prometheus.io/port: 20180
                prometheus.io/scrape: true
Status:         Pending
IP:
Controlled By:  StatefulSet/e2e-pd-replicas-1-tikv
Containers:
  tikv:
    Image:      pingcap/tikv:v3.0.1
    Port:       20160/TCP
    Host Port:  0/TCP
    Command:
      /bin/sh
      /usr/local/bin/tikv_start_script.sh
    Environment:
      NAMESPACE:              e2e-cluster2 (v1:metadata.namespace)
      CLUSTER_NAME:           e2e-pd-replicas-1
      HEADLESS_SERVICE_NAME:  e2e-pd-replicas-1-tikv-peer
      CAPACITY:               0
      TZ:                     UTC
    Mounts:
      /etc/podinfo from annotations (ro)
      /etc/tikv from config (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/tikv from tikv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-v4cns (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  tikv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tikv-e2e-pd-replicas-1-tikv-4
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      e2e-pd-replicas-1-tikv
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      e2e-pd-replicas-1-tikv
    Optional:  false
  default-token-v4cns:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-v4cns
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From            Message
  ----     ------            ----               ----            -------
  Warning  FailedScheduling  52m (x8 over 53m)  tidb-scheduler  Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500
  Warning  FailedScheduling  52m (x6 over 53m)  tidb-scheduler  can't schedule to nodes: [kube-node-2], because these pods had been scheduled to nodes: map[kube-node-2:[e2e-pd-replicas-1-tikv-0 e2e-pd-replicas-1-tikv-3]] 
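
The PVC that the Pending pod is waiting on can also be inspected directly (the claim name tikv-e2e-pd-replicas-1-tikv-4 is taken from the describe output above); with local volumes it typically stays Pending until the pod can be scheduled:

kubectl -n e2e-cluster2 get pvc tikv-e2e-pd-replicas-1-tikv-4
kubectl -n e2e-cluster2 describe pvc tikv-e2e-pd-replicas-1-tikv-4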

[root@host]# kubectl -n e2e-cluster2 get tc
NAME                PD                  STORAGE   READY   DESIRE   TIKV                  STORAGE   READY   DESIRE   TIDB                  READY   DESIRE
e2e-cluster2        pingcap/pd:v3.0.1   1Gi       5       5        pingcap/tikv:v3.0.1   10Gi      5       5        pingcap/tidb:v3.0.1   3       3
e2e-pd-replicas-1   pingcap/pd:v3.0.1   1Gi       5       5        pingcap/tikv:v3.0.1   10Gi      4       5        pingcap/tidb:v3.0.1   3       3 

[root@host]#  kubectl -n e2e-cluster2 get po -l app.kubernetes.io/component=tikv -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE          NOMINATED NODE
e2e-cluster2-tikv-0        1/1     Running   0          75m   10.244.2.24   kube-node-2   <none>
e2e-cluster2-tikv-1        1/1     Running   0          76m   10.244.3.22   kube-node-3   <none>
e2e-cluster2-tikv-2        1/1     Running   0          77m   10.244.1.21   kube-node-1   <none>
e2e-cluster2-tikv-3        1/1     Running   0          55m   10.244.2.31   kube-node-2   <none>
e2e-cluster2-tikv-4        1/1     Running   0          55m   10.244.3.31   kube-node-3   <none>
e2e-pd-replicas-1-tikv-0   1/1     Running   0          76m   10.244.2.22   kube-node-2   <none>
e2e-pd-replicas-1-tikv-1   1/1     Running   0          78m   10.244.3.18   kube-node-3   <none>
e2e-pd-replicas-1-tikv-2   1/1     Running   0          79m   10.244.1.20   kube-node-1   <none>
e2e-pd-replicas-1-tikv-3   1/1     Running   0          53m   10.244.2.34   kube-node-2   <none>
e2e-pd-replicas-1-tikv-4   0/1     Pending   0          53m   <none>        <none>        <none> 

[root@host]# kubectl get pv | grep -v Bound
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                        STORAGECLASS    REASON   AGE
local-pv-f1f39fe7   49Gi       RWO            Delete           Available                                                local-storage            127m 
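
Because local PVs are pinned to a node, it also helps to see which node the remaining Available PV belongs to. A sketch using the kubernetes.io/hostname label that local-volume-provisioner sets (visible in the PV yaml below):

kubectl get pv -L kubernetes.io/hostname | grep -v Bound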

[root@host]#  kubectl get pv local-pv-f1f39fe7 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: local-volume-provisioner-kube-node-2-9663aa16-b216-11e9-9a7c-02426bb708ca
  creationTimestamp: "2019-07-29T15:38:46Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    kubernetes.io/hostname: kube-node-2
  name: local-pv-f1f39fe7
  resourceVersion: "1040"
  selfLink: /api/v1/persistentvolumes/local-pv-f1f39fe7
  uid: f4239095-b216-11e9-9a7c-02426bb708ca
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 49Gi
  local:
    path: /mnt/disks/vol1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube-node-2
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
status:
  phase: Available
weekface commented 5 years ago

How many PVs are there? kubectl get pv | wc -l

By default, the DinD script creates 9 PVs per node: https://github.com/pingcap/tidb-operator/blob/master/manifests/local-dind/dind-cluster-v1.12.sh#L519
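
To break that total down per node (a rough sketch; it assumes every local PV carries the kubernetes.io/hostname label, as on the PV shown above):

kubectl get pv -L kubernetes.io/hostname --no-headers | awk '{print $NF}' | sort | uniq -c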

shonge commented 5 years ago

30

shonge commented 5 years ago
kubectl -n e2e-cluster1  get tc e2e-cluster1 -o yaml  |grep pvReclaimPolicy
  pvReclaimPolicy: Retain
kubectl -n e2e-cluster2  get tc e2e-cluster2 -o yaml  |grep pvReclaimPolicy
  pvReclaimPolicy: Retain
kubectl -n e2e-cluster1  get tc e2e-cluster1-other -o yaml  |grep pvReclaimPolicy
  pvReclaimPolicy: Retain

proposal:

weekface commented 5 years ago

OK, you need more PVs in the DinD environment to run the e2e tests. Rebuild the DinD environment as follows.

First, run the following commands to destroy the DinD Kubernetes cluster:

$ manifests/local-dind/dind-cluster-v1.12.sh clean
$ sudo rm -rf data/kube-node-*

Then rebuild the DinD environment with 100 PVs per node (more than enough):

$ PV_NUMS=100 KUBE_REPO_PREFIX=uhub.ucloud.cn/pingcap manifests/local-dind/dind-cluster-v1.12.sh up
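
Afterwards, you can verify the PV count before re-running the e2e test, for example:

$ kubectl get pv --no-headers | wc -l
$ kubectl get pv --no-headers | grep -c Available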