pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

FailedScheduling: nodes didn't match node selector #516

Closed · jaltabike closed this issue 5 years ago

jaltabike commented 5 years ago

Bug Report

What version of Kubernetes are you using?

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.5", GitCommit:"51dd616cdd25d6ee22c83a858773b607328a18ec", GitTreeState:"clean", BuildDate:"2019-01-16T18:14:49Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}

What version of TiDB Operator are you using?

TiDB Operator Version: version.Info{TiDBVersion:"2.1.0", GitVersion:"v1.0.0-beta.2", GitCommit:"bc913fc1459118c1d972be37e100d3a70a9e981a", GitTreeState:"clean", BuildDate:"2019-05-10T10:15:56Z", GoVersion:"go1.12", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

NAME            PROVISIONER                    AGE
local-storage   kubernetes.io/no-provisioner   5m40s

What's the status of the TiDB cluster pods?

NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE
demo-discovery-58fb7c6765-wwgbx   1/1     Running   0          3m4s   10.244.1.5   kube-node-3
demo-monitor-5ddb4bf8c-9xv96      2/2     Running   0          3m4s   10.244.2.4   kube-node-1
demo-pd-0                         0/1     Pending   0          3m4s
demo-pd-1                         0/1     Pending   0          3m4s
demo-pd-2                         0/1     Pending   0          3m4s

What did you do?

Following the guide (https://github.com/pingcap/tidb-operator/blob/master/docs/local-dind-tutorial.md) to deploy a TiDB cluster in the DinD Kubernetes cluster.

What did you expect to see?

Successful deployment of a TiDB cluster in the DinD Kubernetes cluster.

What did you see instead?

The PD pods are stuck in Pending. When I run "kubectl describe po -n <namespace> <pod-name>", I see the following warning:

Warning FailedScheduling 2m45s (x2510 over 7m45s) tidb-scheduler 0/4 nodes are available: 4 node(s) didn't match node selector.

tennix commented 5 years ago

@jaltabike Could you check whether there are available PVs with kubectl get pv? Also, is your local-volume-provisioner running? Check with kubectl get po -n kube-system -lapp=local-volume-provisioner

jaltabike commented 5 years ago

@tennix

$ kubectl get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-1767facf   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-1dbd65bc   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-2a83a815   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-37c274de   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-3ece239f   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-45324aa3   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-4982aa4a   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-62446ab1   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-6635374b   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-66d8020f   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-67e0e52d   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-7e1a02ed   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-820ea0a0   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-82ff2b19   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-8a0a2eb0   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-8ebb22ac   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-914e5926   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-9371219a   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-a6e4a208   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-b3d1a2e9   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-bf3146fc   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-c4277489   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-c52702d3   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-cfa833c6   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-d20c2706   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-d4b44d8    29Gi       RWO            Delete           Available           local-storage            52m
local-pv-e62c3e22   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-e93f8428   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-f1f39fe7   29Gi       RWO            Delete           Available           local-storage            52m
local-pv-f2cc9d77   29Gi       RWO            Delete           Available           local-storage            52m

$ kubectl get po -n kube-system -lapp=local-volume-provisioner
NAME                             READY   STATUS    RESTARTS   AGE
local-volume-provisioner-b24zr   1/1     Running   0          54m
local-volume-provisioner-g52cx   1/1     Running   0          54m
local-volume-provisioner-mxsn6   1/1     Running   0          53m

weekface commented 5 years ago

Please view the TidbCluster object information by:

kubectl get tc -n tidb -oyaml

@jlerche

jaltabike commented 5 years ago

@weekface

$ kubectl get tc -n tidb -oyaml
apiVersion: v1
items:
- apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    creationTimestamp: "2019-05-24T07:48:47Z"
    generation: 1
    labels:
      app.kubernetes.io/component: tidb-cluster
      app.kubernetes.io/instance: demo
      app.kubernetes.io/managed-by: Tiller
      app.kubernetes.io/name: tidb-cluster
      helm.sh/chart: tidb-cluster-dev
    name: demo
    namespace: tidb
    resourceVersion: "1474"
    selfLink: /apis/pingcap.com/v1alpha1/namespaces/tidb/tidbclusters/demo
    uid: 5c9eb891-7df8-11e9-8409-0242d7093bb5
  spec:
    pd:
      image: pingcap/pd:v2.1.8
      imagePullPolicy: IfNotPresent
      limits: {}
      replicas: 3
      requests:
        storage: 1Gi
      storageClassName: local-storage
    pvReclaimPolicy: Retain
    schedulerName: tidb-scheduler
    services:
    - name: pd
      type: ClusterIP
    tidb:
      image: pingcap/tidb:v2.1.8
      imagePullPolicy: IfNotPresent
      limits: {}
      maxFailoverCount: 3
      replicas: 2
      requests: {}
      slowLogTailer:
        image: busybox:1.26.2
        imagePullPolicy: IfNotPresent
        limits:
          cpu: 100m
          memory: 50Mi
        requests:
          cpu: 20m
          memory: 5Mi
    tikv:
      image: pingcap/tikv:v2.1.8
      imagePullPolicy: IfNotPresent
      limits: {}
      replicas: 3
      requests:
        storage: 10Gi
      storageClassName: local-storage
    tikvPromGateway:
      image: ""
    timezone: UTC
  status:
    pd:
      leader:
        clientURL: ""
        health: false
        id: ""
        lastTransitionTime: null
        name: ""
      phase: Normal
      statefulSet:
        collisionCount: 0
        currentReplicas: 3
        currentRevision: demo-pd-677598c6b9
        observedGeneration: 1
        replicas: 3
        updateRevision: demo-pd-677598c6b9
        updatedReplicas: 3
    tidb: {}
    tikv: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

xiaojingchen commented 5 years ago

@jaltabike Could you provide the details of a PD pod?

jaltabike commented 5 years ago

@xiaojingchen

$ kubectl describe po -n tidb demo-pd-1
Name:               demo-pd-1
Namespace:          tidb
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app.kubernetes.io/component=pd
                    app.kubernetes.io/instance=demo
                    app.kubernetes.io/managed-by=tidb-operator
                    app.kubernetes.io/name=tidb-cluster
                    controller-revision-hash=demo-pd-677598c6b9
                    statefulset.kubernetes.io/pod-name=demo-pd-1
Annotations:        pingcap.com/last-applied-configuration:
                      {"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name...
                    prometheus.io/path: /metrics
                    prometheus.io/port: 2379
                    prometheus.io/scrape: true
Status:             Pending
IP:
Controlled By:      StatefulSet/demo-pd
Containers:
  pd:
    Image:       pingcap/pd:v2.1.8
    Ports:       2380/TCP, 2379/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /bin/sh
      /usr/local/bin/pd_start_script.sh
    Environment:
      NAMESPACE:          tidb (v1:metadata.namespace)
      PEER_SERVICE_NAME:  demo-pd-peer
      SERVICE_NAME:       demo-pd
      SET_NAME:           demo-pd
      TZ:                 UTC
    Mounts:
      /etc/pd from config (ro)
      /etc/podinfo from annotations (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/pd from pd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mscw6 (ro)
Volumes:
  pd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pd-demo-pd-1
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-pd
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-pd
    Optional:  false
  default-token-mscw6:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mscw6
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                       From            Message
  ----     ------            ----                      ----            -------
  Warning  FailedScheduling  35s (x103928 over 4h10m)  tidb-scheduler  0/4 nodes are available: 4 node(s) didn't match node selector.

shuijing198799 commented 5 years ago

What version of the operator are you using?

tkanng commented 5 years ago

The same problem occurred when I tried to deploy a TiDB cluster in the DinD Kubernetes cluster. @shuijing198799 My operator's version is:

root@iZhp37kmiszbkwzt5oh9csZ:/k/op# helm get manifest tidb-operator | grep image
        image: pingcap/tidb-operator:v1.0.0-beta.2
        imagePullPolicy: IfNotPresent
        image: pingcap/tidb-operator:v1.0.0-beta.2
        imagePullPolicy: IfNotPresent
        image: mirantis/hypokube:final
        imagePullPolicy: IfNotPresent

weekface commented 5 years ago

@jaltabike @tkanng This issue is fixed by https://github.com/pingcap/tidb-operator/pull/475. Can you upgrade to pingcap/tidb-operator:latest and use the latest chart to test it?

By the way, we will release v1.0.0-beta.3 before this weekend, which will include this PR: https://github.com/pingcap/tidb-operator/pull/475.
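
For readers following along, a minimal upgrade sketch, assuming the operator was installed from this repo's charts/tidb-operator chart with release name tidb-operator in the tidb-admin namespace (as in the DinD tutorial) and that the chart exposes an operatorImage value; adjust these names to your installation:

# Assumed names: release "tidb-operator", chart path charts/tidb-operator,
# values key "operatorImage" -- verify them against your chart version.
git pull                       # refresh the local chart to the latest revision
helm upgrade tidb-operator charts/tidb-operator \
  --set operatorImage=pingcap/tidb-operator:latest

# Confirm the operator pods come back up on the new image
kubectl get po -n tidb-admin -o wide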

tkanng commented 5 years ago

pingcap/tidb-operator:latest works fine. Thanks!

jaltabike commented 5 years ago

@weekface What does "upgrade to pingcap/tidb-operator:latest" mean? Does it mean to modify the image version in tidb-cluster/values.yaml and tidb-operator/values.yaml?

weekface commented 5 years ago

@jaltabike Yes.
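
Equivalently, a sketch of the values.yaml route under the same assumptions (chart paths, release names, and the operatorImage key are taken from the DinD tutorial of that era and may differ in your setup):

# 1) In charts/tidb-operator/values.yaml, point the operator at the fixed image
#    (key name assumed for this chart version):
#      operatorImage: pingcap/tidb-operator:latest
# 2) Roll the change out to the existing releases:
helm upgrade tidb-operator charts/tidb-operator -f charts/tidb-operator/values.yaml
helm upgrade demo charts/tidb-cluster -f charts/tidb-cluster/values.yaml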

jaltabike commented 5 years ago

@weekface It works now~ Thank you! :)

gregwebs commented 5 years ago

@mysticaltech what environment is your K8s running in? on-prem or a cloud provider?

mysticaltech commented 5 years ago

@gregwebs It was my fault; this particular problem is indeed fixed with pingcap/tidb-operator:latest.