pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

Docker for Mac DinD deploy failed #170

Closed kirinse closed 5 years ago

kirinse commented 5 years ago

Followed the instructions in local-dind-tutorial.md.

root@kube-master:/# kubectl get pods --all-namespaces 
NAMESPACE     NAME                                      READY     STATUS             RESTARTS   AGE
kube-system   etcd-kube-master                          1/1       Running            0          25m
kube-system   kube-apiserver-kube-master                1/1       Running            0          26m
kube-system   kube-controller-manager-kube-master       1/1       Running            0          25m
kube-system   kube-dns-64d6979467-8sqrd                 3/3       Running            12         26m
kube-system   kube-flannel-ds-amd64-cjnnz               1/1       Running            0          26m
kube-system   kube-flannel-ds-amd64-fhtv9               1/1       Running            0          26m
kube-system   kube-flannel-ds-amd64-rbq8n               1/1       Running            0          26m
kube-system   kube-flannel-ds-amd64-vczqs               1/1       Running            0          26m
kube-system   kube-proxy-54hph                          1/1       Running            0          26m
kube-system   kube-proxy-g95qx                          1/1       Running            0          26m
kube-system   kube-proxy-ks7gq                          1/1       Running            0          26m
kube-system   kube-proxy-njl5h                          1/1       Running            0          26m
kube-system   kube-scheduler-kube-master                1/1       Running            0          25m
kube-system   kubernetes-dashboard-68ddc89549-6nclg     1/1       Running            0          26m
kube-system   local-volume-provisioner-9rfvm            1/1       Running            0          26m
kube-system   local-volume-provisioner-lhkbh            1/1       Running            0          26m
kube-system   local-volume-provisioner-qs6n7            1/1       Running            0          26m
kube-system   registry-proxy-6wth2                      1/1       Running            0          26m
kube-system   registry-proxy-bzp45                      1/1       Running            0          26m
kube-system   registry-proxy-cxcpg                      1/1       Running            0          26m
kube-system   registry-proxy-z4l6c                      1/1       Running            0          26m
kube-system   tiller-deploy-df4fdf55d-lhk9p             1/1       Running            0          24m
tidb-admin    tidb-controller-manager-bcc66f746-t4tsq   1/1       Running            0          22m
tidb-admin    tidb-scheduler-5b85b688c6-wrvbg           2/2       Running            0          22m
tidb          demo-monitor-5bc85fdb7f-n4vj7             2/2       Running            0          20m
tidb          demo-monitor-configurator-sn5hb           0/1       Completed          1          20m
tidb          demo-pd-0                                 1/1       Running            0          20m
tidb          demo-pd-1                                 1/1       Running            0          20m
tidb          demo-pd-2                                 0/1       Pending            0          20m
tidb          demo-tidb-0                               0/1       CrashLoopBackOff   7          17m
tidb          demo-tidb-1                               0/1       Running            8          17m
tidb          demo-tikv-0                               2/2       Running            4          20m
tidb          demo-tikv-1                               2/2       Running            4          20m
tidb          demo-tikv-2                               0/2       Pending            0          20m

kubectl describe pod demo-pd-2 -n tidb

Name:           demo-pd-2
Namespace:      tidb
Node:           <none>
Labels:         app.kubernetes.io/component=pd
                app.kubernetes.io/instance=demo
                app.kubernetes.io/managed-by=tidb-operator
                app.kubernetes.io/name=tidb-cluster
                controller-revision-hash=demo-pd-579d4c4bdf
                statefulset.kubernetes.io/pod-name=demo-pd-2
                tidb.pingcap.com/cluster-id=6621035670618381862
Annotations:    pingcap.com/last-applied-configuration={"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name":"config","configMap":...
                prometheus.io/path=/metrics
                prometheus.io/port=2379
                prometheus.io/scrape=true
Status:         Pending
IP:             
Controlled By:  StatefulSet/demo-pd
Containers:
  pd:
    Image:       pingcap/pd:v2.0.7
    Ports:       2380/TCP, 2379/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /bin/sh
      /usr/local/bin/pd_start_script.sh
    Environment:
      NAMESPACE:          tidb (v1:metadata.namespace)
      PEER_SERVICE_NAME:  demo-pd-peer
      SERVICE_NAME:       demo-pd
      SET_NAME:           demo-pd
      TZ:                 UTC
    Mounts:
      /etc/pd from config (ro)
      /etc/podinfo from annotations (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/pd from pd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qkthr (ro)
Volumes:
  pd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pd-demo-pd-2
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-pd
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-pd
    Optional:  false
  default-token-qkthr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qkthr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From            Message
  ----     ------            ----               ----            -------
  Warning  FailedScheduling  23m                tidb-scheduler  pod has unbound PersistentVolumeClaims (repeated 3 times)
  Warning  FailedScheduling  3m (x71 over 23m)  tidb-scheduler  Failed filter with extender at URL http://127.0.0.1:10262/scheduler, code 500
weekface commented 5 years ago

@kirinse Thank you for reporting this.

We add a scheduler extender to Kubernetes; I need to check its logs.

Could you please provide these outputs:

1. The PV list:

kubectl get pv

2. The tidb-scheduler logs:

kubectl get po -n tidb-admin
NAME                                       READY     STATUS    RESTARTS   AGE
tidb-controller-manager-bcc66f746-t4tsq   1/1       Running   0          1h
tidb-scheduler-5b85b688c6-wrvbg            2/2       Running   0          1h

kubectl logs -f -n tidb-admin tidb-scheduler-5b85b688c6-wrvbg -c tidb-scheduler

Replace tidb-scheduler-5b85b688c6-wrvbg with your actual pod name.
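As a small convenience (an addition to the original comment, not required), kubectl can also read the logs through the Deployment, which avoids copying the generated pod name:

kubectl logs -f -n tidb-admin deployment/tidb-scheduler -c tidb-scheduler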

kirinse commented 5 years ago

@weekface Thank you for your quick response.

kubectl get pv

NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                   STORAGECLASS    REASON    AGE
local-pv-1767facf   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-1dbd65bc   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-37c274de   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-1       local-storage             36m
local-pv-45324aa3   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-1   local-storage             36m
local-pv-62446ab1   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-67e0e52d   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-7e1a02ed   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-820ea0a0   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-2       local-storage             36m
local-pv-8a0a2eb0   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-9371219a   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-0       local-storage             36m
local-pv-bf3146fc   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-c4277489   238476Gi   RWO            Delete           Available                           local-storage             36m
local-pv-cfa833c6   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-2   local-storage             36m
local-pv-f1f39fe7   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-0   local-storage             36m
local-pv-f2cc9d77   238476Gi   RWO            Delete           Available                           local-storage             36m

kubectl logs -f -n tidb-admin tidb-scheduler-5b85b688c6-wrvbg -c tidb-scheduler

I1107 09:29:33.750286       1 version.go:37] Welcome to TiDB Operator.
I1107 09:29:33.750406       1 version.go:38] Git Commit Hash: b779ae6f111f341802a85b4be2d524b7ed605331
I1107 09:29:33.750411       1 version.go:39] UTC Build Time:  2018-11-06 04:04:11
I1107 09:29:33.752248       1 mux.go:60] start scheduler extender server, listening on 0.0.0.0:10262
I1107 09:29:50.999869       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:29:51.082799       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:51.086261       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:29:51.091025       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:52.089737       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:29:52.177854       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:52.181548       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:29:52.198180       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:54.186310       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:29:54.276168       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:54.279738       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:29:54.284608       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:58.281417       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:29:58.376853       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:29:58.382925       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:29:58.386798       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:06.384973       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:30:06.479468       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:06.484521       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:30:06.489510       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:22.486357       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:30:22.490099       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:22.576954       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:30:22.580687       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:54.497027       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:30:54.500387       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:30:54.585628       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:30:54.590169       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:54.679235       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:31:54.686643       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:54.691308       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:31:54.695033       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:55.693478       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:31:55.779798       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:55.785466       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:31:55.791430       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:57.786711       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
E1107 09:31:57.878974       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 09:31:57.884592       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
E1107 09:31:57.900293       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
weekface commented 5 years ago

Sorry, could you also provide these outputs:

kubectl get no
kubectl get po -n tidb -owide
kirinse commented 5 years ago

kubectl get no

NAME          STATUS    ROLES     AGE       VERSION
kube-master   Ready     master    1h        v1.10.5
kube-node-1   Ready     <none>    1h        v1.10.5
kube-node-2   Ready     <none>    1h        v1.10.5
kube-node-3   Ready     <none>    1h        v1.10.5

kubectl get po -n tidb -o wide

NAME                              READY     STATUS             RESTARTS   AGE       IP            NODE
demo-monitor-5bc85fdb7f-n4vj7     2/2       Running            4          1h        10.244.2.11   kube-node-2
demo-monitor-configurator-sn5hb   0/1       Completed          1          1h        10.244.1.6    kube-node-3
demo-pd-0                         1/1       Running            10         1h        10.244.3.20   kube-node-1
demo-pd-1                         1/1       Running            10         1h        10.244.1.14   kube-node-3
demo-pd-2                         0/1       Pending            0          1h        <none>        <none>
demo-tidb-0                       0/1       Running            16         1h        10.244.1.16   kube-node-3
demo-tidb-1                       0/1       CrashLoopBackOff   16         1h        10.244.3.17   kube-node-1
demo-tikv-0                       1/2       CrashLoopBackOff   20         1h        10.244.2.14   kube-node-2
demo-tikv-1                       1/2       CrashLoopBackOff   20         1h        10.244.3.18   kube-node-1
demo-tikv-2                       0/2       Pending            0          1h        <none>        <none>
weekface commented 5 years ago

Oh, the default log level is 2. Please change it to 4 and capture the tidb-scheduler logs again:

kubectl edit deploy -n tidb-admin tidb-scheduler
...
      containers:
      - command:
        - /usr/local/bin/tidb-scheduler
        - -v=2
        - -port=10262
        image: localhost:5000/pingcap/tidb-operator:latest
        imagePullPolicy: Always
        name: tidb-scheduler
...

Then change - -v=2 to - -v=4.
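If you prefer not to open an editor, the same change can be made with a one-line JSON patch. This is only a sketch: it assumes the tidb-scheduler container is the first container in the pod template and that -v=2 is the second entry (index 1) of its command list, as in the snippet above:

kubectl patch deploy -n tidb-admin tidb-scheduler --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/command/1","value":"-v=4"}]'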

tennix commented 5 years ago

Could you provide the output of kubectl get pv?

kirinse commented 5 years ago

@weekface

kubectl logs -f -n tidb-admin tidb-scheduler-68c6b47498-cqd8w -c tidb-scheduler

I1107 10:35:01.588200       1 version.go:37] Welcome to TiDB Operator.
I1107 10:35:01.588390       1 version.go:38] Git Commit Hash: b779ae6f111f341802a85b4be2d524b7ed605331
I1107 10:35:01.588397       1 version.go:39] UTC Build Time:  2018-11-06 04:04:11
I1107 10:35:01.589756       1 mux.go:60] start scheduler extender server, listening on 0.0.0.0:10262
I1107 10:36:02.735488       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:02.735542       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:02.755197       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:02.757923       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:02.757959       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:02.761698       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:03.833986       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:03.834059       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:03.839983       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:03.845109       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:03.845149       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:03.848358       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:05.848968       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:05.849026       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:05.853956       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:05.857275       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:05.857314       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:05.934393       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:09.859208       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:09.859252       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:09.864846       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:09.941271       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:09.941322       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:09.948397       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:17.872355       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:17.872395       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:17.877219       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:17.956285       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:17.956336       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:17.960351       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:33.883130       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:36:33.883193       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:36:33.934535       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:36:33.967279       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:36:33.967338       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:36:33.971626       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:37:05.940401       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:37:05.940443       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:37:05.943939       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:37:06.035090       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:37:06.035164       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:37:06.042428       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:05.951983       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:05.952028       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:05.956685       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:06.051733       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:06.051785       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:06.056060       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:06.961878       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:06.961921       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:06.968359       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:07.062084       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:07.062135       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:07.066808       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:08.974982       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:08.975033       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:08.979332       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:09.071876       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:09.071921       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:09.077528       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:12.985025       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:12.985071       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:12.989279       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:13.083948       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:13.084008       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:13.091092       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:20.994732       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:20.994775       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:21.037100       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:21.099360       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:21.099411       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:21.103540       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:37.043035       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:38:37.043079       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:38:37.047936       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:38:37.138102       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:38:37.138148       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:38:37.145989       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:39:09.055704       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:39:09.055747       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:39:09.060804       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:39:09.153606       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:39:09.153660       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:39:09.156774       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:09.068262       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:09.068314       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:09.141687       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:09.163422       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:09.163473       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:09.167276       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:10.147831       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:10.147875       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:10.153196       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:10.245557       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:10.245598       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:10.248539       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:12.247602       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:12.247655       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:12.253328       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:12.257506       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:12.257553       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:12.345764       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:16.259222       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:16.259313       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:16.341097       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:16.351384       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:16.351430       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:16.354883       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:24.347358       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:24.347404       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:24.351971       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:24.441833       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:24.441899       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:24.449016       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:40.356755       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:40:40.356800       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:40:40.360186       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:40:40.456533       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:40:40.456654       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:40:40.462478       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:41:12.365479       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:41:12.365543       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:41:12.441869       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:41:12.467554       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:41:12.467644       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:41:12.471118       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:12.449723       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:12.449803       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:12.544695       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:12.555344       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:12.555392       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:12.561499       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:13.549852       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:13.549895       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:13.643538       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:13.650638       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:13.650798       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:13.656557       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:15.647575       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:15.647619       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:15.651273       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:15.743731       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:15.743788       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:15.748409       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:19.748460       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:19.748548       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:19.754403       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:19.844387       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:19.844459       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:19.849864       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:27.844806       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:27.844854       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:27.849480       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:27.856960       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:27.856975       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:27.860572       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:43.855968       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:42:43.856112       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:42:43.861569       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:42:43.868215       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:42:43.868313       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:42:43.947077       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:43:15.869805       1 scheduler.go:76] scheduling pod: tidb/demo-tikv-2
I1107 10:43:15.869913       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-2]
E1107 10:43:15.946481       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
I1107 10:43:15.954218       1 scheduler.go:76] scheduling pod: tidb/demo-pd-2
I1107 10:43:15.954262       1 scheduler.go:79] entering predicate: HighAvailability, nodes: [kube-node-1]
E1107 10:43:15.959137       1 mux.go:106] unable to filter nodes: the first 3 pods can't be scheduled to the same node
kirinse commented 5 years ago

@tennix

kubectl get pv

NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                   STORAGECLASS    REASON    AGE
local-pv-1767facf   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-1dbd65bc   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-37c274de   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-1       local-storage             2h
local-pv-45324aa3   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-1   local-storage             2h
local-pv-62446ab1   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-67e0e52d   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-7e1a02ed   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-820ea0a0   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-2       local-storage             2h
local-pv-8a0a2eb0   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-9371219a   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-0       local-storage             2h
local-pv-bf3146fc   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-c4277489   238476Gi   RWO            Delete           Available                           local-storage             2h
local-pv-cfa833c6   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-2   local-storage             2h
local-pv-f1f39fe7   238476Gi   RWO            Retain           Bound       tidb/tikv-demo-tikv-0   local-storage             2h
local-pv-f2cc9d77   238476Gi   RWO            Delete           Available                           local-storage             2h
weekface commented 5 years ago

The logs indicate that K8s is trying to schedule the tidb/demo-pd-2 Pod to kube-node-1, but the tidb/demo-pd-0 Pod has already been scheduled to kube-node-1.

TiDB Operator doesn't allow the first 3 pods of a component to be scheduled to the same node. For example, pd-0 and pd-2 can't both be scheduled to kube-node-1.

tennix commented 5 years ago

Yes, this is for data safety reasons. If two PD/TiKV pods are scheduled on the same node and that node goes down, TiDB becomes unavailable because two replicas are lost.
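To see the conflict directly (an added illustration, using names and labels from the outputs above), list the PD pods with their nodes and inspect the node affinity of the PV that the pending pod's claim is bound to:

kubectl get po -n tidb -l app.kubernetes.io/component=pd,app.kubernetes.io/instance=demo -o wide
kubectl describe pv local-pv-820ea0a0   # bound to tidb/pd-demo-pd-2; its node affinity (in the annotations or spec, depending on the K8s version) shows which node it is pinned to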

kirinse commented 5 years ago

The logs indicate that K8s is trying to schedule the tidb/demo-pd-2 Pod to kube-node-1, but the tidb/demo-pd-0 Pod has already been scheduled to kube-node-1.

TiDB Operator doesn't allow the first 3 pods of a component to be scheduled to the same node.

OK. I just cloned this repo and didn't change anything. How do I fix it?

tennix commented 5 years ago

This may be an issue with our recently added scheduler. If the scheduler worked correctly, the third PD pod's PV would not be bound on the same node as a previous PD pod. We'll diagnose it further.
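Until that is fixed, one possible manual workaround (not confirmed by the maintainers; the resource names are taken from the outputs above, and the candidate-node list in the scheduler log suggests the mis-bound PV sits on kube-node-1) is to release the claim and let the StatefulSet recreate the pod so its new PVC can bind to a PV on another node:

kubectl delete pvc -n tidb pd-demo-pd-2      # the claim bound to the PV on kube-node-1
kubectl delete po -n tidb demo-pd-2          # the StatefulSet recreates the pod and a fresh PVC
# once the old PV shows as Released, clear its claimRef so it becomes Available again
kubectl patch pv local-pv-820ea0a0 --type=json -p='[{"op":"remove","path":"/spec/claimRef"}]'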

kirinse commented 5 years ago

@tennix Thanks. Also, I hit another problem at manifests/local-dind/dind-cluster-v1.10.sh L1847; helm returned an error:

Error: unknown flag: --template

My helm client version is

Client: &version.Version{SemVer:"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844", GitTreeState:"clean"}
tennix commented 5 years ago

You need to upgrade your helm client; v2.8.2 doesn't support the --template flag.
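For reference (an added note, assuming the client was installed with Homebrew on macOS, where the formula was named kubernetes-helm at the time), upgrading and verifying the client looks like:

brew upgrade kubernetes-helm
helm version --client
helm init --upgrade   # optionally upgrade tiller to the same version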

kirinse commented 5 years ago

@tennix OK, then you guys need to update local-dind-tutorial.md :)

tennix commented 5 years ago

Oh, yeah. The document only requires v2.8.2 or later. Could you help us update the document? Thanks!

weekface commented 5 years ago

Someone reported a bug to Kubernetes that looks very much like this issue: https://github.com/kubernetes/kubernetes/issues/65131

And this is the fix PR: https://github.com/kubernetes/kubernetes/pull/67556, fixed in v1.12.

This causes issues when trying to evaluate future pods with pod affinity/anti-affinity because the pod has not been assumed while the volumes have been decided.

@kirinse @tennix
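Since the upstream fix only landed in v1.12 and the DinD cluster here reports v1.10.5, the server version is easy to confirm:

kubectl version --short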

kirinse commented 5 years ago

Oh, yeah. The document only requires v2.8.2 or later. Could you help us update the document? Thanks!

Sent PR #175.

tennix commented 5 years ago

Great, thank you for your contribution!

kirinse commented 5 years ago

After fetching the updates, I now can't see any demo-tikv/demo-tidb pods.

kubectl get no

NAME          STATUS    ROLES     AGE       VERSION
kube-master   Ready     master    3h        v1.10.5
kube-node-1   Ready     <none>    3h        v1.10.5
kube-node-2   Ready     <none>    3h        v1.10.5
kube-node-3   Ready     <none>    3h        v1.10.5

kubectl get pv

NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM               STORAGECLASS    REASON    AGE
local-pv-1767facf   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-0   local-storage             3h
local-pv-1dbd65bc   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-37c274de   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-45324aa3   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-62446ab1   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-67e0e52d   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-7e1a02ed   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-820ea0a0   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-8a0a2eb0   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-9371219a   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-bf3146fc   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-c4277489   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-cfa833c6   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-f1f39fe7   238476Gi   RWO            Delete           Available                       local-storage             3h
local-pv-f2cc9d77   238476Gi   RWO            Retain           Bound       tidb/pd-demo-pd-1   local-storage             3h

kubectl logs -f -n tidb-admin tidb-scheduler-5b85b688c6-dwpc8 -c tidb-scheduler

I1109 04:24:10.884319       1 version.go:37] Welcome to TiDB Operator.
I1109 04:24:10.885626       1 version.go:38] Git Commit Hash: c5a835d545856ebfa853a291d95b5d1cd99ab8b9
I1109 04:24:10.885662       1 version.go:39] UTC Build Time:  2018-11-09 03:43:33
I1109 04:24:10.937456       1 mux.go:60] start scheduler extender server, listening on 0.0.0.0:10262
I1109 04:39:26.632577       1 scheduler.go:76] scheduling pod: tidb/demo-pd-0
I1109 04:39:27.692360       1 scheduler.go:76] scheduling pod: tidb/demo-pd-0
I1109 04:40:12.290369       1 scheduler.go:76] scheduling pod: tidb/demo-pd-1
I1109 04:40:13.346263       1 scheduler.go:76] scheduling pod: tidb/demo-pd-1

kubectl get pods --all-namespaces

NAMESPACE     NAME                                      READY     STATUS             RESTARTS   AGE
kube-system   etcd-kube-master                          1/1       Running            0          3h
kube-system   kube-apiserver-kube-master                1/1       Running            0          3h
kube-system   kube-controller-manager-kube-master       1/1       Running            0          3h
kube-system   kube-dns-64d6979467-6sv55                 2/3       CrashLoopBackOff   7          3h
kube-system   kube-flannel-ds-amd64-7827d               1/1       Running            0          3h
kube-system   kube-flannel-ds-amd64-8nggc               1/1       Running            0          3h
kube-system   kube-flannel-ds-amd64-8qtpr               1/1       Running            0          3h
kube-system   kube-flannel-ds-amd64-mqwxg               1/1       Running            0          3h
kube-system   kube-proxy-2vz85                          1/1       Running            0          3h
kube-system   kube-proxy-hd4cx                          1/1       Running            0          3h
kube-system   kube-proxy-mhq8t                          1/1       Running            0          3h
kube-system   kube-proxy-v5jzp                          1/1       Running            0          3h
kube-system   kube-scheduler-kube-master                1/1       Running            0          3h
kube-system   kubernetes-dashboard-68ddc89549-hstqk     1/1       Running            0          3h
kube-system   local-volume-provisioner-n88lh            1/1       Running            0          3h
kube-system   local-volume-provisioner-p7xdd            1/1       Running            0          3h
kube-system   local-volume-provisioner-pjb2l            1/1       Running            0          3h
kube-system   registry-proxy-966lw                      1/1       Running            0          3h
kube-system   registry-proxy-cll4r                      1/1       Running            0          3h
kube-system   registry-proxy-dh524                      1/1       Running            0          3h
kube-system   registry-proxy-tn8j7                      1/1       Running            0          3h
kube-system   tiller-deploy-6fd8d857bc-g2k4h            1/1       Running            0          3h
tidb-admin    tidb-controller-manager-bcc66f746-p5sth   1/1       Running            0          30m
tidb-admin    tidb-scheduler-5b85b688c6-dwpc8           2/2       Running            0          30m
tidb          demo-monitor-644d69f7db-z7pv4             2/2       Running            0          14m
tidb          demo-monitor-configurator-xbcjv           0/1       Completed          0          14m
tidb          demo-pd-0                                 1/1       Running            0          14m
tidb          demo-pd-1                                 1/1       Running            4          13m
tidb          demo-tidb-initializer-npjw4               1/1       Running            0          14m
tennix commented 5 years ago

We're investigating this issue; it may be a bug in Kubernetes itself. From what you provided, demo-pd-1 is already scheduled; it just keeps failing. Could you provide the log of demo-pd-1 via kubectl logs -n tidb demo-pd-1?
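Since demo-pd-1 has already restarted several times, the log of the previous (crashed) container instance may also be useful (an optional addition):

kubectl logs -n tidb demo-pd-1 --previous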

kirinse commented 5 years ago

@tennix

kubectl logs -n tidb demo-pd-1


nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

nslookup domain demo-pd-1.demo-pd-peer.tidb.svc failed

Name: demo-pd-1.demo-pd-peer.tidb.svc Address 1: 10.244.1.6 demo-pd-1.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-1.demo-pd-peer.tidb.svc success pd cluster is not ready now: demo-pd.tidb.svc

tennix commented 5 years ago

Did you just delete the previous cluster with helm delete tidb-cluster --purge without cleaning the PVCs? You can remove the previous PVCs following the guide here and then create a new cluster again.
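For reference, a rough sketch of that cleanup (see the linked guide for the authoritative steps; the label comes from the pod description above, and the PV name is only an example):

kubectl delete pvc -n tidb -l app.kubernetes.io/instance=demo,app.kubernetes.io/managed-by=tidb-operator
# for each Released PV with a Retain policy, clear its claimRef so it can be reused, e.g.:
kubectl patch pv local-pv-1767facf --type=json -p='[{"op":"remove","path":"/spec/claimRef"}]'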

kirinse commented 5 years ago

Actually, what I did was:

manifests/local-dind/dind-cluster-v1.10.sh stop
manifests/local-dind/dind-cluster-v1.10.sh clean
sudo rm -rf data/kube-node-*
manifests/local-dind/dind-cluster-v1.10.sh up
...
tennix commented 5 years ago

Ah, OK. What about demo-pd-0's log? The pd-1 log shows that the PD cluster is not ready; pd-1 must wait until pd-0 is ready.

kirinse commented 5 years ago

kubectl logs -n tidb demo-pd-0


nslookup domain demo-pd-0.demo-pd-peer.tidb.svc failed

nslookup domain demo-pd-0.demo-pd-peer.tidb.svc failed

nslookup domain demo-pd-0.demo-pd-peer.tidb.svc failed

Name: demo-pd-0.demo-pd-peer.tidb.svc Address 1: 10.244.2.9 demo-pd-0.demo-pd-peer.tidb.svc.cluster.local nslookup domain demo-pd-0.demo-pd-peer.tidb.svc success starting pd-server ... /pd-server --data-dir=/var/lib/pd --name=demo-pd-0 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://demo-pd-0.demo-pd-peer.tidb.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://demo-pd-0.demo-pd-peer.tidb.svc:2379 --config=/etc/pd/pd.toml --initial-cluster=demo-pd-0=http://demo-pd-0.demo-pd-peer.tidb.svc:2380 2018/11/09 05:03:05.821 util.go:62: [info] Welcome to Placement Driver (PD). 2018/11/09 05:03:05.821 util.go:63: [info] Release Version: v2.0.5 2018/11/09 05:03:05.821 util.go:64: [info] Git Commit Hash: b64716707b7279a4ae822be767085ff17b5f3fea 2018/11/09 05:03:05.821 util.go:65: [info] Git Branch: release-2.0 2018/11/09 05:03:05.821 util.go:66: [info] UTC Build Time: 2018-09-07 12:34:46 2018/11/09 05:03:05.821 metricutil.go:83: [info] disable Prometheus push client 2018/11/09 05:03:05.822 server.go:96: [info] PD config - Config({FlagSet:0xc00019cd20 Version:false ClientUrls:http://0.0.0.0:2379 PeerUrls:http://0.0.0.0:2380 AdvertiseClientUrls:http://demo-pd-0.demo-pd-peer.tidb.svc:2379 AdvertisePeerUrls:http://demo-pd-0.demo-pd-peer.tidb.svc:2380 Name:demo-pd-0 DataDir:/var/lib/pd InitialCluster:demo-pd-0=http://demo-pd-0.demo-pd-peer.tidb.svc:2380 InitialClusterState:new Join: LeaderLease:3 Log:{Level:info Format:text DisableTimestamp:false File:{Filename: LogRotate:true MaxSize:0 MaxDays:0 MaxBackups:0}} LogFileDeprecated: LogLevelDeprecated: TsoSaveInterval:3s Metric:{PushJob:demo-pd-0 PushAddress: PushInterval:15s} Schedule:{MaxSnapshotCount:3 MaxPendingPeerCount:16 MaxMergeRegionSize:0 SplitMergeInterval:1h0m0s PatrolRegionInterval:100ms MaxStoreDownTime:1h0m0s LeaderScheduleLimit:4 RegionScheduleLimit:4 ReplicaScheduleLimit:8 MergeScheduleLimit:8 TolerantSizeRatio:5 LowSpaceRatio:0.8 HighSpaceRatio:0.6 EnableRaftLearner:false Schedulers:[{Type:balance-region Args:[] Disable:false} {Type:balance-leader Args:[] Disable:false} {Type:hot-region Args:[] Disable:false} {Type:label Args:[] Disable:false}]} Replication:{MaxReplicas:3 LocationLabels:[zone rack host]} Namespace:map[] QuotaBackendBytes:0 AutoCompactionRetention:1 TickInterval:500ms ElectionInterval:3s Security:{CAPath: CertPath: KeyPath:} LabelProperty:map[] configFile:/etc/pd/pd.toml WarningMsgs:[] NamespaceClassifier:default nextRetryDelay:1000000000 disableStrictReconfigCheck:false heartbeatStreamBindInterval:{Duration:60000000000} leaderPriorityCheckInterval:{Duration:60000000000}}) 2018/11/09 05:03:05.826 server.go:122: [info] start embed etcd 2018/11/09 05:03:05.826 log.go:86: [info] embed: [listening for peers on http://0.0.0.0:2380] 2018/11/09 05:03:05.826 log.go:86: [info] embed: [pprof is enabled under /debug/pprof] 2018/11/09 05:03:05.826 log.go:86: [info] embed: [listening for client requests on 0.0.0.0:2379] 2018/11/09 05:03:05.826 systime_mon.go:24: [info] start system time monitor 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [name = demo-pd-0] 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [data dir = /var/lib/pd] 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [member dir = /var/lib/pd/member] 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [heartbeat = 500ms] 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [election = 3000ms] 2018/11/09 05:03:05.878 log.go:86: [info] etcdserver: [snapshot count = 100000] 2018/11/09 05:03:05.878 log.go:86: [info] 
etcdserver: [advertise client URLs = http://demo-pd-0.demo-pd-peer.tidb.svc:2379]
2018/11/09 05:03:05.886 log.go:86: [info] etcdserver: [restarting member b798afebf07ff3aa in cluster 2bcb211fc6dc76bd at commit index 43]
2018/11/09 05:03:05.886 log.go:86: [info] raft: [b798afebf07ff3aa became follower at term 282]
2018/11/09 05:03:05.886 log.go:86: [info] raft: [newRaft b798afebf07ff3aa [peers: [], term: 282, commit: 43, applied: 0, lastindex: 44, lastterm: 2]]
2018/11/09 05:03:05.889 log.go:82: [warning] auth: [simple token is not cryptographically signed]
2018/11/09 05:03:05.891 log.go:86: [info] etcdserver: [starting server... [version: 3.2.18, cluster version: to_be_decided]]
2018/11/09 05:03:05.892 log.go:86: [info] etcdserver/membership: [added member b798afebf07ff3aa [http://demo-pd-0.demo-pd-peer.tidb.svc:2380] to cluster 2bcb211fc6dc76bd]
2018/11/09 05:03:05.892 log.go:84: [info] etcdserver/membership: [set the initial cluster version to 3.2]
2018/11/09 05:03:05.892 log.go:86: [info] etcdserver/api: [enabled capabilities for version 3.2]
2018/11/09 05:03:05.893 log.go:86: [info] etcdserver/membership: [added member 4b38fa5a0dce5ba5 [http://demo-pd-1.demo-pd-peer.tidb.svc:2380] to cluster 2bcb211fc6dc76bd]
2018/11/09 05:03:05.893 log.go:86: [info] rafthttp: [starting peer 4b38fa5a0dce5ba5...]
2018/11/09 05:03:05.893 log.go:86: [info] rafthttp: [started HTTP pipelining with peer 4b38fa5a0dce5ba5]
2018/11/09 05:03:05.895 log.go:86: [info] rafthttp: [started streaming with peer 4b38fa5a0dce5ba5 (writer)]
2018/11/09 05:03:05.896 log.go:86: [info] rafthttp: [started streaming with peer 4b38fa5a0dce5ba5 (writer)]
2018/11/09 05:03:05.896 log.go:86: [info] rafthttp: [started peer 4b38fa5a0dce5ba5]
2018/11/09 05:03:05.896 log.go:86: [info] rafthttp: [started streaming with peer 4b38fa5a0dce5ba5 (stream MsgApp v2 reader)]
2018/11/09 05:03:05.896 log.go:86: [info] rafthttp: [added peer 4b38fa5a0dce5ba5]
2018/11/09 05:03:05.896 log.go:86: [info] rafthttp: [started streaming with peer 4b38fa5a0dce5ba5 (stream Message reader)]
2018/11/09 05:03:10.889 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 282]
2018/11/09 05:03:10.889 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 283]
2018/11/09 05:03:10.889 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 283]
2018/11/09 05:03:10.889 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 283]
2018/11/09 05:03:10.896 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: ]
2018/11/09 05:03:15.888 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 283]
2018/11/09 05:03:15.888 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 284]
2018/11/09 05:03:15.888 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 284]
2018/11/09 05:03:15.888 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 284]
2018/11/09 05:03:15.897 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:16.893 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:03:18.889 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 284]
2018/11/09 05:03:18.889 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 285]
2018/11/09 05:03:18.889 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 285]
2018/11/09 05:03:18.889 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 285]
2018/11/09 05:03:20.898 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:21.889 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 285]
2018/11/09 05:03:21.889 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 286]
2018/11/09 05:03:21.889 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 286]
2018/11/09 05:03:21.889 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 286]
2018/11/09 05:03:25.899 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:26.889 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 286]
2018/11/09 05:03:26.889 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 287]
2018/11/09 05:03:26.889 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 287]
2018/11/09 05:03:26.889 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 287]
2018/11/09 05:03:27.894 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:03:30.900 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp 10.244.1.6:2380: connect: connection refused]
2018/11/09 05:03:32.389 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 287]
2018/11/09 05:03:32.389 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 288]
2018/11/09 05:03:32.389 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 288]
2018/11/09 05:03:32.389 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 288]
2018/11/09 05:03:35.900 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:36.389 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 288]
2018/11/09 05:03:36.389 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 289]
2018/11/09 05:03:36.389 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 289]
2018/11/09 05:03:36.389 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 289]
2018/11/09 05:03:38.895 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:03:39.389 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 289]
2018/11/09 05:03:39.389 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 290]
2018/11/09 05:03:39.389 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 290]
2018/11/09 05:03:39.389 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 290]
2018/11/09 05:03:40.901 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:43.889 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 290]
2018/11/09 05:03:43.889 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 291]
2018/11/09 05:03:43.889 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 291]
2018/11/09 05:03:43.889 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 291]
2018/11/09 05:03:45.901 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:49.390 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 291]
2018/11/09 05:03:49.390 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 292]
2018/11/09 05:03:49.390 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 292]
2018/11/09 05:03:49.390 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 292]
2018/11/09 05:03:49.897 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:03:50.903 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp 10.244.1.6:2380: connect: connection refused]
2018/11/09 05:03:52.890 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 292]
2018/11/09 05:03:52.890 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 293]
2018/11/09 05:03:52.890 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 293]
2018/11/09 05:03:52.890 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 293]
2018/11/09 05:03:55.904 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:03:57.390 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 293]
2018/11/09 05:03:57.390 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 294]
2018/11/09 05:03:57.390 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 294]
2018/11/09 05:03:57.390 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 294]
2018/11/09 05:04:00.897 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:04:00.904 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:04:01.890 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 294]
2018/11/09 05:04:01.890 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 295]
2018/11/09 05:04:01.890 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 295]
2018/11/09 05:04:01.890 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 295]
2018/11/09 05:04:05.904 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:04:06.890 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 295]
2018/11/09 05:04:06.890 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 296]
2018/11/09 05:04:06.890 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 296]
2018/11/09 05:04:06.890 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 296]
2018/11/09 05:04:10.890 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 296]
2018/11/09 05:04:10.890 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 297]
2018/11/09 05:04:10.890 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 297]
2018/11/09 05:04:10.890 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 297]
2018/11/09 05:04:10.905 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp 10.244.1.6:2380: connect: connection refused]
2018/11/09 05:04:11.898 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]
2018/11/09 05:04:14.390 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 297]
2018/11/09 05:04:14.390 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 298]
2018/11/09 05:04:14.390 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 298]
2018/11/09 05:04:14.390 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 298]
2018/11/09 05:04:15.905 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:04:18.891 log.go:86: [info] raft: [b798afebf07ff3aa is starting a new election at term 298]
2018/11/09 05:04:18.891 log.go:86: [info] raft: [b798afebf07ff3aa became candidate at term 299]
2018/11/09 05:04:18.891 log.go:86: [info] raft: [b798afebf07ff3aa received MsgVoteResp from b798afebf07ff3aa at term 299]
2018/11/09 05:04:18.891 log.go:86: [info] raft: [b798afebf07ff3aa [logterm: 2, index: 44] sent MsgVote request to 4b38fa5a0dce5ba5 at term 299]
2018/11/09 05:04:20.907 log.go:82: [warning] rafthttp: [health check for peer 4b38fa5a0dce5ba5 could not connect: dial tcp: i/o timeout]
2018/11/09 05:04:22.899 log.go:80: [error] etcdserver: [publish error: etcdserver: request timed out]

weekface commented 5 years ago

I see kube-dns is in CrashLoopBackOff. What do the kube-dns logs show?

kube-system   kube-dns-64d6979467-6sv55                 2/3       CrashLoopBackOff   7          3h
kirinse commented 5 years ago

kubectl logs -f kube-dns-64d6979467-6sv55 -n kube-system -c kubedns

I1109 04:56:45.750242       1 dns.go:48] version: 1.14.8
I1109 04:56:45.751468       1 server.go:71] Using configuration read from directory: /kube-dns-config with period 10s
I1109 04:56:45.751510       1 server.go:119] FLAG: --alsologtostderr="false"
I1109 04:56:45.751515       1 server.go:119] FLAG: --config-dir="/kube-dns-config"
I1109 04:56:45.751518       1 server.go:119] FLAG: --config-map=""
I1109 04:56:45.751519       1 server.go:119] FLAG: --config-map-namespace="kube-system"
I1109 04:56:45.751521       1 server.go:119] FLAG: --config-period="10s"
I1109 04:56:45.751590       1 server.go:119] FLAG: --dns-bind-address="0.0.0.0"
I1109 04:56:45.751644       1 server.go:119] FLAG: --dns-port="10053"
I1109 04:56:45.751649       1 server.go:119] FLAG: --domain="cluster.local."
I1109 04:56:45.751652       1 server.go:119] FLAG: --federations=""
I1109 04:56:45.751658       1 server.go:119] FLAG: --healthz-port="8081"
I1109 04:56:45.751660       1 server.go:119] FLAG: --initial-sync-timeout="1m0s"
I1109 04:56:45.751772       1 server.go:119] FLAG: --kube-master-url=""
I1109 04:56:45.751819       1 server.go:119] FLAG: --kubecfg-file=""
I1109 04:56:45.751838       1 server.go:119] FLAG: --log-backtrace-at=":0"
I1109 04:56:45.751842       1 server.go:119] FLAG: --log-dir=""
I1109 04:56:45.751844       1 server.go:119] FLAG: --log-flush-frequency="5s"
I1109 04:56:45.751860       1 server.go:119] FLAG: --logtostderr="true"
I1109 04:56:45.751862       1 server.go:119] FLAG: --nameservers=""
I1109 04:56:45.751991       1 server.go:119] FLAG: --stderrthreshold="2"
I1109 04:56:45.752009       1 server.go:119] FLAG: --v="2"
I1109 04:56:45.752146       1 server.go:119] FLAG: --version="false"
I1109 04:56:45.752167       1 server.go:119] FLAG: --vmodule=""
I1109 04:56:45.752511       1 server.go:201] Starting SkyDNS server (0.0.0.0:10053)
I1109 04:56:45.753077       1 server.go:220] Skydns metrics enabled (/metrics:10055)
I1109 04:56:45.753108       1 dns.go:146] Starting endpointsController
I1109 04:56:45.753113       1 dns.go:149] Starting serviceController
I1109 04:56:45.753320       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I1109 04:56:45.753423       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I1109 04:56:46.254556       1 dns.go:170] Initialized services and endpoints from apiserver
I1109 04:56:46.254667       1 server.go:135] Setting up Healthz Handler (/readiness)
I1109 04:56:46.254689       1 server.go:140] Setting up cache handler (/cache)
I1109 04:56:46.254698       1 server.go:126] Status HTTP port 8081
I1109 04:58:25.145183       1 server.go:160] Ignoring signal terminated (can only be terminated by SIGKILL)
tennix commented 5 years ago

This is mainly caused by DNS being unstable while the cluster was bootstrapping. It's a separate problem, so we should open another issue to track it.
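
If you want to double-check the DNS side on a live cluster, something like the following works (the dns-check pod name and busybox image here are just illustrative, not part of the tutorial):

# resolve a PD peer hostname the same way the PD pods do
kubectl run -it --rm dns-check --image=busybox:1.28 --restart=Never -- \
  nslookup demo-pd-1.demo-pd-peer.tidb.svc.cluster.local
# check whether kube-dns itself is healthy while PD is bootstrapping
kubectl -n kube-system get pods -l k8s-app=kube-dns

If the lookup times out or NXDOMAINs while kube-dns is restarting, PD cannot reach its peers during bootstrap, which matches the raft election loop in the log above.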

tennix commented 5 years ago

It seems the DNS problem is related to #126; it's a known bug that has been fixed in the latest version of PD.

gregwebs commented 5 years ago

Are we waiting for a new release of PD?

tennix commented 5 years ago

We can run with the current version of PD, but it's not very stable, especially when bootstrapping on an unstable network. This mostly happens in a DinD environment.

gregwebs commented 5 years ago

How should @kirinse fix his issue? Can he upgrade to 2.1?

tennix commented 5 years ago

The PD fix https://github.com/pingcap/pd/pull/1279 hasn't been cherry-picked into v2.1.0-rc.4, so currently only the latest version contains it.

gregwebs commented 5 years ago

@tennix can you recommend a set of image tags for @kirinse to use?

tennix commented 5 years ago

Currently, no versioned tag contains this fix; only the latest tag does.

gregwebs commented 5 years ago

That's for PD. For tidb and tikv, I think you would want to use the more stable v2.1.0-rc.4 tag.

kirinse commented 5 years ago

So, no solution for now?

tennix commented 5 years ago

@kirinse We're sorry about these issues. They come from upstream components, so they take a bit longer to get fixed in tidb-operator, but there are some workarounds you can try now.

For the scheduling issue, you can delete and recreate the cluster; if you're lucky, all the pods will be scheduled correctly. If not, there is another workaround: set schedulerName to default in charts/tidb-cluster/values.yaml. This disables HA scheduling, so multiple PD or TiKV pods may land on the same node, but that should be fine for a DinD test.
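
As a sketch only (the exact key name and its default in your copy of charts/tidb-cluster/values.yaml may differ, and "default" here means the built-in Kubernetes scheduler, which registers itself as default-scheduler):

# charts/tidb-cluster/values.yaml (illustrative; verify against your local chart)
# before: HA-aware placement through the tidb-scheduler
#   schedulerName: tidb-scheduler
# workaround: fall back to the built-in scheduler (no HA placement guarantees)
schedulerName: default-scheduler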

For the PD pod bootstrap error, you can try the latest PD Docker image.

gregwebs commented 5 years ago

@tennix this GitHub issue is long and now covers three separate problems. Can we also create a separate issue for the scheduler problem?

@kirinse here is the config change to use the latest PD image. The PD fix is going into RC5, which should be available soon.

diff --git a/charts/tidb-cluster/values.yaml b/charts/tidb-cluster/values.yaml
index b5cf2c8..a31dd1b 100644
--- a/charts/tidb-cluster/values.yaml
+++ b/charts/tidb-cluster/values.yaml
@@ -30,3 +30,3 @@ pd:
   replicas: 3
-  image: pingcap/pd:v2.0.7
+  image: pingcap/pd:latest
   logLevel: info
@@ -72,3 +72,3 @@ tikv:
   replicas: 3
-  image: pingcap/tikv:v2.0.7
+  image: pingcap/tikv:v2.1.0-rc.4
   logLevel: info
@@ -120,3 +120,3 @@ tidb:
   # password: "admin"
-  image: pingcap/tidb:v2.0.7
+  image: pingcap/tidb:v2.1.0-rc.4
   # Image pull policy.
gregwebs commented 5 years ago

rc5 is available for all three components now.
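
For anyone landing here later: assuming the rc.5 images follow the same pingcap/<component>:v2.1.0-rc.5 naming as the tags above, bumping all three components on top of the earlier diff would look roughly like this (hunk offsets omitted; they depend on your local chart):

--- a/charts/tidb-cluster/values.yaml
+++ b/charts/tidb-cluster/values.yaml
@@ pd:
-  image: pingcap/pd:latest
+  image: pingcap/pd:v2.1.0-rc.5
@@ tikv:
-  image: pingcap/tikv:v2.1.0-rc.4
+  image: pingcap/tikv:v2.1.0-rc.5
@@ tidb:
-  image: pingcap/tidb:v2.1.0-rc.4
+  image: pingcap/tidb:v2.1.0-rc.5

After editing values.yaml, re-run helm upgrade on the tidb-cluster release so the StatefulSets pick up the new images.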