Status: Closed (gregwebs closed this issue 5 years ago)
$ kubectl describe po -n tidb21 demo-tikv-2
$ kubectl get pvc -n tidb21
$ kubectl get pv
$ kubectl describe po -n tidb21 demo-tikv-2
Name:               demo-tikv-2
Namespace:          tidb21
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app.kubernetes.io/component=tikv
                    app.kubernetes.io/instance=tidb21
                    app.kubernetes.io/managed-by=tidb-operator
                    app.kubernetes.io/name=tidb-cluster
                    controller-revision-hash=demo-tikv-874b8bf89
                    statefulset.kubernetes.io/pod-name=demo-tikv-2
Annotations:        pingcap.com/last-applied-configuration:
                      {"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name...
                    prometheus.io/path: /metrics
                    prometheus.io/port: 20180
                    prometheus.io/scrape: true
Status:             Pending
IP:
Controlled By:      StatefulSet/demo-tikv
Init Containers:
  wait-for-pd:
    Image:      gcr.io/pingcap-tidb-alpha/tidb-operator:v1.0.0-beta.3.start-fast-16
    Port:       <none>
    Host Port:  <none>
    Command:
      wait-for-pd
    Environment:
      NAMESPACE:     tidb21 (v1:metadata.namespace)
      CLUSTER_NAME:  demo
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-x6265 (ro)
Containers:
  tikv:
    Image:      pingcap/tikv:v3.0.0-rc.1
    Port:       20160/TCP
    Host Port:  0/TCP
    Command:
      /bin/sh
      /usr/local/bin/tikv_start_script.sh
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:
      NAMESPACE:              tidb21 (v1:metadata.namespace)
      CLUSTER_NAME:           demo
      HEADLESS_SERVICE_NAME:  demo-tikv-peer
      CAPACITY:               0
      TZ:                     UTC
    Mounts:
      /etc/podinfo from annotations (ro)
      /etc/tikv from config (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/tikv from tikv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-x6265 (ro)
Volumes:
  tikv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tikv-demo-tikv-2
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-tikv
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      demo-tikv
    Optional:  false
  default-token-x6265:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-x6265
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 tidb.pingcap.com/tidb-scaler=n1-standard-2-375:NoSchedule
Events:          <none>
$ kubectl get pvc -n tidb21
NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
pd-demo-pd-0       Bound     pvc-21708809-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            pd-ssd-wait     110m
pd-demo-pd-1       Bound     pvc-2175417b-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            pd-ssd-wait     110m
pd-demo-pd-2       Bound     pvc-217981bc-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            pd-ssd-wait     110m
tikv-demo-tikv-0   Pending                                                                        local-storage   110m
tikv-demo-tikv-1   Pending                                                                        local-storage   110m
tikv-demo-tikv-2   Bound     local-pv-3c9d1093                          368Gi      RWO            local-storage   110m
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                               STORAGECLASS    REASON   AGE
local-pv-1c02244d                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-3c9d1093                          368Gi      RWO            Retain           Bound       tidb21/tikv-demo-tikv-2             local-storage            113m
local-pv-52bb53c                           368Gi      RWO            Delete           Available                                       local-storage            110m
local-pv-5e3c2064                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-69e7f7f9                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-6a9c1bf9                          368Gi      RWO            Delete           Available                                       local-storage            110m
local-pv-82f4cde9                          368Gi      RWO            Delete           Available                                       local-storage            15h
local-pv-8b5c80f4                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-92134e5d                          368Gi      RWO            Delete           Available                                       local-storage            21h
local-pv-92524f84                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-99d360f                           368Gi      RWO            Delete           Available                                       local-storage            20h
local-pv-a2973354                          368Gi      RWO            Delete           Available                                       local-storage            20h
local-pv-b06a079e                          368Gi      RWO            Delete           Available                                       local-storage            21h
local-pv-b1e66ac4                          368Gi      RWO            Delete           Available                                       local-storage            110m
local-pv-ba5e9234                          368Gi      RWO            Delete           Available                                       local-storage            22h
local-pv-bb23005c                          368Gi      RWO            Delete           Available                                       local-storage            22h
local-pv-da125dd4                          368Gi      RWO            Delete           Available                                       local-storage            44h
local-pv-e8210ae5                          368Gi      RWO            Delete           Available                                       local-storage            18h
local-pv-f4f18899                          368Gi      RWO            Delete           Available                                       local-storage            22h
pvc-21708809-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            Retain           Bound       tidb21/pd-demo-pd-0                 pd-ssd-wait              112m
pvc-2175417b-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            Retain           Bound       tidb21/pd-demo-pd-1                 pd-ssd-wait              111m
pvc-217981bc-942a-11e9-aab8-4201ac1f4008   5Gi        RWO            Retain           Bound       tidb21/pd-demo-pd-2                 pd-ssd-wait              112m
pvc-a518b9e1-920e-11e9-afc9-4201ac1f4006   2Gi        RWO            Delete           Bound       operations/tidb-data-mysql-0        standard                 2d18h
pvc-b2d05151-9200-11e9-afc9-4201ac1f4006   2Gi        RWO            Delete           Bound       monitor/database-netdata-master-0   standard                 2d19h
pvc-b2d39b21-9200-11e9-afc9-4201ac1f4006   1Gi        RWO            Delete           Bound       monitor/alarms-netdata-master-0     standard                 2d19h
The tikv-2 PVC was bound, but the pod can't be scheduled, and there are no events. Could this be the kube-scheduler problem we have hit frequently in our k8s env recently? @cofyc
The kube-scheduler in the tidb-scheduler pod?
As per #468, this blocks a new cluster from being scheduled.
The tidb-scheduler logs are listed above; the kube-scheduler log looks the same.
E0621 16:04:47.293641 1 factory.go:1519] Error scheduling tidb21/demo-tikv-0: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500; retrying
E0621 16:04:47.296992 1 scheduler.go:546] error selecting node for pod: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500
E0621 16:04:47.297663 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.297676 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.297912 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.297920 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.298155 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.298163 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
I0621 16:04:47.693084 1 trace.go:76] Trace[1601680201]: "Scheduling tidb21/demo-tikv-1" (started: 2019-06-21 16:04:47.297100315 +0000 UTC m=+161567.461774652) (total time: 395.944579ms):
Trace[1601680201]: [395.944579ms] [395.883901ms] END
E0621 16:04:47.694697 1 factory.go:1519] Error scheduling tidb21/demo-tikv-1: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500; retrying
E0621 16:04:47.701308 1 scheduler.go:546] error selecting node for pod: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500
E0621 16:04:47.702889 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.702909 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.702929 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.702950 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.703229 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
E0621 16:04:47.703242 1 predicates.go:1277] Node not found, gke-alpha-tidb-custom-6-11008-0-12ae9ca3-xp99
I0621 16:04:48.092963 1 trace.go:76] Trace[147365297]: "Scheduling tidb21/demo-tikv-0" (started: 2019-06-21 16:04:47.7020882 +0000 UTC m=+161567.866762536) (total time: 390.828437ms):
Trace[147365297]: [390.828437ms] [390.756271ms] END
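The failure pattern is easier to quantify once the log is filtered. A rough sketch, run here against an excerpt of the log above pasted into a heredoc; on a live cluster you would pipe the scheduler pod's log (via `kubectl logs`) into the same `grep`:

```shell
# Count extender-filter failures in a scheduler log excerpt.
failures=$(grep -c 'Failed filter with extender' <<'EOF'
E0621 16:04:47.293641 1 factory.go:1519] Error scheduling tidb21/demo-tikv-0: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500; retrying
E0621 16:04:47.296992 1 scheduler.go:546] error selecting node for pod: Failed filter with extender at URL http://127.0.0.1:10262/scheduler/filter, code 500
I0621 16:04:47.693084 1 trace.go:76] Trace[1601680201]: "Scheduling tidb21/demo-tikv-1" ...
EOF
)
echo "$failures"   # prints 2
```

Every scheduling attempt for the stuck pods produces a pair of these errors, so a growing count confirms the scheduler is retrying rather than giving up.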
Is the unscheduled pod retried by the scheduler repeatedly? If the scheduler retries scheduling the pod but always fails, it is unrelated to the issue we found in our IDC k8s env. There, the tidb-scheduler didn't try to schedule the new TiKV pods at all.
Yes, it keeps trying to schedule.
@cofyc suggests:
or upgrade to v1.14+
@gregwebs can you have a try?
I filled out a form to be an alpha user of 1.14 on GKE. I am still waiting... tidb-operator is using kube-scheduler v1.13.6, which matches the GKE version when it was installed. I will update my version of tidb-operator.
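Before upgrading, it can help to confirm which kube-scheduler version the operator actually runs. A minimal sketch: extract the version tag from a container image reference; the `kubectl` command in the comment is only a hypothetical way to obtain that reference (deployment name and namespace are assumptions, not taken from this report).

```shell
# Strip everything up to the last ':' to get an image's version tag,
# e.g. "k8s.gcr.io/kube-scheduler:v1.13.6" -> "v1.13.6".
image_tag() {
  echo "${1##*:}"
}

# Against a live cluster you might obtain the image with something like
# (deployment name and namespace are placeholders):
#   kubectl -n tidb-admin get deploy tidb-scheduler \
#     -o jsonpath='{.spec.template.spec.containers[*].image}'
image_tag "k8s.gcr.io/kube-scheduler:v1.13.6"   # prints v1.13.6
```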
I cannot reproduce this anymore
Bug Report
The scheduler continually logs:
The kube-scheduler log is similar.
There are no events for tikv-2 when it is described.
I got to this state after creating a tidb cluster, then creating a second tidb cluster and deleting the first cluster. I deleted the Released PVs.
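The Released-PV cleanup mentioned above can be scripted. A minimal sketch, assuming the default `kubectl get pv` column layout shown earlier (STATUS is the fifth field); the delete step is left as a comment because it is destructive:

```shell
# Read "kubectl get pv --no-headers" style rows on stdin and print the
# names of PVs whose STATUS (5th field) is Released.
released_pvs() {
  awk '$5 == "Released" {print $1}'
}

# Usage against a live cluster (review the list before deleting!):
#   kubectl get pv --no-headers | released_pvs | xargs -r kubectl delete pv
```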