pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

tidb-scheduler pod keeps crashing with TiDB Operator v1.5.3 #5623

Closed: yahonda closed this issue 2 months ago

yahonda commented 2 months ago

Bug Report

What version of Kubernetes are you using?

$ kubectl version
Client Version: v1.29.3+k3s1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.3+k3s1

What version of TiDB Operator are you using?

$ kubectl exec -n tidb-admin tidb-controller-manager-7f8c786f9-s8pls -- tidb-controller-manager -V
TiDB Operator Version: version.Info{GitVersion:"v1.5.3", GitCommit:"2c9e4dad0abaa4400afdef9ceff3084e71510ecb", GitTreeState:"clean", BuildDate:"2024-04-18T03:46:15Z", GoVersion:"go1.21.6", Compiler:"gc", Platform:"linux/amd64"}

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  12m
$

What's the status of the TiDB cluster pods?

ubuntu@k3s:~$ kubectl get po --all-namespaces -o wide
NAMESPACE     NAME                                      READY   STATUS             RESTARTS      AGE   IP           NODE   NOMINATED NODE   READINESS GATES
kube-system   local-path-provisioner-6c86858495-sc5ds   1/1     Running            0             14m   10.42.0.4    k3s    <none>           <none>
kube-system   coredns-6799fbcd5-q8qt4                   1/1     Running            0             14m   10.42.0.3    k3s    <none>           <none>
kube-system   helm-install-traefik-crd-gvq88            0/1     Completed          0             14m   10.42.0.5    k3s    <none>           <none>
kube-system   metrics-server-54fd9b65b-tg2zq            1/1     Running            0             14m   10.42.0.6    k3s    <none>           <none>
kube-system   helm-install-traefik-vqdq7                0/1     Completed          1             14m   10.42.0.2    k3s    <none>           <none>
kube-system   svclb-traefik-5a9ec57e-tx2dt              2/2     Running            0             13m   10.42.0.7    k3s    <none>           <none>
kube-system   traefik-f4564c4f4-xmcqp                   1/1     Running            0             13m   10.42.0.8    k3s    <none>           <none>
tidb-admin    tidb-controller-manager-7f8c786f9-s8pls   1/1     Running            0             12m   10.42.0.9    k3s    <none>           <none>
tidb-admin    tidb-scheduler-6649dfb5d9-hdgls           1/2     CrashLoopBackOff   7 (42s ago)   12m   10.42.0.10   k3s    <none>           <none>
ubuntu@k3s:~$

What did you do? I tried to run TiDB Operator v1.5.3 by following the steps below. The same steps worked with TiDB Operator v1.5.1; the only change was replacing 1.5.1 with 1.5.3.

On a MacBook Pro as the host

runcmd:

ssh_authorized_keys:

% multipass launch --cpus 8 --memory 8G --disk 20G --name k3s --cloud-init cloud-init.yaml
% multipass shell k3s

On the Ubuntu guest

  1. Allow a non-root user to run kubectl (optional)
mkdir -p ~/.kube
sudo kubectl config view --raw >> ~/.kube/config
chmod 600 ~/.kube/config
echo "export KUBECONFIG=~/.kube/config" >> ~/.bashrc
source ~/.bashrc
  2. Follow these steps to install TiDB Operator v1.5.3
    kubectl create -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.3/manifests/crd.yaml
    helm repo add pingcap https://charts.pingcap.org/
    kubectl create namespace tidb-admin
    helm install --namespace tidb-admin tidb-operator pingcap/tidb-operator --version v1.5.3

What did you expect to see? Both tidb-controller-manager and tidb-scheduler in the Running state.

What did you see instead? tidb-scheduler keeps crashing.

ubuntu@k3s:~$ kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                      READY   STATUS             RESTARTS      AGE
tidb-controller-manager-7f8c786f9-s8pls   1/1     Running            0             18m
tidb-scheduler-6649dfb5d9-hdgls           1/2     CrashLoopBackOff   8 (48s ago)   18m
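
To see why a pod in this state is crash-looping, the usual next step is to look at its events and at the logs of the previous (crashed) container run. The following is only a sketch of that, not something taken from the original report: the namespace and pod name come from the listing above, and the container names are read from the pod spec rather than assumed.

```shell
NS=tidb-admin
POD=tidb-scheduler-6649dfb5d9-hdgls   # pod name from the listing above; it changes on every rollout

if command -v kubectl >/dev/null 2>&1; then
  # Events plus per-container state, including the last termination reason and exit code.
  kubectl describe pod -n "$NS" "$POD"

  # Read the container names from the pod spec instead of guessing them.
  for c in $(kubectl get pod -n "$NS" "$POD" -o jsonpath='{.spec.containers[*].name}'); do
    echo "== previous logs for container: $c =="
    # --previous fails for a container that has never restarted, so ignore that error.
    kubectl logs -n "$NS" "$POD" -c "$c" --previous || true
  done
else
  echo "kubectl not found; run this on the machine that has cluster access" >&2
fi
```

The READY value of 1/2 means only one of the two containers is failing, so only that container is expected to have previous logs.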

csuzhangxc commented 2 months ago

We no longer recommend using tidb-scheduler. On newer versions of Kubernetes (v1.19+), there is no need to install it.

When installing, you can set scheduler.create: false to install TiDB Operator without tidb-scheduler (https://github.com/pingcap/tidb-operator/blob/v1.5.3/charts/tidb-operator/values.yaml#L152). This will be the default behavior in the upcoming v1.6.

yahonda commented 2 months ago

Could you advise how to set scheduler.create: false?

yahonda commented 2 months ago

It works. helm install --namespace tidb-admin tidb-operator pingcap/tidb-operator --version v1.5.3 --set scheduler.create=false

ubuntu@k3s:~$ helm install --namespace tidb-admin tidb-operator pingcap/tidb-operator --version v1.5.3 --set scheduler.create=false
NAME: tidb-operator
LAST DEPLOYED: Fri Apr 19 15:38:17 2024
NAMESPACE: tidb-admin
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Make sure tidb-operator components are running:

    kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
ubuntu@k3s:~$ kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                      READY   STATUS              RESTARTS   AGE
tidb-controller-manager-7f8c786f9-tbm97   0/1     ContainerCreating   0          6s
ubuntu@k3s:~$ kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                      READY   STATUS              RESTARTS   AGE
tidb-controller-manager-7f8c786f9-tbm97   0/1     ContainerCreating   0          33s
ubuntu@k3s:~$ kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
NAME                                      READY   STATUS    RESTARTS   AGE
tidb-controller-manager-7f8c786f9-tbm97   1/1     Running   0          55s
ubuntu@k3s:~$
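
For anyone who prefers a values file over the --set scheduler.create=false flag used above, the same setting could be written as follows. This is a minimal sketch: the file name values-no-scheduler.yaml is arbitrary, and helm upgrade --install is used here as an assumption so that the same command covers both a fresh install and an existing release that already has the crashing tidb-scheduler.

```shell
# Write a small override file; the file name is an arbitrary choice.
cat > values-no-scheduler.yaml <<'EOF'
scheduler:
  create: false
EOF

# Install the chart, or upgrade an existing release, without tidb-scheduler.
# Assumes helm is installed and the pingcap repo was added as in the steps above.
if command -v helm >/dev/null 2>&1; then
  helm upgrade --install tidb-operator pingcap/tidb-operator \
    --namespace tidb-admin --version v1.5.3 \
    -f values-no-scheduler.yaml
else
  echo "helm not found; values file written but not applied" >&2
fi
```

Keeping the override in a file makes it easier to carry the setting across later upgrades than repeating the flag on every command.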