pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

A TiKV was restarted before scheduling was paused, and BR neither exited promptly nor set the status to failed, leaving the cluster in the paused-scheduling state #5583

Closed: zhongmin-amin closed this issue 3 months ago

zhongmin-amin commented 3 months ago

Bug Report

What version of Kubernetes are you using?

What version of TiDB Operator are you using?

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

What's the status of the TiDB cluster pods?

What did you do?

  1. Perform a rolling restart.
  2. Run a volume backup.

What did you expect to see?

  1. The volume backup succeeds.
  2. Scheduling is paused only for a short period of time.

What did you see instead?

  1. The volume backup gets stuck. Once it exceeds Volume Backup Init Job Max Active Seconds, the volume backup is set to failed.
  2. [screenshots]

BornChanger commented 3 months ago

Tracked and closed in the tidb repo: https://github.com/pingcap/tidb/issues/52243