pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

Mechanism to gracefully shut down a Pod in TiDB cluster #4758

Closed. luohao closed this issue 1 year ago.

luohao commented 1 year ago

Feature Request

Is your feature request related to a problem? Please describe: We have a node drainer service in our Kubernetes cluster that rotates nodes (EC2 VMs); all Pods on the victim node are evicted via the eviction API. Conflicts between a tidb-operator-initiated rolling upgrade and a node-drainer-initiated Pod eviction result in availability issues in PD and TiKV.
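For context, here is a minimal sketch of the kind of eviction call a node drainer issues, using client-go (the Pod and namespace names are placeholders):

```go
package main

import (
	"context"
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Evict one Pod on the victim node. The eviction API respects
	// PodDisruptionBudgets, so this call is rejected while a rolling
	// upgrade already consumes the component's disruption budget,
	// which is exactly the conflict described above.
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: "basic-tikv-0", Namespace: "tidb"},
	}
	if err := client.PolicyV1().Evictions("tidb").Evict(context.TODO(), eviction); err != nil {
		fmt.Println("eviction failed:", err)
	}
}
```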

Describe the feature you'd like: We would like a mechanism for external actors (e.g., a cluster admin or a node maintenance service) to notify tidb-operator that a Pod in the cluster needs to be restarted; tidb-operator should then decide when and how the Pod is restarted.

tidb-operator already has a graceful shutdown for TiKV Pods driven by an annotation (proposal; see the sketch after this list). We'd like something similar, but it needs to

  1. support all components (e.g., PD, TiDB, etc.): right now the Pod controller only syncs TiKV Pods;
  2. support multi-cluster deployments.
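For reference, a minimal sketch of driving the existing TiKV flow from Go, assuming the `tidb.pingcap.com/evict-leader` annotation from the linked proposal (verify the key and accepted values against your tidb-operator version):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// markTiKVForGracefulDelete asks tidb-operator's Pod controller to evict
// the Pod's region leaders and then delete the Pod, via the annotation
// from the linked proposal (key and value assumed; check your operator
// version before relying on them).
func markTiKVForGracefulDelete(ctx context.Context, client kubernetes.Interface, ns, pod string) error {
	patch := []byte(`{"metadata":{"annotations":{"tidb.pingcap.com/evict-leader":"delete-pod"}}}`)
	_, err := client.CoreV1().Pods(ns).Patch(
		ctx, pod, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```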

Describe alternatives you've considered: We already have a scheduled maintenance window, but with a multi-cluster deployment (i.e., a TiDB cluster spanning three Kubernetes clusters) it is hard to avoid conflicts (e.g., an ongoing rolling upgrade in cluster-1 and a node drain in cluster-3 may still conflict).

Teachability, Documentation, Adoption, Migration Strategy:

We plan to integrate this with our node drainer service so that TiDB Pods can be safely evicted from the victim node. This should minimize the disruption caused by maintenance tasks in general.

hanlins commented 1 year ago

Hi @luohao, thanks for raising the issue here! Since this issue proposes graceful shutdown for all components, I'm wondering if we could tackle them one by one. IMO the eviction procedure can differ subtly between components. For example, we might want to evict a PD instance only when A) it is no longer the leader, and B) a quorum of PD members is up to date and can take requests; whereas for TiDB, the eviction criterion could be that no connections are still affiliated with the Pod. Maybe we can break this down, propose solutions for each component, and solve them separately? That would also make it easier to dispatch the work to multiple developers. What do you think?
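A rough sketch of those per-component gates (the probe functions are hypothetical placeholders for real PD/TiDB status queries, stubbed here so the sketch compiles):

```go
// canEvictPD reports whether a PD Pod can be taken down: it must not be
// the leader (A), and a quorum of up-to-date members must remain (B).
func canEvictPD(pod string) bool {
	return !isPDLeader(pod) && quorumHealthyWithout(pod)
}

// canEvictTiDB reports whether a TiDB Pod can be taken down: no client
// connections may still be affiliated with it.
func canEvictTiDB(pod string) bool {
	return activeConnections(pod) == 0
}

// Hypothetical probes: a real implementation would query PD's members
// and health APIs, and the TiDB server's status endpoint.
func isPDLeader(pod string) bool           { return false }
func quorumHealthyWithout(pod string) bool { return true }
func activeConnections(pod string) int     { return 0 }
```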