pingcap / tiup

A component manager for TiDB
https://tiup.io
Apache License 2.0

Support starting a TiDB cluster managed by K8s and tidb-operator when K8s itself suffers a full cluster crash #743

Open pcqz opened 3 years ago

pcqz commented 3 years ago

Feature Request

Is your feature request related to a problem? Please describe:

In some extreme scenarios, a Kubernetes cluster and its services may crash unexpectedly due to bugs or mis-operations. In that case, the TiDB cluster pods managed by K8s may stop working normally, which leads to business failure. To recover from the failure, we first need to locate and fix the K8s problem, which may take a long time.

Describe the feature you'd like:

TiUP should support fetching the topology metadata of a TiDB cluster from tidb-operator and regularly updating a local repository with it. When the K8s service is unavailable, TiUP could then read the cluster metadata from the local copy and restart the affected TiDB cluster.

Describe alternatives you've considered:

Deploy a separate TiDB cluster in advance as an active standby cluster, or use BR to restore a backup to a new cluster after a failure.

Teachability, Documentation, Adoption, Migration Strategy:

AstroProfundis commented 3 years ago

This could be very complicated.

First of all, tiup-cluster does not have any daemon (by design), so the only way to sync the topology is with a cron job, and that is risky because the synced topology is not guaranteed to be up to date.
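To make the staleness risk concrete, here is a minimal Python sketch of what a cron-driven sync would have to track. It is not part of tiup or tidb-operator: `save_snapshot`, `load_snapshot`, the JSON layout, and the `max_age_seconds` threshold are all hypothetical names chosen for illustration, and how the topology would actually be fetched from K8s is left out.

```python
import json
import os
import tempfile
import time


def save_snapshot(path, topology, now=None):
    """Atomically write the topology plus the time it was captured.

    Hypothetical helper: a cron job would call this after fetching the
    topology from the K8s API (the fetch itself is not shown here).
    """
    record = {
        "captured_at": time.time() if now is None else now,
        "topology": topology,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename, so a crash
    # mid-write never leaves a half-written snapshot behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    os.replace(tmp_path, path)


def load_snapshot(path, max_age_seconds, now=None):
    """Return (topology, is_stale) for the last saved snapshot."""
    with open(path) as f:
        record = json.load(f)
    current = time.time() if now is None else now
    age = current - record["captured_at"]
    return record["topology"], age > max_age_seconds
```

Even with such a check, a snapshot that is "fresh enough" by age can still miss a scale-out or a rescheduled pod that happened after the last cron run, which is exactly the risk described above.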

Second, our container images for TiDB components are not fully featured VM environments; that is to say, they don't have an SSH server inside, so if kubelet is broken and the pod is not working, we have no way to connect to that pod.

Besides, in a K8s environment there are different StorageClasses available for components, and the mapping from a PVC to the real files on disk is not readily available outside the K8s system.

Also, kubelet maintains complex network settings among host nodes; that's another thing we have no easy way to handle.

It's obviously too complicated for tiup to reinvent kubelet so that it can manage pods, or the containers in pods, directly by itself. I think this is not something tiup should bother with.