Closed: mirandaconrado closed this issue 4 years ago.
@mirandaconrado The increased pod count is triggered by auto-failover.
A CRD replicas of 3 means the user wants 3 healthy TiKV stores. When a TiKV store has been down for some time (35 minutes by default), tidb-operator tries to add more TiKV pods to recover the cluster to 3 healthy stores, just as the Deployment controller adds more pods when a pod of a Deployment fails, to ensure the number of healthy pods meets the Deployment's spec replicas.
But failing over from 3 up to 45 is unreasonable, so tidb-operator needs to limit the failover count. In a case like this, no matter how many pods are added, the cluster will never return to normal; the user must recover the cluster manually.
@mirandaconrado Thank you very much for your feedback.
For now, you can remove the failover Pods by manually removing the status.tikv.failureStores field of the TidbCluster object: https://pingcap.com/docs/v3.0/tidb-in-kubernetes/maintain/auto-failover/#failover-with-tikv
What are the logs of the failed TiKV pods? The failover may have been caused by insufficient resources; here are the resource recommendations for development and test environments: https://pingcap.com/docs/v3.0/how-to/deploy/hardware-recommendations/#development-and-test-environments
We will also add a maxFailoverCount limit for TiKV: https://github.com/pingcap/tidb-operator/issues/508
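If that lands, the cap would presumably be set per component in the TidbCluster spec. A hypothetical fragment (field name taken from the linked issue; the exact name and placement in a released version may differ):

```yaml
# Hypothetical sketch based on issue #508, not a released API.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic            # placeholder cluster name
spec:
  tikv:
    replicas: 3
    maxFailoverCount: 3  # hypothetical: at most 3 extra TiKV pods from failover
```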
@tennix thanks for the review! It makes sense for the controller to create new pods to bring things back to normal. The part that surprised me was that it didn't kill the failed pod and bring up a replacement, but the documentation clearly explains that StatefulSets don't allow this.
Thanks for the pointer, @weekface! I'll definitely read more of the maintenance documentation. And thanks for considering the failover limit. I checked the logs and didn't see anything funky. I'm going to separate our TiDB setup from the other explorations done by other folks, to get better and more self-contained logs. My best understanding is that PD was overwhelmed due to a very low amount of resources, which made it OOM-crash in a loop, so it was most likely a mistake on my end.
For my understanding, is there a reason why the operator doesn't automatically detect that the failure is gone? Also, if I delete the node from the status failures, will the controller automatically scale the stateful set? I want to make sure I'm not messing up when I do that manually.
Recovering to the original state requires deleting the pod and PVC, which both has a performance cost and is dangerous. We'll implement this feature once we have more confidence about the actual status of the environment and the cluster.
When the cluster has really returned to normal, deleting the node from the status failures will trigger a safe scale-in: the TiKV store is taken offline first, and then the pod is deleted.
By the way, for a production setup, it's strongly suggested to deploy 3 PD pods.
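As a sketch, a production-leaning spec would look something like this (cluster name and the TiKV/TiDB counts are placeholders; the point is that 3 PD replicas give a Raft quorum that tolerates one PD failure):

```yaml
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: prod             # placeholder cluster name
spec:
  pd:
    replicas: 3          # odd count >= 3 so quorum survives a single PD loss
  tikv:
    replicas: 3
  tidb:
    replicas: 2
```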
Yep, totally agree that's pretty dangerous. Just wanted to confirm this is a security measure and that I'm not missing anything. Good to know the controller will handle things cleanly after I give it the green light.
And yeah, we'll most definitely run HA on production. Right now I'm mainly experimenting with TiDB for some use cases we have.
I'm inclined to close the ticket, as #508 more appropriately describes the improvement that originated this ticket. But I'm not sure what process your team uses: close this and use the other one, or keep it open for reference. I'm fine with either (or maybe that won't be necessary since there's already a PR!).
We prefer to close this issue and use #508 to track the problem.
Bug Report
What version of Kubernetes are you using? 1.11.10 on EKS
What version of TiDB Operator are you using? 1.0.1
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods? gp2
What's the status of the TiDB cluster pods? Running, although we did have the single PD in a crash loop due to OOM and we have increased its memory.
What did you do? We had a setup with 1 PD, 3 TiKV, and 1 TiDB pod, which was working fine. The PD pod was killed for some reason and got into a crash loop backoff due to OOM. Then the number of TiKV pods was drastically increased to 45 over the course of 8 hours.
What did you expect to see? I expected the controller to 1) not increase the number of TiKV pods beyond the 3 specified in the CRD while PD was down, and 2) restore to 3 replicas once the PD pod was back to Running.
What did you see instead? The cluster remained at 45 replicas, and I believe the controller might have some corrupt internal state. When using tkctl, I get
but the CRD correctly shows that the desired count is 3.
Restarting the controller pod did not help. Setting the replica count to 3 directly made the controller bring it back to 8; setting it to 3 again then made it stay at 3. However, the controller logs contain entries like
for each missing pod, and those pods still appear in the CRD status.
Supplementary material CRD status: https://pastebin.com/haQwqa6Q Pod list: https://pastebin.com/1nGR98Lv