Closed. towens closed this issue 1 month ago.
Can you share some details about the "volume conflicts"?
For NodeGroup upgrades, we usually follow these steps:
Hi @csuzhangxc, thanks for the response. Should they happen again, I'll collect and share the errors related to the volume conflicts. It sounds like you are describing this doc: Replace Nodes for a TiDB Cluster on Cloud Disks. We had not followed it previously, which is probably the reason for the pod errors. I'll close this issue.
Yes. The steps are the same as in Replace Nodes for a TiDB Cluster on Cloud Disks.
Bug Report
What version of Kubernetes are you using? v1.24 - v1.26
What version of TiDB Operator are you using? TiDB Operator Version: version.Info{GitVersion:"v1.5.3", GitCommit:"2c9e4dad0abaa4400afdef9ceff3084e71510ecb", GitTreeState:"clean", BuildDate:"2024-04-18T03:43:46Z", GoVersion:"go1.21.6", Compiler:"gc", Platform:"linux/arm64"}
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods? Proper storage classes were created and are being used
What's the status of the TiDB cluster pods? Running, because we have repeatedly reset the TiDB cluster to a zero state by deleting it, dropping the PVCs and PVs, and recreating the cluster
What did you do? Upgrade k8s (EKS) and/or an EKS nodegroup
What did you expect to see? Pods running
What did you see instead? Pods pending, typical volume conflicts.
This was all solved years ago with topologySpreadConstraints. Unfortunately the tidb-operator doesn't implement the full topologySpreadConstraints spec.
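For readers who are not familiar with the upstream spec, here is a minimal Python sketch of what a full Kubernetes `topologySpreadConstraints` entry looks like. The field names are the upstream PodSpec ones. The `topologyKey`-only entry at the end is only an assumed illustration of a partial implementation, not a statement of what tidb-operator v1.5.3 accepts, and the component label value is chosen for illustration.

```python
import json

# Field names of one entry in the upstream Kubernetes
# PodSpec.topologySpreadConstraints list.
UPSTREAM_FIELDS = (
    "maxSkew",
    "topologyKey",
    "whenUnsatisfiable",
    "labelSelector",
    "minDomains",
    "matchLabelKeys",
    "nodeAffinityPolicy",
    "nodeTaintsPolicy",
)


def full_constraint(component_labels):
    """Build one upstream-style constraint that spreads pods across zones."""
    return {
        "maxSkew": 1,  # allowed difference in pod count between zones
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": dict(component_labels)},
    }


def missing_fields(entry):
    """List the upstream fields that an entry does not set."""
    return [name for name in UPSTREAM_FIELDS if name not in entry]


# Illustrative label; adjust to whatever labels your pods actually carry.
tikv = full_constraint({"app.kubernetes.io/component": "tikv"})
print(json.dumps(tikv, indent=2))

# Assumed example of a reduced entry that only carries topologyKey.
reduced = {"topologyKey": "topology.kubernetes.io/zone"}
print(missing_fields(reduced))
```

The second print shows which upstream knobs a `topologyKey`-only entry leaves out, which is the gap this report is about.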
tidb-operator users (at minimum) need:
Neither the basic nor the advanced example survives upgrades. We are open to the idea that we missed something simple. TiDB is the only offering we have issues with.
Does there need to be a PR for the full topologySpreadConstraints spec? Or is this obvious to folks working on this project, and a docs PR is needed for those who don't live in that context?
Thanks.