radondb / radondb-mysql-kubernetes

Open Source, High Availability Cluster, based on MySQL
Apache License 2.0

StatefulSet pod can only be started with number 0. If the node with number 0 fails, the entire cluster will become unavailable. #820

Closed 844700118 closed 1 year ago

844700118 commented 1 year ago

StatefulSet pod can only be started with number 0. If the node with number 0 fails, the entire cluster will become unavailable.

radondb-mysql version: v2.3.0
Kubernetes version (use kubectl version): 1.19
percona/percona-server:5.7.34
radondb/xenon:v2.3.0
radondb/mysql-operator:v2.3.0
radondb/mysql-sidecar:v2.3.0

[root@localhost ~]#  kubectl get sts -n storage-cluster  demo-mysql -o yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: ev-mysql
      namespace: storage-cluster
    spec:
      podManagementPolicy: OrderedReady
      replicas: 5

 [root@localhost ~]#  kubectl get pod -A -o wide|grep mysql
    storage-cluster ev-mysql-0  0/4  Pending 0 94m 100.113.172.1 master31 

    My MySQL cluster has 5 nodes, and the master31 node is faulty. I found that the
    StatefulSet uses the OrderedReady policy: if pod 0 is blocked and cannot start,
    pods 1-4 cannot be started either, because the controller waits for pod 0 to
    become Running and Ready before creating pod 1. Is it safe to change
    podManagementPolicy: OrderedReady to podManagementPolicy: Parallel? Could it
    cause problems with leader election? Is there a better solution? The goal is
    cluster high availability, which the current RadonDB cluster does not provide.
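For reference, the change being asked about would look like this in the StatefulSet spec (a sketch only; note that podManagementPolicy is immutable on an existing StatefulSet, so applying it would require recreating the object rather than patching it in place):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: ev-mysql
      namespace: storage-cluster
    spec:
      # Parallel starts/terminates all pods at once instead of waiting
      # for pod 0 to become Running and Ready before creating pod 1.
      podManagementPolicy: Parallel
      replicas: 5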
github-actions[bot] commented 1 year ago

Hi! thanks for your contribution! great first issue!

acekingke commented 1 year ago

You can try adding a taint to the faulty node so that pod number 0 will not be scheduled on it. Do not use the Parallel policy; it would be very dangerous and could cause data inconsistency.
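A sketch of the taint approach suggested above, using the node and pod names from this issue (the taint key/value `node-health=faulty` is illustrative, not a RadonDB convention):

    # Taint the faulty node so the scheduler stops placing new pods on it.
    kubectl taint nodes master31 node-health=faulty:NoSchedule

    # Delete the stuck pod; the StatefulSet controller recreates it,
    # and the scheduler will now avoid master31.
    kubectl delete pod ev-mysql-0 -n storage-cluster

    # Once the node is repaired, remove the taint (note the trailing '-').
    kubectl taint nodes master31 node-health=faulty:NoSchedule-

This only affects scheduling of new pods; pods already running on master31 are not evicted unless the taint uses the NoExecute effect instead.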