instaclustr / cassandra-operator

Kubernetes operator for Apache Cassandra
https://instaclustr.com
Apache License 2.0

CassandraRoleManager skipped default role setup: some nodes were not ready #397

Open kong62 opened 3 years ago

kong62 commented 3 years ago

I kept getting this error whenever I redeployed, because the StatefulSet cannot create pods in parallel. I need podManagementPolicy: Parallel, but this GitHub commit commented it out:

@@ -242,7 +241,7 @@ private V1beta2StatefulSet generateStatefulSet(DataCenterKey dataCenterKey, V1Co
                 )
                 .spec(new V1beta2StatefulSetSpec()
                         .serviceName("cassandra")
-                        .podManagementPolicy("Parallel")
+                        //.podManagementPolicy("Parallel")
                         .replicas(dataCenter.getSpec().getReplicas().intValue())
                         .selector(new V1LabelSelector().putMatchLabelsItem("cassandra-datacenter", dataCenterKey.name))
                         .template(new V1PodTemplateSpec()
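
With that line commented out, the operator falls back to the StatefulSet default, OrderedReady. As a quick sanity check (a sketch, assuming the StatefulSet name matches the pods below), you can print the policy the operator actually generated:

# Prints the pod management policy of the generated StatefulSet; with the
# Parallel line commented out this returns "OrderedReady" (the default).
kubectl get statefulset cassandra-cassandra-dc1-dc1-rack1 \
  -o jsonpath='{.spec.podManagementPolicy}'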
# kubectl  get pod                                  
NAME                                  READY   STATUS             RESTARTS   AGE
cassandra-cassandra-dc1-dc1-rack1-0   1/2     Running            0          6m12s
cassandra-operator-6f685694c5-l7m27   1/1     Running            0          4d7h

# kubectl  get pvc
NAME                                              STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS                             AGE
data-volume-cassandra-cassandra-dc1-dc1-rack1-0   Bound    disk-29d3cfdd-dc5a-457e-bae1-6b72dcc34c37   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h12m
data-volume-cassandra-cassandra-dc1-dc1-rack1-1   Bound    disk-9a8621f6-3f8b-428e-b69d-72cde007c7cf   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h6m
data-volume-cassandra-cassandra-dc1-dc1-rack1-2   Bound    disk-1971e0c4-fdf5-4adf-85fa-c1e9e53b7658   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h5m
data-volume-cassandra-cassandra-dc1-dc1-rack1-3   Bound    disk-5be7e523-a3cc-4b32-9149-6a3ab5e44ed2   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h3m
data-volume-cassandra-cassandra-dc1-dc1-rack1-4   Bound    disk-4a4d235b-871f-45ff-be57-c4ed7c9b4ad2   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h2m
data-volume-cassandra-cassandra-dc1-dc1-rack1-5   Bound    disk-b9c45b99-f169-413b-b8dc-65b97d205264   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   4h
data-volume-cassandra-cassandra-dc1-dc1-rack1-6   Bound    disk-c2bf3596-a986-4099-b746-316ddaf36c8f   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   72m
data-volume-cassandra-cassandra-dc1-dc1-rack1-7   Bound    disk-89fae7ec-9f5a-4b2f-9191-631f66ac71b8   2Ti        RWO            alicloud-disk-efficiency-cn-hangzhou-g   57m

# kubectl  logs -f  cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra
INFO  [main] Server.java:159 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO  [main] CassandraDaemon.java:564 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
INFO  [main] CassandraDaemon.java:650 Startup complete
WARN  [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
WARN  [OptionalTasks:1] CassandraRoleManager.java:377 CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] CassandraRoleManager.java:416 Setup task failed with error, rescheduling
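
The CassandraRoleManager warning is a symptom rather than the cause: default role setup is skipped until the node sees every peer in the ring as live, and with OrderedReady startup the remaining pods never come up. One way to confirm this from the single running node (a sketch, reusing the pod name above):

# UN = Up/Normal, DN = Down/Normal; if only the local node is Up, role
# setup will keep rescheduling exactly as in the log above.
kubectl exec cassandra-cassandra-dc1-dc1-rack1-0 -c cassandra -- nodetool status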
smiklosovic commented 3 years ago

Can you elaborate on why you need it to be in parallel?

kong62 commented 3 years ago

When the cluster went down in an avalanche (many nodes crashed at once), I deleted the StatefulSet and re-created it. With the default OrderedReady policy the pods start one by one: pod-0 never becomes ready because it is looking for the other members, but the other members cannot start until pod-0 is ready, so the rollout deadlocks.

# kubectl  get pod                                  
NAME                                  READY   STATUS             RESTARTS   AGE
cassandra-cassandra-dc1-dc1-rack1-0   0/2     CrashLoopBackOff   5          4d7h
cassandra-cassandra-dc1-dc1-rack1-1   2/2     Running            7          4d7h
cassandra-cassandra-dc1-dc1-rack1-2   1/2     Running            4          4d7h
cassandra-cassandra-dc1-dc1-rack1-3   0/2     CrashLoopBackOff   11         4d7h
cassandra-cassandra-dc1-dc1-rack1-4   0/2     CrashLoopBackOff   4          4d7h
cassandra-cassandra-dc1-dc1-rack1-5   0/2     CrashLoopBackOff   8          4d7h

#  kubectl  delete -f example-dc.yaml
#  kubectl  apply -f example-dc.yaml
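
Deleting and re-applying the whole DataCenter works, but it restarts everything. Since spec.podManagementPolicy is immutable on an existing StatefulSet, a rebuilt operator with the Parallel line restored cannot update it in place. One alternative (a sketch, assuming such an operator build and the StatefulSet name used above) is to delete only the StatefulSet object while orphaning its pods:

# podManagementPolicy cannot be patched on a live StatefulSet. Deleting
# the object with --cascade=orphan leaves the pods and PVCs untouched,
# so the StatefulSet can be re-created with the new policy without
# wiping data.
kubectl delete statefulset cassandra-cassandra-dc1-dc1-rack1 --cascade=orphan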