yugabyte / yugabyte-operator

Kubernetes Operator for YugabyteDB (legacy)

Add basic sanity checks for configuration #23

Open rtsisyk opened 3 years ago

I made a mistake and set the number of tserver replicas to 2 instead of 3. The diff below shows the fix:

@@ -1,21 +1,21 @@
 apiVersion: yugabyte.com/v1alpha1
 kind: YBCluster
 metadata:
   name: demo
   namespace: yb-operator
 spec:
   replicationFactor: 3
   enableLoadBalancer: true
   master:
     replicas: 3
     storage:
       storageClass: linstor-hdd
       size: 10Gi
   tserver:
-    replicas: 2
+    replicas: 3
     enableLoadBalancer: true
     tserverUIPort: 9000
     storage:
       count: 1
       storageClass: linstor-hdd
       size: 10Gi

YugaByte Operator successfully started the cluster without complaining, but nothing worked:

./ysqlsh 
+ kubectl --namespace yb-operator exec -it yb-tserver-0 -- sh -c 'cd /home/yugabyte && ysqlsh -h yb-tserver-0 --echo-queries'
ysqlsh (11.2-YB-2.3.2.0-b0)
Type "help" for help.

yugabyte=# SELECT version();
SELECT version();
                                                  version                                                   
------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2-YB-2.3.2.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
(1 row)

yugabyte=# CREATE DATABASE demo;
CREATE DATABASE demo;

[STUCK HERE FOR 2-3 MINUTES]

ERROR:  Timed out: Timed out waiting for Table Creation: . Errors from tablet servers: [Timed out (yb/client/client-internal.cc:147): Timed out waiting for Table Creation]
W1212 11:53:51.209652    41 heartbeater.cc:528] P 0371e580e58e4b089d414dca6729be8c: Failed to heartbeat to yb-master-2.yb-masters.yb-operator.svc.cluster.local:7100: Service unavailable (yb/tserver/heartbeater.cc:409): Master is no longer the leader: code: NOT_THE_LEADER status { code: SERVICE_UNAVAILABLE message: "Catalog manager is not initialized. State: 1" source_file: "../../src/yb/master/scoped_leader_shared_lock.cc" source_line: 59 errors: "\000" } tries=9, num=3, masters=0x000000000125df90 -> [[yb-master-0.yb-masters.yb-operator.svc.cluster.local:7100], [yb-master-1.yb-masters.yb-operator.svc.cluster.local:7100], [yb-master-2.yb-masters.yb-operator.svc.cluster.local:7100]], code=Service unavailable

Yes, I understand that with replicationFactor: 3 I need at least three tserver replicas to get consensus. Unfortunately, there was a typo in my config and I spent some time trying to figure it out. Could you please add some foolproof sanity checks to the operator?
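
To make the request concrete, here is a rough sketch of the kind of check I have in mind. It is not taken from the operator's code: `ClusterSpec` and `ValidateSpec` are hypothetical names, and the struct only mirrors the three fields relevant to this issue. The single rule it enforces (tserver replicas must not be lower than `replicationFactor`) is the one I tripped over; other rules would presumably be added the same way.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ClusterSpec is a hypothetical, trimmed-down stand-in for the YBCluster
// spec. It is NOT the operator's real type; it only carries the fields
// needed to illustrate the check.
type ClusterSpec struct {
	ReplicationFactor int
	MasterReplicas    int
	TServerReplicas   int
}

// ValidateSpec (hypothetical helper) collects every violated rule and
// returns them as one error, or nil when the spec looks sane.
func ValidateSpec(s ClusterSpec) error {
	var problems []string

	if s.ReplicationFactor < 1 {
		problems = append(problems,
			fmt.Sprintf("replicationFactor must be at least 1, got %d", s.ReplicationFactor))
	}

	// The case from this issue: fewer tservers than the replication factor
	// means tablets can never be fully replicated.
	if s.TServerReplicas < s.ReplicationFactor {
		problems = append(problems,
			fmt.Sprintf("tserver.replicas (%d) must be >= replicationFactor (%d)",
				s.TServerReplicas, s.ReplicationFactor))
	}

	if len(problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(problems, "; "))
}

func main() {
	// The misconfigured spec from this report: RF 3 with only 2 tservers.
	bad := ClusterSpec{ReplicationFactor: 3, MasterReplicas: 3, TServerReplicas: 2}
	if err := ValidateSpec(bad); err != nil {
		fmt.Println("invalid YBCluster spec:", err)
	}
}
```

If something like this ran before the operator created any StatefulSets, the bad spec could be rejected (or at least reported in the operator log or resource status) instead of producing a cluster that starts but hangs on the first `CREATE DATABASE`.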