reactive-tech / kubegres

Kubegres is a Kubernetes operator that deploys one or more clusters of PostgreSql instances and manages database replication, failover and backup.
https://www.kubegres.io
Apache License 2.0

Replicas not enforced after drain and uncordon #96

Open · joes opened this issue 2 years ago

joes commented 2 years ago

I've set spec.failover.isDisabled=true
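
Roughly, the relevant part of the resource looks like this (a minimal sketch: the resource name is inferred from the pod names below, and the image and storage values are placeholders rather than my exact config; the failover flag and replica count are the fields this issue is about):

    # Sketch of the Kubegres resource; name inferred from the pod names,
    # image and database size are placeholders.
    apiVersion: kubegres.reactive-tech.io/v1
    kind: Kubegres
    metadata:
      name: postgres-db
    spec:
      replicas: 3
      image: postgres:14.1
      database:
        size: 8Gi
      failover:
        isDisabled: true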

I had 3 replica pods:

I needed to do some maintenance on the server running node-3.

Because of this I promoted postgres-db-2-0 to primary using spec.failover.promotePod=postgres-db-2-0.
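
In other words, the promotion was just a change to the same failover block (pod name as above):

    # Promote postgres-db-2-0 to primary while automatic failover stays disabled.
    failover:
      isDisabled: true
      promotePod: postgres-db-2-0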

This led to postgres-db-1-0 being destroyed and postgres-db-2-0 being promoted. After a while a new replicaset was created with a new pod: postgres-db-4-0. At this point these pods/replicasets were running:

I then drained node-3 which evicted the pod postgres-db-4-0 (as expected).
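
(The maintenance itself was the usual drain/uncordon sequence; the exact drain flags below are an assumption, not necessarily what was run:)

    # Evict pods from node-3 and mark it unschedulable before maintenance.
    kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data

    # After maintenance, make node-3 schedulable again.
    kubectl uncordon node-3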

After maintenance was finished I uncordoned node-3. However, after uncordoning, postgres-db-4-0 was not started again, nor was a new replicaset created to replace it. At this point I have these two pods/replicasets running:

Shouldn't Kubegres start another pod/replicaset after I uncordoned node-3? Or are my expectations incorrect?

joes commented 2 years ago

My kubegres-controller-manager logs provide an answer:

2022-02-15T15:10:40.380Z INFO controllers.Kubegres We need to deploy additional Replica(s) because the number of Replicas deployed is less than the number of required Replicas in the Spec. However, a Replica failover cannot happen because the automatic failover feature is disabled in the YAML. To re-enable automatic failover, either set the field 'failover.isDisabled' to false or remove that field from the YAML.
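
(For anyone hitting the same thing: the line above comes from the operator's controller-manager logs, which can be tailed with something like the command below; the deployment and namespace names assume a default install and may differ.)

    # Tail the Kubegres operator logs; names assume a default installation.
    kubectl logs -n kubegres-system deploy/kubegres-controller-manager -f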

So promotePod works while failover is disabled, but with failover disabled the cluster will not recover after uncordoning.

Is it possible to have kubegres remember how many replicas it was provisioned with and try to get back to the state it was deployed in even if failover is disabled?

joes commented 2 years ago

Since I was initially able to deploy 3 replicas even with spec.failover.isDisabled=true, I thought I could get back to that state again even after the pod was evicted.

As a workaround I set replicas to 2 and then back to 3 again in an attempt to "fool" kubegres into redeploying the third replica (as it did on the initial deployment). This did not work and I still got the same message in the logs regarding setting the field 'failover.isDisabled' to false.
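
Concretely, the workaround attempt was equivalent to patching the replica count down and back up, along these lines (the resource type/name is an assumption inferred from the pod names and may need adjusting):

    # Scale the Kubegres resource to 2 replicas and then back to 3.
    kubectl patch kubegres postgres-db --type merge -p '{"spec":{"replicas":2}}'
    kubectl patch kubegres postgres-db --type merge -p '{"spec":{"replicas":3}}'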

It would seem that it is not possible to adjust replicas after the initial deployment (or at least not if a kubegres pod was previously evicted). Does that count as a bug?

joes commented 2 years ago

I also tried to scale replicas to 1 and then back to 3 again.

When I scaled to 1 replica:

When I scaled back to 3 replicas:

However, a third replica was not created.

It seems that Kubegres considers the third replica to have failed uncontrollably because it was evicted, and refuses to bring it back up again for that reason.

I suppose I will have to temporarily set spec.failover.isDisabled=false to make everything go back up again.
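
If I go that route, it would just be a matter of flipping the flag and flipping it back once the missing replica is recreated, something like the following sketch (same assumed resource name as above):

    # Temporarily re-enable automatic failover so the missing replica is recreated...
    kubectl patch kubegres postgres-db --type merge -p '{"spec":{"failover":{"isDisabled":false}}}'
    # ...then disable it again once all replicas are back.
    kubectl patch kubegres postgres-db --type merge -p '{"spec":{"failover":{"isDisabled":true}}}'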

Is there any way to temporarily "stop" a Kubegres replica/statefulset such that it is preserved in a "healthy" state as far as Kubegres is concerned? Or some mechanism to mark it as being healthy again?

joes commented 2 years ago

I also tried to scale replicas to 1 and then to 2.

When I scaled to 1 replica:

When I scaled to 2 replicas, nothing happened: I only have one primary and no additional replicas. Why is node-2 now deemed unhealthy and unschedulable simply because I scaled replicas?