Open adamlamar opened 2 years ago
Hi, we are running into a similar situation.
In our case, a few nodes in our k8s cluster were restarted. When we attempted an upgrade for postgres, we started seeing this:
kubectl get statefulsets.apps -n postgres
NAME            READY   AGE
postgres-16-1   1/1     149d
postgres-16-2   1/1     149d
postgres-16-4   1/1     78d
kubectl describe kubegres -n postgres
Status:
  Blocking Operation:
    Stateful Set Operation:
    Stateful Set Spec Update Operation:
  Enforced Replicas: 3
  Last Created Instance Index: 4
  Previous Blocking Operation:
    Operation Id: Replica DB count spec enforcement
    Stateful Set Operation:
      Instance Index: 4
      Name: postgres-16-4
    Stateful Set Spec Update Operation:
    Step Id: Replica DB is deploying
    Time Out Epoc In Seconds: 1725961288
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailoverCannotHappenAsNoReplicaDeployed 73s (x10 over 13h) Kubegres-controller A failover is required for a Primary Pod as it is not healthy. However, a failover cannot happen because there is not any Replica deployed.
There are 2 replicas, postgres-16-2 and postgres-16-4. However, Kubegres thinks there are no replicas.
@alex-arica, any chance you can have a look at this please?
Thanks.
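For anyone reading the status output above: the Time Out Epoc In Seconds field is a Unix timestamp, so converting it shows when the previous blocking operation was due to time out. The snippet below is only a sketch of that arithmetic. It assumes GNU date, and the variable names are illustrative, not part of Kubegres.

```shell
# Value copied from the "Time Out Epoc In Seconds" field in the status above.
timeout_epoch=1725961288

# Convert the epoch seconds to a UTC timestamp (GNU date syntax).
# On BSD/macOS the rough equivalent is: date -u -r "$timeout_epoch"
timeout_utc=$(date -u -d "@${timeout_epoch}" '+%Y-%m-%dT%H:%M:%SZ')
echo "Previous blocking operation timeout: ${timeout_utc}"

# Compare against the current time to see whether the timeout has passed.
now_epoch=$(date -u +%s)
if [ "$now_epoch" -gt "$timeout_epoch" ]; then
  echo "The timeout has already passed."
fi
```

Comparing that time with the node restarts may help narrow down which operation was in flight when the cluster got stuck.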
Starting with a healthy cluster with 3 replicas:
Delete the statefulsets:
The following error is seen:
The error makes sense because no replica is available. However, it's unclear how to recover the cluster. Although the statefulsets were deleted, the PVCs still exist, and the database is intact.
Using promotePod is not possible because we cannot promote a pod that is not running.
As a workaround, I was able to manually create a statefulset out of band and then promote the pod. However, this process was error-prone (it involved editing index labels) and unclear. I'm not sure I did it right, but it seemed to work eventually.
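On the index labels: judging by the status output earlier in this thread (Name: postgres-16-4 alongside Instance Index: 4), the instance index appears to match the numeric suffix of the statefulset name. The sketch below derives the index and base name from a statefulset name so the two can be cross-checked before editing labels by hand. The helper names instance_index and base_name are hypothetical, and the naming convention is an assumption drawn from that output, not a documented guarantee.

```shell
# Hypothetical helper: print the instance index encoded in a statefulset name,
# assuming the name has the form <resource-name>-<index>.
instance_index() {
  # Strip everything up to and including the last hyphen.
  printf '%s\n' "${1##*-}"
}

# Hypothetical helper: print the name without its trailing -<index> suffix.
base_name() {
  # Strip the last hyphen and everything after it.
  printf '%s\n' "${1%-*}"
}

# Statefulset names taken from the kubectl output earlier in this thread.
for sts in postgres-16-1 postgres-16-2 postgres-16-4; do
  echo "${sts}: resource=$(base_name "$sts") index=$(instance_index "$sts")"
done
```

If the index on a manually recreated statefulset does not match the suffix of its name, that would be the first thing to double-check.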
Feature idea: maybe a promotePVC option that can start the statefulset from an existing PVC.