reactive-tech / kubegres

Kubegres is a Kubernetes operator for deploying one or more clusters of PostgreSQL instances and managing database replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars 74 forks

Blocked on StatefulSet already exists #123

Open zpear opened 2 years ago

zpear commented 2 years ago

Hi, I was hoping to get some clarity on how to remediate the following. My Kubegres operator appears to be out of sync: the previous blocking operation recorded in the custom resource status is not cleared or marked as fulfilled when the relevant StatefulSet becomes ready.

Initial state:

$ kubectl get sts | grep postgres
postgres-9                        1/1     144d

Custom resource status:

  ...
  replicas: 3
  ...
status:
  blockingOperation:
    statefulSetOperation: {}
    statefulSetSpecUpdateOperation: {}
  enforcedReplicas: 8
  lastCreatedInstanceIndex: 8
  previousBlockingOperation:
    operationId: Replica DB count spec enforcement
    statefulSetOperation:
      instanceIndex: 9
      name: postgres-9
    statefulSetSpecUpdateOperation: {}
    stepId: Replica DB is deploying
    timeOutEpocInSeconds: 1657204454

Logs:

2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Active Blocking-Operation: None
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Previous Blocking-Operation     {"OperationId": "Replica DB count spec enforcement", "StepId": "Replica DB is deploying", "HasTimedOut": false, "StatefulSetInstanceIndex": 9}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Database StorageClass states.   {"IsDeployed": true, "name": "rubix-aws-provisioner-v4"}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Base Config states      {"IsDeployed": true, "name": "base-kubegres-config"}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    All StatefulSets deployment states:     {"Spec expected to deploy": 3, "Nbre Deployed": 1}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Primary states:         {"IsDeployed": true, "Name": "postgres-9", "IsReady": true, "Pod Name": "postgres-9-0", "Pod IsDeployed": true, "Pod IsReady": true, "Pod IsStuck": false}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Primary Service states:         {"IsDeployed": true, "name": "postgres"}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    Replica Service states:         {"IsDeployed": true, "name": "postgres-replica"}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    BackUp states.  {"IsCronJobDeployed": false, "IsPvcDeployed": false, "CronJobLastScheduleTime": ""}
2022-07-06T20:06:04.673Z        INFO    controllers.Kubegres    We are going to deploy 2 Replica statefulSet(s)
2022-07-06T20:06:04.678Z        INFO    controllers.Kubegres    Deploying Replica statefulSet 'postgres-9'
2022-07-06T20:06:04.696Z        ERROR   controllers.Kubegres    Unable to deploy Replica StatefulSet.   {"Replica name": "postgres-9", "error": "statefulsets.apps \"postgres-9\" already exists"}
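
The status and logs above are consistent with the next StatefulSet name being derived from `lastCreatedInstanceIndex + 1`. That rule is an assumption inferred from this output, not something confirmed against the Kubegres source. A minimal Python sketch of the resulting collision, where the function names are hypothetical:

```python
def next_replica_name(resource_name, last_created_instance_index):
    """Assumed naming rule: '<resource name>-<last created index + 1>'."""
    return f"{resource_name}-{last_created_instance_index + 1}"


def find_collision(resource_name, last_created_instance_index, existing_statefulsets):
    """Return the candidate name if it already exists, otherwise None."""
    candidate = next_replica_name(resource_name, last_created_instance_index)
    return candidate if candidate in existing_statefulsets else None


# Values taken from the status and `kubectl get sts` output above.
existing = {"postgres-9"}
print(find_collision("postgres", 8, existing))  # postgres-9
```

Under that assumption, a status with `lastCreatedInstanceIndex: 8` and an existing `postgres-9` StatefulSet would produce exactly the "already exists" error in the last log line.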

I removed status as a subresource from the CRD so that I could update it manually (removing the previous operation and incrementing the last created index). As a result, the next replica StatefulSet was created. However, the operator now seems to be stuck trying to recreate that StatefulSet:

postgres-10-0                                  1/1     Running   0          5m32s
postgres-9-0                                   1/1     Running   0          12h

status:
  blockingOperation:
    statefulSetOperation: {}
    statefulSetSpecUpdateOperation: {}
  enforcedReplicas: 9
  lastCreatedInstanceIndex: 9
  previousBlockingOperation:
    operationId: Replica DB count spec enforcement
    statefulSetOperation:
      instanceIndex: 10
      name: postgres-10
    statefulSetSpecUpdateOperation: {}
    stepId: Replica DB is deploying
    timeOutEpocInSeconds: 1657208439

2022-07-07T15:36:18.935Z        INFO    controllers.Kubegres    Updating Kubegres' status:      {"Field": "PreviousBlockingOperation", "New value": {"operationId":"Replica DB count spec enforcement","stepId":"Replica DB is deploying","timeOutEpocInSeconds":1657208478,"statefulSetOperation":{"instanceIndex":10,"name":"postgres-10"},"statefulSetSpecUpdateOperation":{}}}
2022-07-07T15:36:18.935Z        DEBUG   controller-runtime.manager.events       Warning {"object": {"kind":"Kubegres","namespace":"data-oregano","name":"postgres","uid":"171e2976-6510-4cac-8ba6-79c17d03afe8","apiVersion":"kubegres.reactive-tech.io/v1","resourceVersion":"2730195589"}, "reason": "ReplicaStatefulSetDeploymentErr", "message": "Unable to deploy Replica StatefulSet. 'Replica name': postgres-10 - statefulsets.apps \"postgres-10\" already exists"}
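
To compare the `timeOutEpocInSeconds` values with the log timestamps, they can be decoded to UTC. The following is a small Python sketch using only the standard library; the helper name `epoch_to_utc` is hypothetical:

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Format a Unix epoch (in seconds) as an ISO-8601 UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Values copied from the status and the "Updating Kubegres' status" log line above.
print(epoch_to_utc(1657208439))  # 2022-07-07T15:40:39Z
print(epoch_to_utc(1657208478))  # 2022-07-07T15:41:18Z
```

The second value falls five minutes after the 15:36:18 log line, which suggests (but does not prove) that the timeout is being reset on each failed deployment attempt.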

Any help would be appreciated, thanks!