zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Deleting PostgreSQL cluster leaves orphaned master endpoints #1751

Open Starefossen opened 2 years ago

Starefossen commented 2 years ago

When deleting a PostgreSQL cluster, the master endpoints are updated but not removed. This has a knock-on effect: creating a cluster with the same name again fails.

$ kubectl delete postgresql devops-keycloak-postgres
// some time later
$ kubectl get endpoints
NAME                              ENDPOINTS   AGE
devops-keycloak-postgres          <none>      6m36s
devops-keycloak-postgres-config   <none>      6m36s

When you try to create this database again you end up with the following error:

Events:                                                                                                                                                       
  Type     Reason  Age   From               Message                                                                                                           
  ----     ------  ----  ----               -------                                                                                                           
  Normal   Create  117s  postgres-operator  Started creation of new cluster resources                                                                         
  Warning  Create  117s  postgres-operator  could not create cluster: could not create master endpoint: could not create master endpoint: endpoints "devops-keycloak-postgres" already exists

Removing the orphaned endpoints and restarting the postgres-operator pod resolves the problem.
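For anyone scripting this workaround, here is a rough sketch of how the leftover endpoints could be picked out of the `kubectl get endpoints` output and turned into cleanup commands. The helper names `orphaned_endpoints` and `cleanup_commands` are made up for illustration, and the operator deployment name `postgres-operator` is an assumption that may differ in your install. The name matching is a simple prefix check, so it would also catch an unrelated cluster whose name starts with the same string. Review the list before deleting anything.

```python
def orphaned_endpoints(kubectl_output, cluster):
    """Return endpoint names belonging to `cluster` whose ENDPOINTS column is <none>.

    `kubectl_output` is the plain-text table printed by `kubectl get endpoints`.
    """
    orphans = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, endpoints = fields[0], fields[1]
        # Naive match: the cluster name itself or "<cluster>-<suffix>".
        belongs = name == cluster or name.startswith(cluster + "-")
        if belongs and endpoints == "<none>":
            orphans.append(name)
    return orphans


def cleanup_commands(orphans, operator_deployment="postgres-operator"):
    """Build (but do not run) the kubectl commands for the workaround.

    `operator_deployment` is an assumed deployment name; adjust as needed.
    """
    commands = [["kubectl", "delete", "endpoints", name] for name in orphans]
    if orphans:
        # Restart the operator only if something was actually cleaned up.
        commands.append(
            ["kubectl", "rollout", "restart", "deployment/" + operator_deployment]
        )
    return commands
```

The functions only build argument lists, so you can print and check them first and then pass each one to `subprocess.run` if they look right.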

oliveratprimer commented 2 years ago

Also seeing this with v1.6.3

bergey commented 2 years ago

Deleting the cluster also leaves behind:

registry.opensource.zalan.do/acid/postgres-operator:v1.6.2

Happy to open new issues for one or both of these if consolidating with the endpoints issue is not appropriate.

FxKu commented 2 years ago

When you delete a healthy cluster, no resources should be left behind. Leftovers only occur when a cluster was never fully synced: for example, a new cluster that never reached the Running state, or an existing cluster that was already broken when the operator was restarted.

We have open discussions and PRs to add ownerReferences or finalizers, which might solve this. Unfortunately, we have not had the time to experiment with them enough to consider either a safe option. See #941 for a good overview of the current state of the discussion.