zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

postgres-operator keeps removing service controller managed fields on services #2432

Open g00pix opened 11 months ago

g00pix commented 11 months ago

Hello,

postgres-operator keeps removing fields on services that are managed by the service controller or that Kubernetes adds automatically with default values. For example, it keeps removing the following:

Some of these fields are very important for services of type LoadBalancer. In my use case, I use OpenStack to manage my load balancers, and my service sets externalTrafficPolicy to Local, which requires health monitors. The health monitor relies on the healthCheckNodePort field, which the operator keeps removing. The OpenStack cloud controller manager detects the removal and then removes the monitor_port from the load balancer, which stops it from working. The healthCheckNodePort field is then added back automatically, but the OpenStack cloud controller manager sometimes does not update the LB, which is left stuck offline. This particular situation is probably an issue on their side or in my deployment, but I don't think the field should be removed in the first place.

The following is an example of the operator detecting the changes and removing fields.

postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:46Z" level=debug msg="syncing master service" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"192.168.105.0/24\" \"192.168.99.0/24\" \"192.168.103.0/24\"]" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=info msg="master service postgres/cri-main-cluster is not in the desired state and needs to be updated" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-      protocol: TCP," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-      targetPort: 5432," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-      nodePort: 30299" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="+      targetPort: 5432" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  clusterIP: 172.21.55.56," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  clusterIPs: [" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-    172.21.55.56" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  ]," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  sessionAffinity: None," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  externalTrafficPolicy: Local," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  healthCheckNodePort: 31148," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  ipFamilies: [" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-    IPv4" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  ]," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  ipFamilyPolicy: SingleStack," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  allocateLoadBalancerNodePorts: true," cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="-  internalTrafficPolicy: Cluster" cluster-name=postgres/cri-main-cluster pkg=cluster
postgres-operator-698fcbd465-npntm postgres-operator time="2023-09-25T06:36:47Z" level=debug msg="+  externalTrafficPolicy: Local" cluster-name=postgres/cri-main-cluster pkg=cluster

g00pix commented 11 months ago

After reading through the code, I realized that the diff in the logs is probably misleading me: the operator does not actually delete any fields, since it applies merge patches to the service spec.
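
To illustrate why a diff can show "removed" fields while a merge patch leaves them alone, here is a minimal Python sketch. It assumes RFC 7386 JSON merge patch semantics; whether the operator uses exactly this patch type is not confirmed here. The function json_merge_patch and both spec dictionaries are hypothetical stand-ins, with field values borrowed from the log above, and are not the operator's code.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch to target and return a new value."""
    if not isinstance(patch, dict):
        # Scalars and arrays in the patch replace the target as a whole.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # Only an explicit null removes a key.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical live spec, including fields that were defaulted or allocated
# on the server side (values taken from the log excerpt).
live_spec = {
    "type": "LoadBalancer",
    "clusterIP": "172.21.55.56",
    "sessionAffinity": "None",
    "externalTrafficPolicy": "Local",
    "healthCheckNodePort": 31148,
    "internalTrafficPolicy": "Cluster",
}

# Hypothetical desired spec: only the fields the generator sets explicitly.
desired_spec = {
    "type": "LoadBalancer",
    "externalTrafficPolicy": "Local",
}

# A plain comparison reports these keys as missing from the desired spec,
# which is what shows up as "-" lines in a diff.
missing_from_desired = sorted(set(live_spec) - set(desired_spec))

# The merge patch leaves keys it does not mention untouched.
patched_spec = json_merge_patch(live_spec, desired_spec)

print(missing_from_desired)
print(patched_spec == live_spec)
```

Under these assumptions, healthCheckNodePort survives the patch because the patch never mentions it; only an explicit null would delete it. Note that arrays are replaced as a whole under RFC 7386, so this sketch says nothing about what happens to entries of the ports list.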