zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.31k stars 977 forks

Recreation of DB clusters due to changing nodeAffinity term order #1996

Open ljcesca opened 2 years ago

ljcesca commented 2 years ago

We experienced an issue similar to #924 due to changes in ordering of nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions across syncs of clusters.

Our operator configuration set two labels via node_readiness_label:

node_readiness_label:
  kubernetes.io/arch: amd64
  postgres-cluster: "1"

And an additional label via the Cluster spec:

nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: postgres-plan-small
        operator: In
        values:
        - "1"

We've worked around this for now by removing the node_readiness_label configuration, but would like to be able to use it again in the future.

We were able to capture the StatefulSet before and after a sync and confirmed that the order of nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions was changing, which caused the cluster to be re-synced due to Cluster.compareStatefulSetWith.

I'm happy to work on a PR that fixes this if you agree that changing Cluster.compareStatefulSetWith is the appropriate approach. Thanks!

FxKu commented 2 years ago

Ok, so it seems we need a method that compares the matchExpressions of an affinity while ignoring their order. I'd welcome a PR on that. Please also add unit tests. Thanks :)
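
For illustration, here is a minimal sketch of what such an order-insensitive comparison could look like. It is not the operator's actual code. The NodeSelectorRequirement struct below is a local stand-in that only mirrors the key/operator/values shape of the Kubernetes type, and canonicalKey and matchExpressionsEqual are hypothetical helper names. The sketch also assumes that the order of values within a single expression should be ignored.

```go
package main

import (
	"sort"
	"strings"
)

// NodeSelectorRequirement is a local stand-in mirroring the key/operator/values
// shape of a matchExpressions entry, so the sketch has no external dependencies.
type NodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

// canonicalKey (hypothetical helper) turns one expression into a string that
// does not depend on the order of its values. The input slice is copied
// before sorting so the caller's data is left untouched.
func canonicalKey(r NodeSelectorRequirement) string {
	values := append([]string(nil), r.Values...)
	sort.Strings(values)
	return r.Key + "\x00" + r.Operator + "\x00" + strings.Join(values, "\x01")
}

// matchExpressionsEqual (hypothetical helper) reports whether a and b contain
// the same expressions, treated as multisets, regardless of their order.
func matchExpressionsEqual(a, b []NodeSelectorRequirement) bool {
	if len(a) != len(b) {
		return false
	}
	// Count each expression in a, then consume the counts with b.
	counts := make(map[string]int, len(a))
	for _, r := range a {
		counts[canonicalKey(r)]++
	}
	for _, r := range b {
		k := canonicalKey(r)
		if counts[k] == 0 {
			return false
		}
		counts[k]--
	}
	return true
}
```

With the three expressions from this issue (kubernetes.io/arch, postgres-cluster, postgres-plan-small), any permutation would compare as equal under this sketch, while a changed key, operator, or value would still be reported as a difference.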