zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

StatefulSet-based deployment may lose flexibility to handle PG-specific logic on scale-in #776

Closed jacobzhang0110 closed 10 months ago

jacobzhang0110 commented 4 years ago

The PG cluster is deployed as a StatefulSet, so scale-in (or upgrade) follows the StatefulSet's behavior of handling pods from the largest ordinal to the smallest. This may not always be the best choice. For example, in a 4-pod cluster where pod4 is currently the master, scale-in will try to terminate pod4, although terminating a non-master pod would be the better choice.
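To make the ordering concrete, here is a minimal sketch of which pods a StatefulSet removes on scale-in. It is plain Python, not the operator's code. The function name and the StatefulSet name `pg` are hypothetical, and it assumes the default numbering where ordinals start at 0 (so a 4-pod cluster has `pg-0` to `pg-3`).

```python
from typing import List


def pods_removed_on_scale_in(name: str, current: int, target: int) -> List[str]:
    """Return pod names in the order a StatefulSet terminates them on scale-in.

    Assumes ordinals 0..current-1; the highest ordinal is terminated first.
    """
    if not 0 <= target <= current:
        raise ValueError("target must be between 0 and current")
    return [f"{name}-{i}" for i in range(current - 1, target - 1, -1)]


# Hypothetical 4-pod cluster where the highest-ordinal pod is the master.
removed = pods_removed_on_scale_in("pg", 4, 2)
print(removed)            # ['pg-3', 'pg-2']
print("pg-3" in removed)  # True: the master is among the removed pods
```

The role of a pod plays no part in the selection, which is the limitation described above.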

FxKu commented 4 years ago

We are relying on Kubernetes resources here. I think scaling database pods out or in is not something you do very regularly (or leave to an HPA), in contrast to, e.g., application pods. For the scenario you describe, one should do a switchover (planned failover) in advance if scaling in is planned. But even when the master pod is deleted, failover should only take a few seconds.
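As a rough sketch of the "switchover in advance" step, the helper below decides whether a switchover is needed before a planned scale-in and picks a target. Everything here is an assumption for illustration: the function name, the `roles` mapping, and the policy of choosing the lowest surviving ordinal. In practice you would probably prefer the replica with the least replication lag.

```python
from typing import Dict, Optional


def switchover_candidate(roles: Dict[int, str], target: int) -> Optional[int]:
    """Pick the ordinal to promote before scaling in to `target` replicas.

    `roles` maps pod ordinal -> "master" or "replica".
    Returns None when the current master survives the scale-in.
    """
    master = next(o for o, r in roles.items() if r == "master")
    if master < target:
        return None  # master keeps running, no switchover needed
    # Only pods with ordinal < target remain after the scale-in.
    survivors = [o for o, r in roles.items() if r == "replica" and o < target]
    if not survivors:
        raise ValueError("no surviving replica to promote")
    return min(survivors)  # hypothetical policy: lowest ordinal


roles = {0: "replica", 1: "replica", 2: "replica", 3: "master"}
print(switchover_candidate(roles, 2))  # 0
```

The switchover itself would then be done through Patroni before the replica count is lowered.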

jacobzhang0110 commented 4 years ago

Thanks for answering. Do you mean that before scale-in I should do a switchover so that pod-0 becomes the master and pod-1 a replica (standby), so that scale-in first deletes pod4 and pod3, which has less impact on the whole system? What about rolling upgrades? From the PG cluster point of view (not the K8s StatefulSet point of view), what is the proper order for an upgrade (e.g. starting from the master or from a replica), or is any order fine?

FxKu commented 4 years ago

The proper order is to update the replica instances first, then fail over the master, and finally update the old master instance. Thinking about the K8s rolling upgrade again: the StatefulSet terminates the Pods gracefully, so Patroni gets the signals to do a proper failover, which is no different from a manual switchover. Automation is one of the main goals of Patroni and the operator.
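The order described above (replicas first, then switchover, then the old master) can be sketched as a small planning function. This is only an illustration with hypothetical names and pod labels, not how the operator or Patroni implements it. Picking the first updated replica as the switchover target is an arbitrary assumption.

```python
from typing import Dict, List


def rolling_update_plan(roles: Dict[str, str]) -> List[str]:
    """Return update steps: replicas first, then switchover, then old master.

    `roles` maps pod name -> "master" or "replica".
    """
    masters = [p for p, r in roles.items() if r == "master"]
    if len(masters) != 1:
        raise ValueError("expected exactly one master")
    master = masters[0]
    replicas = sorted(p for p, r in roles.items() if r == "replica")

    steps = [f"update {p}" for p in replicas]
    if replicas:
        # Move the master role to an already updated replica.
        steps.append(f"switchover {master} -> {replicas[0]}")
    steps.append(f"update {master}")
    return steps


plan = rolling_update_plan({"pg-0": "replica", "pg-1": "master", "pg-2": "replica"})
print(plan)
# ['update pg-0', 'update pg-2', 'switchover pg-1 -> pg-0', 'update pg-1']
```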