Open martin31821 opened 4 months ago
FWIW we currently work around this by setting enable_lazy_spilo_upgrade=true, thus not triggering a Pod recreate, see: https://github.com/zalando/postgres-operator/blob/5357062857f3b724546a80849c2257ae7b804c14/pkg/cluster/sync.go#L408-L432
Can you not specify the mirrored image in the global configuration? Not sure if I'm getting it. For delaying pod replacement on image differences you've already found the right option: enable_lazy_spilo_upgrade
@FxKu, to explain our workflow:
We use https://estahn.github.io/k8s-image-swapper/v1.5/index.html which automatically mirrors new container images of upcoming pods into our own registry using a well-known naming scheme. The next time the same container image is used in an upcoming pod, image-swapper notices that the image already exists in our registry and transparently replaces the image reference using a mutating webhook. This way we ensure images are always available independently of the various upstream registries. In the past we had a case of downtime caused by an unfortunate combination of an unavailable upstream registry and newly spawned autoscaling instances that did not have the required image cached.
Using the webhook rather than hardcoding the mirror image into the Helm values has the advantage that it solves the availability issue at scale throughout our entire cluster without any extra steps during workload setup.
As an added bonus, since we do IaC we can use Renovate bot to easily manage dependencies without having to jump through any hoops due to changed image specifications.
> For delaying pod replacement on image differences you've already found the right option: enable_lazy_spilo_upgrade
Yep, this works for us but still isn't ideal, since we now have to roll out updates manually. But thinking about it, I'm actually not sure it is possible to have both using the operator.
We are experiencing the same issue as #1397, #2453 and #1955 and I'd like to propose a fix for it. It is company policy for us to have every Docker image we use mirrored in our own private registry. We do this by running a k8s mutating webhook that pulls the Docker images, pushes them to our own registry and then swaps out the image reference on pod creation.
Therefore our pods always have a different image set than the StatefulSet, so postgres-operator will kill and recreate all of our clusters every sync interval.
I'd like to propose a new configuration option ignoreImageDifference, which by default should be false to keep the current behavior. If it's set to true, differences in the actual image in the pod vs in the statefulset will be ignored by the sync processor.
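To make the proposal a bit more concrete, here is a rough sketch of the decision I have in mind. It is not the operator's actual code: the SyncOptions type, the needsPodRecreate function and the image names are made up for illustration, and IgnoreImageDifference is the proposed option, which does not exist yet.

```go
package main

import "fmt"

// SyncOptions is a hypothetical stand-in for the operator configuration.
// IgnoreImageDifference models the option proposed in this issue.
type SyncOptions struct {
	IgnoreImageDifference bool
}

// needsPodRecreate reports whether a pod would be recreated because its
// image differs from the image declared in the StatefulSet.
func needsPodRecreate(statefulSetImage, podImage string, opts SyncOptions) bool {
	if statefulSetImage == podImage {
		// Images match, nothing to do.
		return false
	}
	// Images differ: recreate unless the proposed option is enabled.
	return !opts.IgnoreImageDifference
}

func main() {
	// Placeholder image names: the webhook rewrites the upstream
	// reference to one in the private mirror registry.
	upstream := "registry.example.org/spilo:1.0"
	mirrored := "mirror.internal.example/registry.example.org/spilo:1.0"

	// Default (false): the rewritten image triggers a recreate.
	fmt.Println(needsPodRecreate(upstream, mirrored, SyncOptions{})) // true

	// Proposed option enabled: the difference is ignored.
	fmt.Println(needsPodRecreate(upstream, mirrored, SyncOptions{IgnoreImageDifference: true})) // false
}
```

With the default of false the behavior stays exactly as it is today, so existing installations would not be affected.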