Closed miguelhar closed 5 days ago
@miguelhar How big is your database, and how long is the default timeout on the platform you're using?
Thinking things through a bit, my initial thought is that retrying the upgrade without increasing the timeout value is pretty unlikely to succeed. The upgrade scripting itself is mostly a case of getting things into a form that pg_upgrade (the official PostgreSQL cli tool for upgrading) can work with, then running the pg_upgrade cli utility.
While there are some sanity checks done first, the actual upgrade piece is done using pg_upgrade.
If pg_upgrade has been terminated part way through... I kind of doubt it'll be able to happily continue on and complete what wasn't done in a previous run.
Long story short, I reckon you'll need to restore your database files back to how they were before the upgrade, then try the upgrade again with a longer timeout threshold.
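As a rough illustration of the "restore, then retry" idea, here is a minimal Python sketch that snapshots a data directory before an upgrade attempt and puts it back afterwards. The function names and paths are hypothetical and not part of pgautoupgrade. It assumes PostgreSQL is fully stopped while the files are copied, and it does not preserve file ownership, so on a real cluster a volume snapshot from your storage provider is probably the better tool.

```python
import shutil
from pathlib import Path


def snapshot_data_dir(data_dir: str, backup_dir: str) -> Path:
    """Copy a stopped database's data directory to backup_dir (hypothetical helper)."""
    src, dst = Path(data_dir), Path(backup_dir)
    if dst.exists():
        # Never silently overwrite an earlier snapshot.
        raise FileExistsError(f"refusing to overwrite existing backup: {dst}")
    shutil.copytree(src, dst, symlinks=True)
    return dst


def restore_data_dir(backup_dir: str, data_dir: str) -> None:
    """Replace data_dir with the contents of the snapshot (hypothetical helper)."""
    src, dst = Path(backup_dir), Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no snapshot found at: {src}")
    if dst.exists():
        # Discard the half-upgraded files before copying the snapshot back.
        shutil.rmtree(dst)
    shutil.copytree(src, dst, symlinks=True)
```

The idea would be to call snapshot_data_dir before the init container runs, and restore_data_dir only if the upgrade was interrupted, before retrying with a longer threshold.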
Sorry I don't have better news nor suggestions @miguelhar. :frowning:
@justinclift thank you for your response. The size varies, but increasing the livenessProbe/failureThreshold to 20 seems to do the trick.
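For anyone wondering why raising failureThreshold helps: Kubernetes restarts a container after failureThreshold consecutive failed liveness probes, with one probe every periodSeconds. The sketch below estimates that window in Python. The function name is made up for illustration, and the estimate is approximate (it ignores timeoutSeconds and kubelet scheduling jitter). The defaults shown are the Kubernetes defaults; the periodSeconds actually used in this setup is not stated in the thread, so 10 is an assumption.

```python
def restart_window_seconds(
    failure_threshold: int = 3,      # Kubernetes default
    period_seconds: int = 10,        # Kubernetes default, assumed here
    initial_delay_seconds: int = 0,  # Kubernetes default
) -> int:
    """Rough time a container may keep failing its liveness probe before a restart."""
    return initial_delay_seconds + failure_threshold * period_seconds


print(restart_window_seconds())                      # 30
print(restart_window_seconds(failure_threshold=20))  # 200
```

So with default settings the upgrade has roughly 30 seconds before the restart kicks in, while a failureThreshold of 20 stretches that to roughly 200 seconds. Very large databases may need more still.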
Awesome. I bet that took a bunch of time and effort to work through and get happening.
Those two terms (livenessProbe, failureThreshold) show up as Kubernetes related things. Are you managing Kubernetes yourself, or are you using the offering from one of the big vendors?
Asking because it might be useful for future people to directly mention the relevant vendor here, so others that hit the same issue can see whether it relates to them. :smile:
We are using the image pgautoupgrade/pgautoupgrade:14-dev
We are seeing an issue on larger DBs where the init container running this image takes longer than the default thresholds allow and gets terminated. Upon restart, the DB fails to start with:
It seems that after the upgrade is interrupted, it is not retried, leaving the DB in a broken state.
Aside from increasing the liveness probe timeout/failure threshold for the init container running this image, is there something that could be set so that the upgrade is retried?