hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Document step-by-step PostgreSQL upgrade #920

Open kevinelliott opened 1 year ago

kevinelliott commented 1 year ago

When using pg_auto_failover, it is unclear how to perform a major Postgres upgrade of the cluster.

Need clarity around:

1) Upgrading pg_auto_failover itself (on all nodes), and whether that is necessary before or after the PostgreSQL upgrade
2) Steps necessary to prepare for the upgrade (stop/disable pg_auto_failover? put it in maintenance mode? etc.)
3) Steps to accomplish the upgrade (one node at a time to prevent an outage? all nodes at once? etc.)
4) Finalization

kevinelliott commented 1 year ago

@DimCitus Hi Dimitri, any thoughts here?

kevinelliott commented 1 year ago

@DimCitus just checking in again here. Would be great to upgrade some clusters.

sobbybutter commented 1 year ago

This would be great.

kevinelliott commented 1 year ago

@DimCitus Another ping. We'd love some guidance here as we need to upgrade from a now very-aging Postgres cluster.

s4ke commented 1 year ago

This was for splitting a cluster, but might be helpful here as well:

https://github.com/citusdata/pg_auto_failover/discussions/660

kevinelliott commented 1 year ago

This is getting kind of ridiculous @DimCitus ... how can anyone use pg_auto_failover in any production environment without this. At this point we are definitely looking at alternatives.

DimCitus commented 1 year ago

> This is getting kind of ridiculous @DimCitus ... how can anyone use pg_auto_failover in any production environment without this. At this point we are definitely looking at alternatives.

Please keep it civil. This is an Open Source project, and it is being maintained by volunteers. I can't remember you signing a contract or paying for consulting and development, and I'm sure that if you did, you wouldn't publicly use such phrasing to comment on the current state of things.

Again it's Open Source Software. Either you're happy with it, or it is missing key features that you need. Then you need to either contribute to the software or use something else. Your choice. That's what Open Source is all about, freedom of the users.

kevinelliott commented 1 year ago

I am civil. No insults or irrationalities here. I am simply trying to understand why it took this long for a response.

I agree with your comments about open source. I contribute where I can, whenever I can, and also rely on others to contribute. That's the spirit, and I respect it greatly.

I would be happy to contribute the documentation in a pull request, were I to know the steps clearly.

I know you are likely busy, and there are tremendous numbers of things to do. I can respect that. But a response can go a long way too.

sobbybutter commented 1 year ago

I agree with @kevinelliott's concern--I think a detailed, step-by-step upgrade guide is much needed for running PGAF in a production environment, but I think it's important to be kind and respect that this is an open source project to which @DimCitus can contribute however he wishes.

DimCitus commented 1 year ago

Thanks for all the support. I agree that a Postgres Minor Upgrade and a Postgres Major Upgrade manual for pg_auto_failover would be much appreciated, but I'm not in a position to write that at the moment. Also, I had in mind to provide integrated support for those operations in the future, but again, I'm not sure when that future might be.

Contributions welcome.

If you're running pg_auto_failover in production, you need to be able to answer that question anyway, which is why I didn't raise priority to it. I'm pretty sure your production does not depend on my personal availability, and that you would (have to) know how to operate it.

kevinelliott commented 1 year ago

I'll say it again, I'm happy to provide contributions. That does rely on some input from you on how to do the operations properly within the confines of the project, and on how the project interacts with and affects things outside of the Postgres cluster itself, not just on operating Postgres.

I'm very aware of how to operate Postgres, and upgrading a cluster outside of using this tooling is very clear already... but upgrading and interacting with the monitor and node CLIs in an appropriate way while maintaining production cluster availability without an outage is not clear.

This is what I've been asking about for three months. Some clarity is all that is really necessary here; the sharp responses are really not.

Some guidance (in the form of a meaningful reply with information) would be much more useful.

rdegez commented 1 year ago

@kevinelliott What about testing this yourself on a non-production cluster and sharing your discoveries with us? :-)

woutsanders commented 1 year ago

Hi everyone,

Long-time user, but not at all active on GitHub. However, I thought I'd share what I did this week, as I took notes while doing a major upgrade.

I've been using this tool for some years now and am very satisfied with the ease of failover. However, there could be some improvements regarding upgrading clusters. I shared my notes and the issues I encountered while performing a major upgrade in issue 506. It's a closed issue, but imho almost a duplicate of this one (and I found that issue before this one). It's been a lot of trial and error, but I have upgraded the primary node! To keep things simple I explicitly removed all secondary/standby nodes first.

In short: it's a bit of a hassle, but if we can figure this out and set something in stone, it could become much easier.

Minor version bumps are really easy (just apt upgrade on Debian is all that's needed). I always temporarily stop the service on the monitor while doing minor updates, so no unexpected failover occurs during service restarts.

Major upgrades (like the one I did from 14 to 15) are more challenging, but that can be overcome. The main thing is that the system identifier changes during a major upgrade (at least with pg_upgrade, which according to the Postgres docs requires a fresh initdb of the new major version).
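For anyone who wants to see the system identifier change for themselves, here is a minimal Python sketch. It is not part of pg_auto_failover; it only parses the "Database system identifier" line that Postgres's pg_controldata tool prints and compares the value for two data directories. The function names are made up for illustration, the paths are whatever your installation uses, and the sample outputs in the demo are invented values, not taken from a real cluster.

```python
import os
import re
import subprocess

# Matches the "Database system identifier" line printed by pg_controldata
# (English messages; the runner below forces the C locale for that reason).
_SYSID_RE = re.compile(r"^Database system identifier:[ \t]*(\d+)", re.MULTILINE)


def parse_system_identifier(controldata_output: str) -> int:
    """Extract the system identifier from pg_controldata's text output."""
    match = _SYSID_RE.search(controldata_output)
    if match is None:
        raise ValueError("no 'Database system identifier' line found")
    return int(match.group(1))


def read_system_identifier(pg_controldata: str, pgdata: str) -> int:
    """Run pg_controldata on a data directory and return its identifier.

    pg_controldata: path to the binary of the matching major version.
    pgdata: path to the data directory. Both are supplied by the caller.
    """
    result = subprocess.run(
        [pg_controldata, "-D", pgdata],
        check=True,
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},  # keep the output unlocalized
    )
    return parse_system_identifier(result.stdout)


def identifier_changed(old_output: str, new_output: str) -> bool:
    """True when two pg_controldata outputs report different identifiers."""
    return parse_system_identifier(old_output) != parse_system_identifier(new_output)


if __name__ == "__main__":
    # Invented sample outputs, trimmed to the relevant lines.
    old_sample = (
        "pg_control version number:            1300\n"
        "Database system identifier:           7100000000000000001\n"
    )
    new_sample = (
        "pg_control version number:            1300\n"
        "Database system identifier:           7200000000000000002\n"
    )
    print("changed:", identifier_changed(old_sample, new_sample))
```

On a real node you would call read_system_identifier once with the old version's pg_controldata and data directory and once with the new ones, then compare the two numbers before re-registering anything with the monitor.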

Looking forward to any further tips, tricks or things I could've done better, so we hopefully can work something out together and document it 🙂

benbro commented 1 year ago

Mentioned here: https://news.ycombinator.com/item?id=36328416