hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

No failover with a separate tablespace #844

Closed akqopensystems closed 2 years ago

akqopensystems commented 2 years ago

Hi

We are currently evaluating pg_auto_failover. The installation went off without any problems. There are three nodes (1x monitor and 2x DB nodes) running pg_auto_failover 1.6.3. The nodes are IPv6 only. The PostgreSQL version is 12 and the OS is RedHat 8. After creating a tablespace under a path such as /srv/tblspc/app, a failover is no longer possible.

Each time, the content under the path where the tablespace is located (/srv/tblspc/app/PG_xxx) has to be deleted first. Then the failover works.

Are we doing something wrong or is that perhaps a bug?

redbaron commented 2 years ago

Just to be more precise, it is setting up a standby node from the former master that doesn't work. The failover itself succeeds, in that another standby is promoted to be the new master.

DimCitus commented 2 years ago

Hi @akqopensystems and @redbaron; thanks for a detailed issue! I think it's fair to say at this point that we didn't test pg_auto_failover with extra table spaces, and that we are lacking support for them. I suppose table spaces should be handled the same way we handle the main PGDATA one at pg_basebackup time, see https://github.com/citusdata/pg_auto_failover/blob/master/src/bin/pg_autoctl/pgctl.c#L1285.

There, we target a temporary directory for pg_basebackup, and only after the backup is available locally do we erase and replace the previous PGDATA. This requires double the space of the data directory, and at this cost what we get is the ability to do forensics and repairs until we have a new copy of the data that's known good.

I suppose we might have to implement a tablespace-mapping for our pg_basebackup command, so that we would target temporary directories in the same way as we do for the main default PGDATA table space. And then add some calls to rename as in https://github.com/citusdata/pg_auto_failover/blob/master/src/bin/pg_autoctl/pgctl.c#L1297, one per table space.
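To make the idea concrete, here is a rough sketch in Python rather than in the C of pgctl.c. It is not the pg_auto_failover implementation. The function names and the `.pg_autoctl.tmp` suffix are hypothetical; only the `--pgdata` and `--tablespace-mapping=OLDDIR=NEWDIR` options of pg_basebackup are real. The sketch also ignores the symlinks under `pg_tblspc` in the new data directory, which would presumably point at the temporary locations after the backup and need to be re-pointed once the directories are renamed.

```python
import os
import shutil


def plan_tablespace_mapping(tablespaces, suffix=".pg_autoctl.tmp"):
    """Pair each tablespace location with a temporary sibling directory.

    The suffix is a made-up name used for illustration only.
    """
    mapping = []
    for location in tablespaces:
        final_dir = location.rstrip("/")
        mapping.append((final_dir, final_dir + suffix))
    return mapping


def basebackup_args(pgdata_tmp, mapping):
    """Build a pg_basebackup argument list that targets temporary directories.

    Connection options are left out; only --pgdata and
    --tablespace-mapping are shown.
    """
    args = ["pg_basebackup", "--pgdata", pgdata_tmp]
    for final_dir, tmp_dir in mapping:
        args.append(f"--tablespace-mapping={final_dir}={tmp_dir}")
    return args


def swap_into_place(mapping):
    """After a successful backup, replace each old tablespace directory
    with its freshly copied temporary directory (one rename per table space).
    """
    for final_dir, tmp_dir in mapping:
        if os.path.isdir(final_dir):
            shutil.rmtree(final_dir)  # erase the previous copy
        os.rename(tmp_dir, final_dir)  # move the new copy into place
```

With a single tablespace at /srv/tblspc/app, `basebackup_args` would add `--tablespace-mapping=/srv/tblspc/app=/srv/tblspc/app.pg_autoctl.tmp` to the command, and `swap_into_place` would run only once the backup is known to be complete, so the old files stay available for forensics until then.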

Do you folks want to get started implementing a PR to fix table space in pg_auto_failover?

akqopensystems commented 2 years ago

Hi DimCitus

Sorry, but we are not able to provide a PR. Nevertheless, support for extra tablespaces during failover would be very interesting for us, as we have various installations with extra tablespaces.

rheaton commented 2 years ago

@DimCitus I'm going to take a look at creating a PR for this, unless you've started.

DimCitus commented 2 years ago

Hey @rheaton that's awesome news, thanks a lot!