reactive-tech / kubegres

Kubegres is a Kubernetes operator allowing to deploy one or many clusters of PostgreSql instances and manage databases replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.31k stars 74 forks source link

References for PITR / Continuous Archiving implementation? #139

Open ccrvlh opened 1 year ago

ccrvlh commented 1 year ago

Hi.

I've had some issues with Zalando, and now I'm looking for a simpler operator. Kubegres seems to fit the bill, and my experience deploy a cluster was great. I have a custom image setup to run pg_dump and pg_restore scripts , CronJobs for the dump and an on-demand job for the restoration process. This is really simple, and works well, but with restrictions: won't work for larger databases, slow, very high RPO.

I've been looking at strategies to implement PITR and continuous backup. Zalando had this baked in using pg_basebackup and WAL-G (I think). Outside the k8s world, I've read a lot about PgBackrest, Barman, WAL-G and couple of other solutions. But those doesn't look all that simple to setup when the DB is running in containers (they might be, but I don't find much information on it except one or two repos). I know Timescale runs PgBackrest as a sidecar, Zalando runs a custom image with WAL-G/E + pg_basebackup, Percona also uses PgBackrest (not sure about the architecture). PGO Crunchy also backrest, Stackgres I think is custom solution, not sure.

I tried running a separate container for PgBackrest, so I changed the VolumeClaim policy for ReadWriteMany (so that Backrest could connect directly to the data directory), but I had quite a few issues all around the process and couldn't make it work (yet? will keep trying).

I understand Kubegres is not particularly going in this direction at the moment, but I wonder if this could be an option for the future. There has been a brief discussion about it here, but it stopped at pg_dump. Stackgres has an interesting approach with several CRDs. Although this looks complex at first having multiple CRD also allows for more flexibility. Zalando's approach tries to put everything into the cluster definition and/or configuration file, so things are not always trivial to grasp. (I'll follow up in a bit with potential implementations.

I imagine that this should be a common requirement for folks deploying PSQL to k8s, so even if this is not a plan for Kubegres in future, I imagine the pain still exists, so I was wondering if there were any examples, references or any other material really to implement this solution with Kubegres, or any experience people could share.

Thanks a lot!

alazycoder101 commented 1 year ago

I have the same kind of requests as well. PITR is very good, before using kubegres we are backing up with the following strategy:

Unfortunately the config file in kubegres is for both primary and replicas, we need WAL only for primary.

I understand kubegres is just keeping as simple as possible so that you can customize as you want.

Maybe the first step is to seperate the config file between primary and replicas?