systemapic / wu

Systemapic web server and API
https://systemapic.com
2 stars 3 forks source link

PostGIS automatic backup #280

Open knutole opened 8 years ago

knutole commented 8 years ago

We're not currently backing up any data, and need to do so. Would be great with some feedback on how this is best accomplished.

knutole commented 8 years ago

see https://github.com/systemapic/docker-systemapic/issues/7

strk commented 8 years ago

The current idea is to tweak the "postgis" docker so that it automatically restores a dump if started with an empty cluster and finds such dumps in a known location. Such mechanism would also work for automating PostgreSQL upgrades (and PostGIS major upgrades).

knutole commented 8 years ago

There are two different ways (at least) to do backups, and we might should do both:

Replication

Dumps

Questions:
  1. How heavy is this process?
  2. Is is possible to do some kind of appending with pg_dump, only backup up what's new?
  3. What is the PostGIS dbs are > 1TB?
  4. Is it then feasible to create a copy say every ten minutes?

In a way, it seems it would be nice to do replication unto a postgis_backup container (or several), which would then serve both purposes (backup + future load-balancing). But there might be unknown caveats to this approach.

strk commented 8 years ago

The pg_dump based backup script (https://github.com/systemapic/wu/blob/242205ac7ba55b53d9a65264e6c331b0a76c47de/scripts/postgis/backup_databases.sh), run inside the systemapic/wu docker, backups the dev2 database cluster (from postgis_store_dev2) which is ~8.2GB in 7 minutes and 25 seconds, creating a 2.5G dump directory.

strk commented 8 years ago

I'm doing some research on replication based backups

knutole commented 8 years ago

Cool. It seems 1 minute per GB might be too much, as soon as we get into the hundreds of GB's (especially if there's no append-mode for pg_dump?).

strk commented 8 years ago

No append-mode, no. The only append-mode way is with WAL backup: http://www.postgresql.org/docs/current/static/continuous-archiving.html

I'm running some tests locally to better understand how it works.

What I understood so far is that you basically make a copy of the PGDATA directory and separately archive WAL files which can be done incrementally and automated (and even streamed remotely to an hot standby replication server).

strk commented 8 years ago

for the record: right now an automatic pg_dump based backup is running (backup Docker), and scripts for restoring from backups are available (https://github.com/systemapic/docker-systemapic/issues/7#issuecomment-172011486) -- maybe this ticket should be closed, and a new one filed for replication-based approach.

strk commented 8 years ago

and possibly also another one to make sure the "backup_store" gets saved in a safer place than a docker container :)

knutole commented 8 years ago

yes, will do this once i've run things on tx.