systemapic / docker-systemapic

Old fork of Mapic
https://github.com/mapic/mapic
GNU Affero General Public License v3.0

Backup of all storage containers (or data) #7

Open knutole opened 8 years ago

knutole commented 8 years ago

We need to back up all critical data at all times.

Critical data

  1. postgis_store_dev (contains all geo-data) (see https://github.com/systemapic/wu/issues/280)
  2. mongo_store_dev (contains models for portal: projects, users, etc.)
  3. redis_store_dev (contains layers for tileserver)
  4. redis_stats_store_dev (contains stats)
  5. dev_store_dev_common (contains files, e.g. rendered tiles)

Question is how to do this best:

strk commented 8 years ago

As of commit 06ce5614054a8d5398cfc71d3f8fa5f3c6a0705d the "postgis" docker image includes scripts to perform a soft upgrade on all existing databases and to create and restore databases from a set of dumps.

Details can be found in a README there: https://github.com/systemapic/docker-systemapic/blob/master/build/postgis/README.md

The restore-from-dump script is also automatically run by the default entrypoint script (start.sh) IFF a specific env variable is set, pointing to a backup directory. This is used by the do_restore.sh script documented in the backup/postgis docker: https://github.com/systemapic/docker-systemapic/tree/master/build/backup/postgis
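
The conditional restore described above could look roughly like the following in an entrypoint script. This is a minimal sketch, not the actual start.sh: the function names `maybe_restore` and `restore_from_dumps` are hypothetical stand-ins, and the variable name `SYSTEMAPIC_RESTORE_POSTGIS_FROM` is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: the real start.sh may be structured differently.

# Hypothetical stand-in for the restore-from-dump script.
restore_from_dumps() {
  echo "restoring databases from dumps in $1"
}

# Run the restore only if the env variable is set and non-empty.
maybe_restore() {
  backup_dir="${SYSTEMAPIC_RESTORE_POSTGIS_FROM:-}"
  if [ -z "$backup_dir" ]; then
    echo "no restore requested, starting normally"
    return 0
  fi
  # Refuse to continue if the variable points at a missing directory.
  if [ ! -d "$backup_dir" ]; then
    echo "backup directory not found: $backup_dir" >&2
    return 1
  fi
  restore_from_dumps "$backup_dir"
}
```

An entrypoint would call `maybe_restore` before starting the server, so a container started without the variable behaves exactly as before.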

So the "hard upgrade" (restore from dumps) support can be used to upgrade a cluster to a later PostgreSQL version and, at the same time, to a later PostGIS version. In practice, it can be used to restore dumps into a new postgis docker with any combination of versions.

The current "postgis" Dockerfile accepts a build argument to specify the PostgreSQL version. It could be updated to add more build arguments, eventually.
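
As an illustration of building with a version argument, here is a small helper that only prints a build command. It is a sketch under assumptions: `PG_VERSION` is a hypothetical build-argument name (check the Dockerfile for the real one), the function name `postgis_build_cmd` is made up, and the tag pattern is inferred from the image tags mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: PG_VERSION is a hypothetical build-argument name.

# Print (do not run) a docker build command for a given PostgreSQL version.
# Tag pattern (e.g. 9.4 -> 94-21) is inferred from tags seen in this thread.
postgis_build_cmd() {
  pg_version="$1"             # e.g. 9.4
  postgis_suffix="${2:-21}"   # tag suffix used by the existing images
  # Drop the dot from the version to form the first half of the tag.
  pg_compact=$(printf '%s' "$pg_version" | tr -d '.')
  tag="systemapic/postgis:${pg_compact}-${postgis_suffix}"
  echo "docker build --build-arg PG_VERSION=${pg_version} -t ${tag} build/postgis"
}
```

Printing the command rather than running it makes it easy to review the tag and argument before anything is built.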

Right now, the last dump performed by the backup/postgis container has been restored into two new stores on dev2.systemapic.com (mx): postgresql93_store_dev2 (to be used with systemapic/postgis:93-21) and postgresql94_store_dev2 (to be used with systemapic/postgis:94-21). The postgis service there is currently provided by systemapic/postgis:94-21 using postgresql94_store_dev2 (but the change in docker-compose.yml has not been committed yet).

knutole commented 8 years ago

I have a suggestion. I'm starting to think it's better if we make an independent restore_postgis_backup image that will do the following:

So whenever we need to restore, we simply run the restore container, and get a freshly restored backup that we can connect in docker-compose afterwards.

The reason for this suggestion is that it gets messy in docker-compose with the ENV vars. I mean, we have to start the whole compose once with ENV SYSTEMAPIC_RESTORE_POSTGIS_FROM=pgbk_test2 - but then what? We have to restart the whole compose again to remove the ENV. With a separate process for restoring, we only need to restart compose once (or soon, with new upgrades to Docker, we can probably switch stores without restarting at all).

Could be easily put in a script, with to/from args. I mean, we almost have this working now, simply adding the ENV to systemapic/postgis:latest and that will (almost) work. For example:

#!/bin/bash

# Usage: restore_to_fresh.sh store_postgis_backup store_postgis_fresh
BACKUP_STORE=$1
FRESH_STORE=$2

echo "Restoring $BACKUP_STORE into $FRESH_STORE"
docker run -it --volumes-from "$BACKUP_STORE" --volumes-from "$FRESH_STORE" -e SYSTEMAPIC_RESTORE_POSTGIS_FROM="$BACKUP_STORE" systemapic/postgis:backup

echo "Done! Connect restored volume $FRESH_STORE in docker-compose."

Then, in theory, the store_postgis_fresh can be connected in docker-compose and should be identical to pre-crash backup.

What do you think? Will it work?

knutole commented 8 years ago

@strk Also, instead of having a backup container, would it be possible to simply connect two volumes (store_postgis, store_backup) to the postgis container and keep data in one and backups in the other?

(Btw, would it be possible to simply rsync between the two /var/lib/postgresql/9.4/main/ folders in each container? Or is a proper dump preferred?)

I know you must be laughing (or crying) now, remembering this was your initial idea, to not have a backup image! But if this is possible - without any catches I'm not aware of - what do you think about it?

I know this is a bit last-minute, I mean, we want to move on. At the same time, most of the heavy lifting is done and just a few scripts here and there should do the trick, and it will simplify our setup. It's compatible with the suggestion above as far as I can see.

Do you see any problems with this approach?

strk commented 8 years ago

First thing: we already have a script that does what you want to do here:

when we need to restore, we simply run this image once. this will restore backup from store_postgis_backup into store_postgis_fresh

The above is what the current do_restore.sh script does: https://github.com/systemapic/docker-systemapic/blob/master/build/backup/postgis/restore/do_restore.sh

To do what you mention above, you'd call it like this (from the host system):

 ./do_restore.sh store_postgis_backup /backup/postgis/postgis-backup-last systemapic/postgis:94-21

Is that good enough? Note it doesn't take a separate image to do that, but rather you specify the name of an existing image to find the pg_dump command in. This lets you restore into any new PostgreSQL data directory (for example to upgrade from 9.3 to 9.5).

Second, for backup:

knutole commented 8 years ago

Still TODO