mediacloud / backend

Media Cloud is an open source, open data platform that allows researchers to answer quantitative questions about the content of online media.
http://www.mediacloud.org
GNU Affero General Public License v3.0

automate and monitor backups #37

Open hroberts opened 8 years ago

hroberts commented 8 years ago

We are running solr and postgres backups to faith.law.harvard.edu. I have just been manually running the backups once a week. We should set the backups to run via a cron job and monitor that they have run and passed some sanity test for success.
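For example, something along these lines in the backup user's crontab would cover the weekly runs (the schedule and log file names are just placeholders):

```
# Sketch of weekly crontab entries; times and log file names are placeholders.
# m  h  dom mon dow  command
30   2   *   *   0   /space01/mediacloud_pgsql_backup/backup-mcdb1-postgres.sh >> /space01/mediacloud_pgsql_backup/postgres-backup.log 2>&1
30   4   *   *   0   /space01/mediacloud_pgsql_backup/backup-solr.sh >> /space01/mediacloud_pgsql_backup/solr-backup.log 2>&1
```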

The backups and scripts are kept in /space01/mediacloud_pgsql_backup. The postgres backup script is backup-mcdb1-postgres.sh. The solr backup script is backup-solr.sh. The postgres script requires an ssh tunnel, which can be established by running proxy-mcdb1-postgres.sh. The tunnel usually dies between backup runs and needs to be restarted.
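A small wrapper could restart the tunnel before kicking off the dump; this is just a sketch, and it assumes proxy-mcdb1-postgres.sh stays running as its own process that pgrep can find:

```bash
#!/bin/bash
# Sketch: restart the ssh tunnel if it is not running, then run the postgres backup.
# Assumes proxy-mcdb1-postgres.sh is a long-running process findable via pgrep.
BACKUP_DIR=/space01/mediacloud_pgsql_backup

if ! pgrep -f proxy-mcdb1-postgres.sh > /dev/null; then
    nohup "$BACKUP_DIR/proxy-mcdb1-postgres.sh" > /dev/null 2>&1 &
    sleep 10   # give the tunnel a moment to come up
fi

"$BACKUP_DIR/backup-mcdb1-postgres.sh"
```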

The postgres backup sometimes fails because of locking issues on intermittent tables. I have fixed these as they come up by adding exclusions to the pg_dump call, but they still pop up occasionally.
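For reference, the exclusions are just extra --exclude-table flags on the pg_dump call; the table names and connection options below are made up, and the real invocation lives in backup-mcdb1-postgres.sh:

```bash
# Hypothetical pg_dump invocation with table exclusions; names and options are illustrative.
pg_dump -h localhost -p 5432 -U mediacloud -Fc \
    --exclude-table='some_transient_table' \
    --exclude-table='another_locked_table' \
    -f "mediacloud-$(date +%F).dump" mediacloud
```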

A reasonable success check for the postgres backup is just to make sure that the size of the dump looks about right.
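That can be as simple as comparing the newest dump's size against a floor; the threshold, filename pattern, and alert address here are all assumptions:

```bash
# Sketch of a size sanity check on the newest dump; threshold and paths are assumptions.
MIN_BYTES=$((50 * 1024 * 1024 * 1024))   # e.g. expect at least ~50 GB
latest=$(ls -t /space01/mediacloud_pgsql_backup/*/mediacloud-*.dump 2>/dev/null | head -n 1)

if [ -z "$latest" ] || [ "$(stat -c %s "$latest")" -lt "$MIN_BYTES" ]; then
    echo "postgres backup missing or too small: ${latest:-none}" |
        mail -s "mediacloud backup check failed" sysadmin@example.org
fi
```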

For the solr rsyncs, we'll probably have to look at the script output to make sure the script finished successfully, though we could also look for recent dates in the resulting directories.
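Either check is only a couple of lines of shell; the directory layout below is a guess:

```bash
# Sketch of two checks for the solr rsync: exit status and recent mtimes.
SOLR_BACKUP_DIR=/space01/mediacloud_pgsql_backup/solr   # assumed location

if ! /space01/mediacloud_pgsql_backup/backup-solr.sh; then
    echo "backup-solr.sh exited non-zero" >&2
fi

if [ -z "$(find "$SOLR_BACKUP_DIR" -type f -mtime -1 -print -quit)" ]; then
    echo "no files in $SOLR_BACKUP_DIR modified within the last day" >&2
fi
```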

At some point, we should verify that we can restore the postgres backup. This is low priority because pg_dump is very reliable (unless it reports an error), but we should test on principle.
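A periodic restore test could be as simple as restoring the newest dump into a scratch database; the database name here is a placeholder:

```bash
# Sketch of a restore smoke test into a throwaway database; names are placeholders.
latest=$(ls -t /space01/mediacloud_pgsql_backup/*/mediacloud-*.dump | head -n 1)

createdb mediacloud_restore_test
pg_restore --exit-on-error -d mediacloud_restore_test "$latest"
dropdb mediacloud_restore_test
```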

I have just assumed that these scripts are run as hroberts, so it may take some tweaking to clean that up. We could ask the Berkman geeks to set up a mediacloud account on the machine to make sure everything runs under an account that allows shared access.

The solr backups only keep a single copy of the solr database, and that copy will be invalid for the duration of the backup. We could fix this by using rsync's support for incremental backups via hard links, but in my experience that takes some fiddling to get working well.
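The usual pattern is rsync --link-dest pointed at the previous snapshot, so the old copy stays valid while the new one is written; the source path and layout here are assumptions:

```bash
# Sketch of hard-link snapshots with rsync --link-dest; paths are assumptions.
SRC=solr-host:/var/lib/solr/data/
DEST_ROOT=/space01/mediacloud_pgsql_backup/solr
today=$(date +%F)

rsync -a --delete --link-dest="$DEST_ROOT/latest" "$SRC" "$DEST_ROOT/$today/" || exit 1

# Only repoint "latest" once the new snapshot finished cleanly.
ln -sfn "$DEST_ROOT/$today" "$DEST_ROOT/latest"
```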

The postgres backups are made to separate directories named for the date. I have just been deleting old backups so that we have three at any given time. Ideally we would keep the current backup, last week's backup, and a backup from six weeks ago.
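A retention pass along these lines would keep roughly those three generations; the directory naming and date arithmetic are assumptions:

```bash
# Sketch: keep the newest dated backup dir, the newest one older than a week,
# and the newest one older than six weeks; delete the rest. Assumes YYYY-MM-DD
# directory names under the backup root.
cd /space01/mediacloud_pgsql_backup || exit 1

all=$(ls -d 20??-??-?? 2>/dev/null | sort -r)
newest=$(echo "$all" | head -n 1)
week_old=$(echo "$all" | awk -v c="$(date -d '7 days ago' +%F)" '$0 <= c' | head -n 1)
six_weeks_old=$(echo "$all" | awk -v c="$(date -d '42 days ago' +%F)" '$0 <= c' | head -n 1)
keep=$(printf '%s\n%s\n%s\n' "$newest" "$week_old" "$six_weeks_old")

for dir in $all; do
    echo "$keep" | grep -qx "$dir" || rm -rf "$dir"
done
```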

hroberts commented 8 years ago

pg_restore has a --list option that we could use to test for the presence of our core tables. I just ran it on the most recent backup, and it returned instantly.
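Something like this would grep the dump's table of contents for a few core tables; the table names and the grep pattern (default public schema) are assumptions:

```bash
# Sketch: check that a few core tables appear in the dump's table of contents.
latest=$(ls -t /space01/mediacloud_pgsql_backup/*/mediacloud-*.dump | head -n 1)
toc=$(pg_restore --list "$latest")

for table in stories media downloads; do
    echo "$toc" | grep -qE "TABLE (DATA )?public ${table} " ||
        echo "core table '${table}' not found in $latest" >&2
done
```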

pypt commented 8 years ago

Acknowledged.

pypt commented 8 years ago

Can you add me to the mediacloud group on Faith (Berkman's LDAP, probably) and set group read/write permissions (664 / 775) on everything under /space01/mediacloud_pgsql_backup? I can't access some of those files.

hroberts commented 8 years ago

Group membership and permissions should be good now.

pypt commented 8 years ago

Thanks, works now!

hroberts commented 7 years ago

Can we reboot this? I still want to be making the postgres dumps even with the binary backups. Ideally we will eventually switch to dumping from the zfs mirror, but wherever the dumps are coming from, they should be automated. As discussed, we also need the solr backups to be automated, since it is a days- or weeks-long process to regenerate the solr index from postgres.

rahulbot commented 6 years ago

@pypt - Can I get a status update on: 1) postgres backups, 2) solr backups?

Are we doing them? Are they working? If they are, we can close this old issue out.

rahulbot commented 5 years ago

Update on the next task for Postgres: "Instead of a daily backup of our PostgreSQL database, we want to try our hand at creating a (presumably) hot standby on mcdb2. Colby tried to do it in his time and it didn't quite work out, but it might be possible with the newer PostgreSQL version that we use today, plus the biggest tables being partitioned."