opendata / Gov-Data-Hosting

Free, time-limited CKAN hosting for governments within the United States. [RETIRED]
MIT License

Automate EC2 backups to S3 #5

Open waldoj opened 9 years ago

waldoj commented 9 years ago

We might be able to add redundancy by storing the data on S3 (per #6).

waldoj commented 9 years ago

It seems to me that a useful backup function would be to run full-site ckanapi backups every 24 hours and save them to S3. I'm not sure how one would go about scripting that centrally, but perhaps it's a cron job that needs to be included in each Docker instance.
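A minimal sketch of that daily job, assuming ckanapi's `dump` subcommand and boto3 are available inside each instance. The S3 key layout (`ckan-backups/<site>/<date>/...`), the bucket name, and the site URL are hypothetical placeholders. Private datasets and user records may also need an API key, which is omitted here. Keying each file by site also serves the "client redundancy" goal below, since each client's backups sit under their own prefix.

```python
"""Hypothetical nightly CKAN backup: ckanapi dumps uploaded to S3.

Assumptions (not from the thread): bucket, site URL and key layout are
placeholders; boto3 is only needed where run_backup() actually executes.
"""
import datetime
import subprocess
from pathlib import Path

# Object types ckanapi can dump; together they approximate a full site.
DUMP_TYPES = ("datasets", "groups", "organizations", "users")


def dump_commands(site_url: str, out_dir: Path) -> list[list[str]]:
    """Build one gzipped `ckanapi dump <type> --all` command per type."""
    return [
        ["ckanapi", "dump", kind, "--all",
         "-z", "-O", str(out_dir / f"{kind}.jsonl.gz"),
         "-r", site_url]
        for kind in DUMP_TYPES
    ]


def s3_key(site: str, day: datetime.date, filename: str) -> str:
    """Date-stamped key with one prefix per client site (hypothetical)."""
    return f"ckan-backups/{site}/{day.isoformat()}/{filename}"


def run_backup(site: str, site_url: str, out_dir: Path, bucket: str) -> None:
    """Dump the site with ckanapi, then upload each dump file to S3."""
    import boto3  # imported lazily so the helpers above work without it

    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in dump_commands(site_url, out_dir):
        subprocess.run(cmd, check=True)  # stop if any dump fails
    s3 = boto3.client("s3")
    today = datetime.date.today()
    for path in sorted(out_dir.glob("*.jsonl.gz")):
        s3.upload_file(str(path), bucket, s3_key(site, today, path.name))
```

A per-container cron entry would then call something like `run_backup("mysite", "http://localhost", Path("/tmp/ckan-dump"), "my-backup-bucket")` once a day; all of those argument values are placeholders.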

waldoj commented 9 years ago

I suppose I've got two goals here. The first is internal redundancy, so that it's easy to get back up and running if something blows up on the server. The second is client redundancy, so that if I'm hit by a bus at the same time that the AWS hosting facility is hit by a meteor, clients still have a way to get their data.

mafintosh commented 9 years ago

AWS has a way to snapshot an entire instance into an AMI that can be stored on S3. This requires halting the instance for a few seconds though.
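For illustration, a rough boto3 sketch of that AMI approach. The region default and the naming scheme are assumptions. `NoReboot=False` lets AWS reboot the instance so the image is filesystem-consistent, which is the brief halt mentioned above.

```python
import datetime


def ami_name(instance_id: str, when: datetime.datetime) -> str:
    """Hypothetical naming scheme; AMI names may not contain colons."""
    return f"{instance_id}-backup-{when:%Y%m%d-%H%M%S}"


def snapshot_instance(instance_id: str, region: str = "us-east-1") -> str:
    """Create an AMI from a running instance and return its image ID."""
    import boto3  # imported lazily; only needed when this actually runs

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.create_image(
        InstanceId=instance_id,
        Name=ami_name(instance_id, datetime.datetime.now(datetime.timezone.utc)),
        NoReboot=False,  # allow the reboot for a consistent image
    )
    return resp["ImageId"]
```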

waldoj commented 9 years ago

I've been using ec2-automate-backup, which seems to work pretty well, but I suspect there are more modern methods of accomplishing this. :)
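The general snapshot-and-prune pattern behind tools like ec2-automate-backup can be sketched in a few lines of boto3. This is not that tool's code. The 7-day retention, the region, and the rule of always keeping the newest snapshot are assumptions, and pagination of `describe_snapshots` is ignored for brevity.

```python
import datetime


def expired(snapshots, now, keep_days):
    """Return IDs of snapshots older than keep_days, always keeping the newest."""
    cutoff = now - datetime.timedelta(days=keep_days)
    ordered = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in ordered[1:] if s["StartTime"] < cutoff]


def snapshot_and_prune(volume_id, keep_days=7, region="us-east-1"):
    """Snapshot one EBS volume, then delete its snapshots past retention."""
    import boto3  # imported lazily; only needed when this actually runs

    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_snapshot(VolumeId=volume_id, Description="nightly backup")
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )
    now = datetime.datetime.now(datetime.timezone.utc)
    for snap_id in expired(resp["Snapshots"], now, keep_days):
        ec2.delete_snapshot(SnapshotId=snap_id)
```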

waldoj commented 9 years ago

BTW, @mafintosh, @maxogden, and @karissa: I have no idea why this repo has all of y'all following it by default. GitHub permissions confuse me. Obviously, you're welcome to follow it, but please don't feel any obligation; it's in no way intentional!

waldoj commented 9 years ago

Note that the site data is stored in /home/ubuntu/.datacats/multisite/sites, which is weird but OK.
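Given that path, a file-level backup could be as simple as a dated tarball of the whole directory. This is a sketch under an assumption the thread doesn't confirm: that everything worth keeping lives under that directory. For a consistent copy of any database files, the containers would likely need to be stopped first, or this could be paired with a ckanapi dump. The archive naming is hypothetical.

```python
import datetime
import tarfile
from pathlib import Path

SITES_DIR = Path("/home/ubuntu/.datacats/multisite/sites")


def archive_sites(sites_dir: Path, dest_dir: Path, day: datetime.date) -> Path:
    """Write a gzipped tarball of the datacats sites directory."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"datacats-sites-{day.isoformat()}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths relative ("sites/<name>/...") inside the archive
        tar.add(str(sites_dir), arcname=sites_dir.name)
    return archive
```

The resulting file could then be shipped to S3 the same way as the ckanapi dumps, e.g. `archive_sites(SITES_DIR, Path("/tmp/backups"), datetime.date.today())` with a placeholder destination.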