wvmarle opened this issue 12 years ago
That's... awkward :/
A static file on S3? But the queries would kill us :/
I was thinking more of a local backup copy as a CSV file or similar (optionally uploaded to Glacier/S3/whatever, of course), with a restore function to put it back into a new SimpleDB domain when needed.
Can merge already; it's extra functionality. Docs are just incomplete.
I'm not a fan of local - anything local is based on the presumption that your local drives/backups will still exist and still hold a cache. Since Glacier is cold storage I'm personally against any assumptions about local storage (preferably people would back up to more than one region, but that's not really our scope).
That said - maybe we should design our description format so that the cache is rebuilt if more than 1, 3 or 6 months have passed since the last write/read (or any time later). The only problem is that descriptions must be under 1024 characters, so if we want to let users enter custom descriptions we should probably do some text compression, something along these lines:
import base64
import bz2

base64.b64encode(bz2.compress(our_json_data))
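For reference, a minimal round-trip sketch of that idea; the helper names and the 1024-character check are just illustrative, nothing that exists in glacier-cmd yet:

import base64
import bz2
import json

def pack_description(metadata):
    # Illustrative helper: bz2-compress the JSON text and base64-encode it
    # so it is safe to store in an archive description field.
    packed = base64.b64encode(bz2.compress(json.dumps(metadata).encode('utf-8')))
    packed = packed.decode('ascii')
    if len(packed) > 1024:
        raise ValueError('packed description exceeds the 1024 character limit')
    return packed

def unpack_description(packed):
    # Reverse of pack_description.
    return json.loads(bz2.decompress(base64.b64decode(packed)).decode('utf-8'))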
And afterwards, when we get all the data out, maybe we should pickle it into a file that is then uploaded to S3?
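If we went that route, it could look roughly like this (just a sketch using boto's S3 support; the bucket name, key name and helper are made up):

import pickle
import boto
from boto.s3.key import Key

def upload_pickled_db(data, bucket_name='glacier-cmd-bookkeeping'):
    # Sketch only: serialise the bookkeeping data and push it to S3.
    conn = boto.connect_s3()            # credentials come from the boto config
    bucket = conn.create_bucket(bucket_name)
    key = Key(bucket)
    key.key = 'bookkeeping.pickle'
    key.set_contents_from_string(pickle.dumps(data))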
I'm not using S3 myself, and hope there's no need to start using more services.
For a backup of the database a CSV file would suffice (we just have to write export and import routines; see the sketch below). This CSV can in turn be compressed and uploaded to Glacier; it shouldn't be too hard to put something useful in the description to help recovery. As soon as we have automatic downloads I imagine we can quite easily write a routine to retrieve and download this backup from Glacier, and import it into the user's SimpleDB domain.
I do assume the SimpleDB domain is not removed, just the inactive data on it. That's at least how I read the manual.
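A rough sketch of what such an export routine could look like, using boto's SimpleDB support (the function name and file handling are placeholders; glacier-cmd may well structure this differently):

import csv
import boto

def export_domain_to_csv(domain_name, csv_path):
    # Sketch only: dump every item of a SimpleDB domain into a CSV file.
    conn = boto.connect_sdb()  # credentials come from the boto config
    domain = conn.get_domain(domain_name)
    items = list(domain.select('select * from `%s`' % domain_name))
    # Collect the full set of attribute names so every row gets a column.
    fieldnames = ['item_name'] + sorted({key for item in items for key in item.keys()})
    with open(csv_path, 'wb') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for item in items:
            # Multi-valued attributes would need extra handling; this just
            # writes whatever value boto returns for each attribute.
            row = dict(item)
            row['item_name'] = item.name
            writer.writerow(row)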
To keep it flexible on one hand, and automated on the other, I'm considering implementing database backup as follows. This allows automatic backup/restore to Glacier (restore is going to take a long time: first inventory, then retrieve, then download, some 8 hours total), or backup/restore to/from a local file. The user can then back this up as they like: to a local (USB) drive, to another cloud service like S3, or whatever they prefer. The main routines will go into GlacierWrapper. Shouldn't be too hard as I can just call the existing functions, including search, to get a complete dump of the db.
def backupdb(args):
    """
    Create a copy of the current bookkeeping db, and put it on Glacier.
    """
    # If args.outfile: save to that file.  --outfile <file_name>
    # If args.zip: compress data before saving to file.  --compress
    # If args.stdout: dump to stdout (json code, never compressed).  --stdout
    # If no special requests:
    #    check for vault 'glacier-cmd_bookkeeping', create if necessary.
    #    compress data to zip file; upload this file to glacier with
    #    description glacier-cmd_bookkeeping_yyyy_mm_dd_hh_ss
def restoredb(args):
    """
    Restore database from glacier.
    """
    # If args.infile: use it.  --infile <file_name>
    # If args.zip: infile is zipped, otherwise plain json.  --zip
    #    (can we check for this? Try to unzip, see what happens?)
    # If nothing given, restore from Glacier:
    #    Check whether we have a vault glacier-cmd_bookkeeping.
    #    Check inventory of vault glacier-cmd_bookkeeping;
    #    notify user of progress.
    #    Check which is the latest backup archive; retrieve it; notify
    #    user of progress.
    #    When available, download it and restore the data into the database.
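A small sketch of how the timestamped description and the "which is the latest backup" check could work (the helper names and exact timestamp fields are illustrative, and the archive dicts are assumed to look like the entries in Glacier's inventory JSON):

import time

BACKUP_PREFIX = 'glacier-cmd_bookkeeping_'

def make_backup_description():
    # e.g. glacier-cmd_bookkeeping_2012_10_05_14_30
    return BACKUP_PREFIX + time.strftime('%Y_%m_%d_%H_%M')

def select_latest_backup(archives):
    # archives: list of dicts from the vault inventory, each assumed to have
    # 'ArchiveDescription' and 'ArchiveId' keys.  Because the timestamp format
    # sorts lexicographically, max() on the description picks the newest backup.
    backups = [a for a in archives
               if a.get('ArchiveDescription', '').startswith(BACKUP_PREFIX)]
    if not backups:
        return None
    return max(backups, key=lambda a: a['ArchiveDescription'])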
From Amazon's docs:
Any data stored as part of the free tier program must be actively used. If a domain is not accessed for a period of 6 months, it will be subject to removal at the discretion of Amazon Web Services.
As with Glacier we're talking about cold storage (so I can very well imagine that people put data on it once, and then don't come back until maybe a year later when they have another set of vacation photos), we should think about a local backup/restore option for this database.