uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License

SimpleDB expiry #83

Open wvmarle opened 12 years ago

wvmarle commented 12 years ago

From Amazon's docs:

Any data stored as part of the free tier program must be actively used. If a domain is not accessed for a period of 6 months, it will be subject to removal at the discretion of Amazon Web Services.

Since Glacier is cold storage (I can well imagine people uploading data once, and then again maybe a year later when they have another set of vacation photos), we should think about a local backup/restore option for this database.

uskudnik commented 12 years ago

That's... awkward :/

A static file on S3? But queries would be killing us :/

wvmarle commented 12 years ago

I was thinking more of a local backup copy as a CSV file or similar (optionally uploaded to Glacier/S3/whatever, of course), with a restore function to put it back into a new SimpleDB domain when needed.

wvmarle commented 12 years ago

Can merge already; it's extra functionality. Docs are just incomplete.

uskudnik commented 12 years ago

I'm not a fan of local: anything local rests on the presumption that your local drives/backups will still exist and hold a usable cache. Since Glacier is cold backup, I'm personally against any presumptions about local storage (preferably people would back up to more than one region, but that's not really our scope).

That said, maybe we should design our format for descriptions so that the cache can be rebuilt from them if more than 1, 3 or 6 months have passed since the last write/read (or at any later time). The only problem is that descriptions are limited to 1024 characters, so if we want to let our users enter custom descriptions we should probably do some text compression, something along the lines of:

import base64
import bz2

# our_json_data must be bytes, e.g. json.dumps(data).encode("utf-8")
base64.b64encode(bz2.compress(our_json_data))

And afterwards when we get all the data out maybe we should pickle the data and put it into a file that is then uploaded to S3?
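A minimal sketch of the description-compression idea above, written as a round trip so the cache could be rebuilt from what is stored. The function names, the `MAX_DESCRIPTION_LENGTH` constant and the sample metadata keys are assumptions for illustration, not part of glacier-cmd; only the 1024-character limit comes from the discussion.

```python
import base64
import bz2
import json

MAX_DESCRIPTION_LENGTH = 1024  # limit on Glacier archive descriptions


def encode_description(metadata):
    """Serialize, compress, and base64-encode metadata for a description."""
    raw = json.dumps(metadata, sort_keys=True).encode("utf-8")
    encoded = base64.b64encode(bz2.compress(raw)).decode("ascii")
    if len(encoded) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            "encoded description is %d characters, limit is %d"
            % (len(encoded), MAX_DESCRIPTION_LENGTH)
        )
    return encoded


def decode_description(encoded):
    """Reverse encode_description and return the original metadata."""
    raw = bz2.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metadata keys, chosen only for the demo.
    meta = {"filename": "vacation-2012.tar", "note": "custom description"}
    encoded = encode_description(meta)
    print(len(encoded), decode_description(encoded) == meta)
```

One thing to watch: bz2 has a fixed overhead and base64 grows its input by a third, so for short descriptions the encoded form can end up longer than the plain JSON. It may be worth comparing both lengths before deciding which one to store.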

wvmarle commented 12 years ago

I'm not using S3 myself, and I hope there's no need to start using more services.

For backup of the database a CSV file would suffice (we just have to write import and export routines). This CSV can in turn be compressed and uploaded to Glacier; it shouldn't be too hard to put something useful in the description to help recovery. As soon as we have automatic downloads, I imagine we can quite easily write a routine to retrieve and download this backup from Glacier and import it into the user's SimpleDB domain.
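The import and export routines mentioned above could look roughly like this. It is only a sketch: the column names in `FIELDS` are hypothetical and the real bookkeeping schema may differ, and reading from or writing to SimpleDB itself is left out.

```python
import csv
import io

# Hypothetical column names; the real bookkeeping schema may differ.
FIELDS = ["archive_id", "vault", "description", "size"]


def export_csv(records, stream):
    """Write a list of dicts to stream as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


def import_csv(stream):
    """Read CSV written by export_csv back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(stream)]


if __name__ == "__main__":
    rows = [{"archive_id": "abc123", "vault": "photos",
             "description": 'trip, "2012"', "size": "1048576"}]
    buffer = io.StringIO()
    export_csv(rows, buffer)
    buffer.seek(0)
    print(import_csv(buffer) == rows)
```

Note that CSV carries no type information, so every value comes back as a string; numeric fields such as a size would need converting on import.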

I do assume the SimpleDB domain is not removed, just the inactive data on it. That's at least how I read the manual.

wvmarle commented 12 years ago

To keep it flexible on the one hand and automated on the other, I'm considering implementing database backup as follows. This allows automatic backup/restore to Glacier (restore is going to take a long time: first inventory, then retrieve, then download, some 8 hours total), or backup/restore to/from a local file. The user can then back that file up however they like: to a local (USB) drive, to another cloud service like S3, or anywhere else. The main routines will go into GlacierWrapper. It shouldn't be too hard, as I can just call the existing functions, including search to get a complete dump of the db.

def backupdb(args):
    """
    Create a copy of the current bookkeeping db, and put it on Glacier.
    """

    # If args.outfile: save to that file. --outfile <file_name>

    # if args.zip: compress data before saving to file. --compress

    # if args.stdout: dump to stdout (json code, never compressed). --stdout

    # if no special requests:
    #   check for vault 'glacier-cmd_bookkeeping', create if necessary.

    #   compress data to zip file; upload this file to glacier with
    #   description glacier-cmd_bookkeeping_yyyy_mm_dd_hh_ss

def restoredb(args):
    """
    Restore database from glacier.
    """

    # If args.infile: use it. --infile <file_name>

    # If args.zip: infile is zipped, otherwise plain json. --zip
    # can we check for this? Try to unzip, see what happens?

    # If nothing given, restore from Glacier:
    #   Check whether we have a vault glacier-cmd_bookkeeping.
    #   Check inventory of vault glacier-cmd_bookkeeping;
    #   notify user of progress.
    #   Check which is latest backup archive; retrieve it; notify
    #   user of progress.
    #   When available, download it and return the data into the database.
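On the question in the outline about whether a zipped infile can be detected: one option is to let the standard library inspect the file instead of relying on a flag. Below is a minimal sketch of the local-file branch of both routines under that assumption. The names `write_backup`, `read_backup` and `MEMBER_NAME` are hypothetical, and the Glacier upload and retrieval steps are not covered.

```python
import json
import zipfile

# Hypothetical name of the single member stored inside the zip file.
MEMBER_NAME = "bookkeeping.json"


def write_backup(records, path, compress=False):
    """Save records as JSON, optionally wrapped in a zip file."""
    payload = json.dumps(records)
    if compress:
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(MEMBER_NAME, payload)
    else:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(payload)


def read_backup(path):
    """Load records, detecting zip versus plain JSON from the file itself."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            payload = archive.read(MEMBER_NAME).decode("utf-8")
    else:
        with open(path, encoding="utf-8") as handle:
            payload = handle.read()
    return json.loads(payload)
```

With detection done this way, the restore side would not strictly need its own `--zip` option, though keeping it as an explicit override is a reasonable design choice too.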