Complete bookkeeping support

offlinehacker commented 11 years ago

@uskudnik has already implemented saving file metadata to SimpleDB when an archive is uploaded, and removing the entry when the archive is deleted.

I think these options should still be implemented as well:

Update/sync the inventory with SimpleDB (after the inventory is retrieved).
Option to save metadata in the archive description. This lets us resolve any inconsistencies between SimpleDB and Glacier when an inventory or file is retrieved. It should be optional, because some people will not want JSON in their archive descriptions.
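
The second option could look roughly like the sketch below. It is only an illustration: the helper names and the metadata fields are hypothetical and are not taken from the project's code. It relies on Glacier limiting archive descriptions to 1,024 printable ASCII characters, and it treats any description that is not a JSON object as a plain-text description.

```python
import json

# Glacier limits archive descriptions to 1,024 printable ASCII characters.
MAX_DESCRIPTION_LENGTH = 1024


def encode_description(metadata):
    """Serialize a metadata dict into a string usable as an archive description."""
    # ensure_ascii escapes non-ASCII characters so the result stays ASCII.
    text = json.dumps(metadata, ensure_ascii=True, sort_keys=True,
                      separators=(",", ":"))
    if len(text) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("metadata does not fit in an archive description")
    return text


def decode_description(description):
    """Return the metadata dict, or None if the description is plain text."""
    try:
        metadata = json.loads(description)
    except ValueError:
        return None
    # Only a JSON object counts as metadata written by this (hypothetical) option.
    return metadata if isinstance(metadata, dict) else None
```

With the option switched off, the description would be passed through unchanged, and decode_description would simply return None for it when an inventory is read back.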

uskudnik commented 11 years ago

Will do. I get a new laptop tomorrow and hope to be operational by weekend at the latest.

offlinehacker commented 11 years ago

Nice, and sorry about your laptop. Mine has been working perfectly fine for almost 4 years, and a failure would cause me quite a few problems too (my backups are pretty much monthly).

uskudnik commented 11 years ago

Mine worked for 5 years and would probably still be working for quite some time if I hadn't spilled water over it :D

gburca commented 11 years ago

Is the filename the proper key to use in SimpleDB? From the looks of it, when a user uploads MyDataFile.txt from two different locations, the first entry is wiped out by the second upload. Why not use the ArchiveId as the key? That should be unique even if the user uploads /path1/MyDataFile.txt and /path2/MyDataFile.txt by doing:

cd /path1 && glacier upload test MyDataFile.txt
cd /path2 && glacier upload test MyDataFile.txt
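
The effect of the key choice can be illustrated with a small simulation. This is a sketch only: plain dicts stand in for a SimpleDB domain, put_item is a hypothetical helper that assumes a put with an existing key replaces the item, and the archive IDs are made up.

```python
def put_item(domain, key, attributes):
    """Store attributes under key; an existing item with that key is replaced."""
    domain[key] = dict(attributes)


# Two uploads of the same file name from different directories.
uploads = [
    {"archive_id": "archive-id-1", "filename": "MyDataFile.txt", "path": "/path1"},
    {"archive_id": "archive-id-2", "filename": "MyDataFile.txt", "path": "/path2"},
]

keyed_by_filename = {}
keyed_by_archive_id = {}
for upload in uploads:
    put_item(keyed_by_filename, upload["filename"], upload)
    put_item(keyed_by_archive_id, upload["archive_id"], upload)

print(len(keyed_by_filename))    # 1: the /path1 entry was replaced
print(len(keyed_by_archive_id))  # 2: both uploads are kept
```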

uskudnik commented 11 years ago

Bummer, my fuckup. Yeah, that just overwrites it. Will be updating ASAP (by the weekend at the latest).

offlinehacker commented 11 years ago

I recommend that we don't support multiple files with the same name in the same vault. There's a practical reason for that if we are going to implement FUSE. We should return an error.

So the database should have the following relation: vault->filename->metadata.

gburca commented 11 years ago

Since AWS Glacier itself doesn't have that "limitation", how will this tool work with vaults created by other tools? It can't just give up and return an error.

I agree that unique file names make sense in the context of fuse, but for vaults not meant to be used with fuse I don't think the tool should fail.

gburca commented 11 years ago

Actually, even for fuse you might want to map the same file to more than one archive in order to represent the state of that file at different points in time.

offlinehacker commented 11 years ago

I agree, states represented at different points in time would be great. I support that!

offlinehacker commented 11 years ago

So ArchiveId might be a good key for SimpleDB. We can get all archives with a specific filename by performing a query. The user should then pick which archive with that filename to retrieve. Of course, we also have to record the time whenever a new archive is uploaded.
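
A minimal sketch of that lookup, assuming each record carries hypothetical "filename" and "uploaded_at" attributes (the real attribute names may differ) and that timestamps are stored as UTC ISO 8601 strings in one fixed format, so they sort correctly as text. A list of dicts stands in for the query result.

```python
def archives_for_filename(records, filename):
    """Return all records with the given file name, newest upload first."""
    matches = [r for r in records if r["filename"] == filename]
    # Same-format UTC ISO 8601 strings sort chronologically as plain strings.
    return sorted(matches, key=lambda r: r["uploaded_at"], reverse=True)


records = [
    {"archive_id": "archive-id-1", "filename": "MyDataFile.txt",
     "uploaded_at": "2012-09-11T20:00:00Z"},
    {"archive_id": "archive-id-2", "filename": "MyDataFile.txt",
     "uploaded_at": "2012-09-12T06:30:00Z"},
    {"archive_id": "archive-id-3", "filename": "Other.txt",
     "uploaded_at": "2012-09-12T07:00:00Z"},
]

# The user would be shown these candidates and pick one to retrieve.
for record in archives_for_filename(records, "MyDataFile.txt"):
    print(record["uploaded_at"], record["archive_id"])
```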

uskudnik commented 11 years ago

We are already saving when the archive was uploaded (well, technically, when the upload started) - see https://github.com/uskudnik/amazon-glacier-cmd-interface/blob/master/glacier/glacier.py#L221. So the only change needed is the format/value of the key we store (which should be the ArchiveId).

I'll be fixing this and adding support for updating the cache after inventory retrieval later in the day.

wvmarle commented 11 years ago

What is the status of bookkeeping now? It's a part of the code I haven't looked at much yet (I first have to figure out more about SimpleDB to really understand what's going on there).

I think what we need to have is this:

- When uploading an archive, store the archive details (file name (with or without path?), description, vault, size, time, uploadid, anything else?) in the database (the uploadid is for later automatic resumption of the upload).
- When the upload finishes, replace the uploadid with the archiveid and add the hash.
- When doing an rmarchive, remove the entry from the database.
- When downloading an inventory, check the inventory of that vault against the bookkeeping db:
  - add missing entries
  - remove items not in the inventory
- When removing a vault, remove any entries from the bookkeeping db that refer to that vault.
- When doing a search, filter out incomplete uploads (those without an archiveid; we may have to add a boolean 'completed' column to the db).
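
The inventory check in the list above could be sketched as follows. This is an assumption-laden illustration, not the project's code: a dict keyed by ArchiveId stands in for the bookkeeping db, the entry attribute names ("vault", "upload_id", and so on) are hypothetical, and the inventory is assumed to be the parsed JSON job output with its "ArchiveList" entries.

```python
def sync_with_inventory(db, vault, inventory):
    """Make the db entries for `vault` match an inventory.

    Returns a tuple (added, removed) of ArchiveId lists.
    """
    in_inventory = {a["ArchiveId"]: a for a in inventory["ArchiveList"]}
    added, removed = [], []

    # Add archives that the inventory lists but the db does not know about.
    for archive_id, archive in in_inventory.items():
        if archive_id not in db:
            db[archive_id] = {
                "vault": vault,
                "description": archive.get("ArchiveDescription", ""),
                "size": archive.get("Size"),
                "time": archive.get("CreationDate"),
                "hash": archive.get("SHA256TreeHash"),
            }
            added.append(archive_id)

    # Remove completed entries of this vault that the inventory no longer lists.
    # Entries that still carry an upload_id are unfinished uploads and are kept.
    for key in list(db):
        entry = db[key]
        if (entry["vault"] == vault and "upload_id" not in entry
                and key not in in_inventory):
            del db[key]
            removed.append(key)

    return added, removed
```

One design point to keep in mind: an inventory is a snapshot, so an archive uploaded after the inventory was generated would look stale here. A real implementation would probably compare each entry's upload time with the inventory's date before deleting anything.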

uskudnik commented 11 years ago

I have to take another look at that - I think a bit of code got lost during various changes (or lies in one of my (local) branches), but either way it still needs to be finished before I call it stable.

ATM I'm looking at SNS. I've implemented sending notifications; now I have to implement something that runs the archive download once the job is done.

wvmarle commented 11 years ago

Just realised the database is missing the archive size. It's very useful to have, so I'm adding it while I work on the upload function.

Implemented download resumption; waiting for Amazon to retrieve an archive for me to test it on. I expect to be able to test by tomorrow; when it works I'll send the patch.

Working on the upload resumption part now; will require some extra SimpleDB calls. Hope to finish that tonight. You'll get the patch when done. At least that can be tested right away :-)

wvmarle commented 11 years ago

Using the file name as the item key is not a good idea (we can't guarantee it's unique - especially when piping in data over stdin without giving a name, in which case we don't even have a file name!); this was mentioned in the comments already.

As I'm at it already, I'll update the db calls to use ArchiveId (and temporary items to use UploadId - we need that to easily resume uploads), and write a small conversion routine from the old-style to the new-style db, which reads the db contents, writes them back with the new keys and deletes the old items.
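
A conversion routine along those lines might look like the sketch below. It is not the code from the pull request: a dict stands in for the SimpleDB domain, and it assumes each old-style item (keyed by file name) already stores its ArchiveId under a hypothetical "archive_id" attribute.

```python
def convert_keys(db):
    """Re-key items from file name to ArchiveId in place.

    Returns the number of items converted.
    """
    converted = 0
    for old_key in list(db):
        item = db[old_key]
        archive_id = item.get("archive_id")
        if archive_id is None or old_key == archive_id:
            continue  # no ArchiveId recorded, or the item is already new-style
        new_item = dict(item)
        # Keep the file name, which used to live only in the key.
        new_item.setdefault("filename", old_key)
        # Write the new item first and delete the old one afterwards, so an
        # interruption between the two steps cannot lose the entry.
        db[archive_id] = new_item
        del db[old_key]
        converted += 1
    return converted
```

Running it a second time converts nothing, so it is safe to re-run after an interrupted conversion.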

wvmarle commented 11 years ago

Done; see pull request. We can close this issue.

uskudnik / amazon-glacier-cmd-interface

Complete bookkeeping support #26