uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License

Can glacier-cmd upload folders iteratively ? #118

Open cmgui2 opened 11 years ago

cmgui2 commented 11 years ago

Dear Sir

Thank you for this very excellent freeware tool.

Can glacier-cmd upload folders iteratively ?

That is, will running glacier-cmd upload /path/to/* upload everything inside the /path/to/ folder, including its subfolders? For example, if we have many layers of subfolders with files in them, e.g. /path/to/subfolder1/subsubfolder1/, /path/to/subfolder2/, etc., will everything be uploaded? Also, when we download from Glacier, will glacier-cmd restore the folder structure?

Thank you very much in anticipation.

cmgui2

uskudnik commented 11 years ago

Nope, it will only upload explicitly specified files (archives) and it will download only specified archives.

Cheers, Urban

(In reply to cmgui2's message of Feb 1, 2013, 19:14.)

raajheshkannaa commented 11 years ago

Hello,

I was looking for the same feature as well. Is there any possibility of getting it integrated into the tool? If not, please point me in the right direction to get this going. Thank you for your awesome code and contribution :+1:

uskudnik commented 11 years ago

This will not be done, because Amazon does not allow storing directories in vaults, only archives:

/some/not/really/very/long/directory/structure

vs.

/vault/archive

What you can (and should) do is zip or tar.gz your directory (basically, make an archive) and upload that to Glacier. I don't believe this should be done by a tool like this one, since there are far more advanced tools that specialise in compression and support the related features as well (encryption, generating hashes, recursive or one-level structure, --exclude, --include, etc.).
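For readers who prefer to script the archiving step, here is a minimal sketch using Python's standard tarfile module. The function name make_archive and the example paths are illustrative assumptions, not part of glacier-cmd.

```python
import os
import tarfile


def make_archive(directory, archive_path):
    """Pack `directory` (recursively) into a gzip-compressed tar file.

    `make_archive` is a hypothetical helper name used for illustration only.
    """
    # Use the directory's own name as the top-level entry inside the archive,
    # so absolute path prefixes are not stored.
    base = os.path.basename(os.path.normpath(directory))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(directory, arcname=base)
    return archive_path
```

The resulting .tar.gz is an ordinary single file, so it can be uploaded to a vault as one archive.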

Cheers, Urban

(In reply to raajheshkannaa's message of Feb 2, 2013, 07:41.)

wvmarle commented 11 years ago

There are two ways this could be handled by glacier-cmd.

The easy way, which works right now: pack the directory into a single file with tar, then pipe that into glacier-cmd. Something like tar -cf - <directory> | glacier-cmd <vault name, etc> --stdin. The --stdin switch makes glacier-cmd read the data from standard input.

The hard but fancy way: have glacier-cmd accept a directory name as input, then upload the files in the directory one by one. Use the bookkeeping db to keep track of the original file names including directory names. That would allow one to upload a complete directory but retrieve files one by one.

The second option does not exist at the moment; it would need a lot of thought and some proper planning to pull off.
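The piping approach depends on the producer writing a tar stream rather than a seekable file. As a rough sketch of that producing side in Python (the function name stream_tar is an assumption for illustration; the consumer would be whatever reads standard input, such as glacier-cmd with --stdin as described above):

```python
import os
import tarfile


def stream_tar(directory, out):
    """Write an uncompressed tar stream of `directory` to `out`.

    `out` is any writable binary file object, e.g. sys.stdout.buffer.
    `stream_tar` is a hypothetical name used for illustration only.
    """
    base = os.path.basename(os.path.normpath(directory))
    # Mode "w|" produces a pure stream and never seeks, which a pipe requires.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        tar.add(directory, arcname=base)
```

Calling stream_tar(path, sys.stdout.buffer) from a small script would let its output be piped onward in the same way as the output of tar.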

uskudnik commented 11 years ago

Yes, but that would basically mean you would need to flatten out the structure and I'm not sure I'm comfortable with that.

It definitely requires a lot of thought.

(In reply to wvmarle's message of Feb 5, 2013, 03:24.)

wvmarle commented 11 years ago

Well flatten out... here are my first thoughts about this.

When you upload a file to Glacier, it gets a unique archive ID instead of a file name. We already link those archive IDs to file names through the bookkeeping database. The storage in Glacier is by nature a flat file structure; the only structure one can add to it is by using different vaults.

The easiest way would be to store the file name with its complete path in the bookkeeping database. The base of the path name is ignored when uploading.

E.g. in your home dir you have a folder called images, containing heaps of photos in some kind of date-like dir structure. The / at the end of the file name tells glacier-cmd that we're trying to upload a complete directory:

glacier-cmd upload ~/images/

or, if we first do a cd images, it becomes:

glacier-cmd upload ./

The folder contains files like:

~/images/2013/02/04/image001.jpg
~/images/2013/02/05/image001.jpg

Then we have all those image files (all two of them) to upload. In the bookkeeping database we would write something like:

<archiveID> 2013/02/04/image001.jpg
<archiveID> 2013/02/05/image001.jpg

And when retrieving those files, create the directories ./2013/02/04/ and ./2013/02/05/ in the current working directory, and then store the files in the appropriate directory.
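To make the proposal concrete, here is a rough sketch of the bookkeeping idea in Python. It is not glacier-cmd code: upload_file and download_file are hypothetical callables standing in for the real transfer steps, and a plain dict stands in for the bookkeeping database. Real Glacier retrievals are also asynchronous jobs, which this sketch ignores.

```python
import os


def upload_directory(directory, upload_file):
    """Upload every file under `directory`; return {archive_id: relative_path}.

    `upload_file` is a hypothetical callable that takes a local path and
    returns the archive ID assigned to that upload.
    """
    bookkeeping = {}
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            # Ignore the base of the path; keep only the part below `directory`.
            rel_path = os.path.relpath(full_path, directory)
            bookkeeping[upload_file(full_path)] = rel_path
    return bookkeeping


def restore_directory(bookkeeping, download_file, target="."):
    """Recreate the directory tree under `target` from the bookkeeping map.

    `download_file` is a hypothetical callable taking (archive_id, local_path).
    """
    for archive_id, rel_path in bookkeeping.items():
        local_path = os.path.join(target, rel_path)
        # Create e.g. ./2013/02/04/ before writing the file into it.
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        download_file(archive_id, local_path)
```

Because each file is stored as its own archive, a single entry from the map could also be retrieved on its own, which is the advantage over the single tar file approach.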