vsespb / mt-aws-glacier

Perl Multithreaded Multipart sync to Amazon Glacier
http://mt-aws.com/
GNU General Public License v3.0
536 stars, 57 forks

Files are saved under a different name #26

Closed koala73 closed 11 years ago

koala73 commented 11 years ago

When I upload files using the latest version of mt-aws-glacier [0.87 beta], the uploaded file names are changed to something like "File Name: mt2 eyJmaWxlbmFtZSI6InVwZGF0ZSA0L3VwZGF0ZSAyIGZvcm0gSUVHL0Z1bGwvSW5HcmlkX0RhbnNUZXNZZXV4..."

Is that normal? Can I get the original filenames back?

Thanks

vsespb commented 11 years ago

I'm not sure about the terminology:

  1. What do you mean by "file names uploaded"?
  2. Where do you see "File Name: mt2 eyJmaWxlbmFt ..."?

When you upload files, filenames and file modification times are encoded in the Amazon Glacier metadata field x-amz-archive-description. Yes, mt2 is the signature (format marker) for this encoding. The encoding is specified here: https://github.com/vsespb/mt-aws-glacier/blob/master/lib/App/MtAws/MetaData.pm#L37
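As a rough illustration of how such a value can be read back, here is a minimal Python sketch. It assumes the mt2 layout is the literal marker `mt2`, a space, and then the URL-safe Base64 encoding (with `=` padding stripped) of a UTF-8 JSON object; the `{"filename":` prefix of the value quoted above decodes exactly that way. The function names are hypothetical, and `encode_mt2` exists only for a round trip: it does not claim to reproduce mtglacier's exact bytes, and the `mtime` string format used below is an assumption. MetaData.pm remains the authoritative specification.

```python
import base64
import json


def decode_mt2(description: str) -> dict:
    """Decode an 'mt2 <base64url(JSON)>' archive description (assumed layout)."""
    marker, _, payload = description.partition(" ")
    if marker != "mt2":
        raise ValueError(f"not an mt2 description: {marker!r}")
    # Restore the '=' padding that was stripped before storing.
    payload += "=" * (-len(payload) % 4)
    raw = base64.urlsafe_b64decode(payload)
    return json.loads(raw.decode("utf-8"))


def encode_mt2(filename: str, mtime: str) -> str:
    """Hypothetical inverse of decode_mt2, for round-trip testing only."""
    doc = json.dumps({"filename": filename, "mtime": mtime}, ensure_ascii=False)
    payload = base64.urlsafe_b64encode(doc.encode("utf-8")).decode("ascii")
    return "mt2 " + payload.rstrip("=")
```

For example, `base64.urlsafe_b64decode("eyJmaWxlbmFtZSI6")` returns `b'{"filename":'`, which is the start of the value koala73 saw.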

Are you perhaps using another Glacier application that labels the 'x-amz-archive-description' field as "Filename"?

You can easily check that filenames are stored correctly: run retrieve-inventory and then download-inventory, and you'll see the filenames inside the new journal file (a plain text file).

vsespb commented 11 years ago

Closing for now. Please reopen if needed.

vsespb commented 11 years ago

Just for clarification: there is no such thing as a "filename" in the Amazon Glacier API. Different Glacier programs store filenames in the x-amz-archive-description field in different ways.

It's impossible to store an unencoded filename in this field, because it allows only characters with codes below 127 (i.e. ASCII). Also, Glacier clients usually want to store the file modification time together with the filename.
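To see why encoding is needed, the sketch below checks a string against the archive-description limits as described in the Glacier UploadArchive API reference (printable 7-bit ASCII, codes 32 to 126, at most 1024 bytes). `is_valid_description` is a hypothetical helper, not part of mtglacier. A raw non-ASCII filename fails the check, while its URL-safe Base64 form passes.

```python
import base64


def is_valid_description(value: str) -> bool:
    """Hypothetical check for Glacier archive-description limits
    (assumed: printable ASCII codes 32..126 only, at most 1024 bytes)."""
    # For pure-ASCII text, character count equals byte count.
    return len(value) <= 1024 and all(32 <= ord(ch) <= 126 for ch in value)


raw_name = "Отчёт 2013.pdf"
encoded = base64.urlsafe_b64encode(raw_name.encode("utf-8")).decode("ascii")

print(is_valid_description(raw_name))  # False: Cyrillic letters are outside ASCII
print(is_valid_description(encoded))   # True: Base64url uses only A-Z a-z 0-9 - _ =
```

The same reasoning applies to control characters such as tabs or newlines in a filename: they are ASCII but still fall outside the allowed range.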

okarmadillo commented 11 years ago

I ran a sync on a directory with 45 files whose names describe the data and the date/time of creation, e.g. CFTotalWaterAccum_GC_24hr_20110409_1200.nc.gz.done. After the sync and refresh completed, the journal contains the following information (this is a single line). If I view the list in third-party software such as CloudBerry Explorer or Fast Glacier, the files are completely unrecognizable. Does that mean I have to keep the journal and back it up as well? Do I also have to share this journal with all the remote sites, so they know what is what?

From the journal.log: A 1367274027 CREATED SjHVyrAn0DffHMhC3RhbHQuZk_Zlzf2kiRjBSBJ0xilHC8rJhMhOh-baS7qCgYRiOyb_JLh7zVMIaXvKPy12MQVfDbv3hInbokQPhoShdAuQgTSpJgrWck5ganhuzh7HJ8BjBXj50A 8203548 1367264641 4904d1d97fe68e53d9fb9b1b5981c3af39f0dcc20ba959d2b7d77ed17d1907a7 CFTotalWaterAccum_GC_24hr_20110409_1200.nc.gz.done
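Because the journal is plain text, a few lines of code can list the original filenames without any third-party client. This is a sketch based only on the sample line above: the field names (entry time, archive ID, size, mtime, tree hash, relative filename) are inferred from it, not taken from an official spec. Since the web page may have rendered tab separators as spaces, the sketch splits on a single tab or space. `parse_created_line` and `JournalEntry` are hypothetical names.

```python
import re
from typing import NamedTuple


class JournalEntry(NamedTuple):
    # Hypothetical field names inferred from the sample journal line.
    entry_time: int  # when the journal record was written (Unix time)
    archive_id: str  # Glacier archive ID
    size: int        # file size in bytes
    mtime: int       # file modification time (Unix time)
    treehash: str    # 64 hex characters (SHA-256 tree hash)
    filename: str    # relative filename, may contain spaces


def parse_created_line(line: str) -> JournalEntry:
    # Split on single tab/space characters; maxsplit=7 keeps any spaces
    # inside the trailing filename intact.
    parts = re.split(r"[\t ]", line.rstrip("\r\n"), maxsplit=7)
    if len(parts) != 8 or parts[0] != "A" or parts[2] != "CREATED":
        raise ValueError("not an 'A ... CREATED' journal line")
    _, entry_time, _, archive_id, size, mtime, treehash, filename = parts
    return JournalEntry(int(entry_time), archive_id, int(size), int(mtime),
                        treehash, filename)
```

Fed the sample line, it yields `filename='CFTotalWaterAccum_GC_24hr_20110409_1200.nc.gz.done'`, `size=8203548` and `mtime=1367264641`.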

[Screenshot from Fast Glacier]

vsespb commented 11 years ago
  1. Currently, you can restore the filename/modification time of files backed up with mtglacier only with mtglacier. You cannot restore them with CloudBerry Explorer or Fast Glacier, because each Glacier client maintains its own metadata format.
  2. You don't have to back up your journal file, but it's recommended. You can recover the journal from Amazon Glacier (though it can be slow). See restoring the journal from Amazon Glacier: https://github.com/vsespb/mt-aws-glacier#restoring-journal and the Journal concept: https://github.com/vsespb/mt-aws-glacier#journal-concept
  3. If you are a developer, you can decode the metadata stored by mtglacier yourself. The specification is published: https://github.com/vsespb/mt-aws-glacier/blob/master/lib/App/MtAws/MetaData.pm
  4. Amazon Glacier clients usually do not support other clients' metadata formats, but there are some exceptions. I am going to implement support for third-party clients' metadata formats, but I haven't heard of anyone planning to support the mtglacier format.
  5. Actually, it seems that the modification time of file CFTotalWaterAccum_GC_24hr_20110409_1200.nc.gz.done is 1367264641 = Mon, 29 Apr 2013 19:44:01 GMT (i.e. not 20110409).
  6. There are several discussions about metadata formats on the Amazon forums, like this one: https://forums.aws.amazon.com/thread.jspa?threadID=105215&tstart=25
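Item 5 is simple arithmetic on the journal's mtime field. The sketch below (hypothetical helper names; the `_YYYYMMDD_` naming pattern is assumed from this one filename) converts the Unix timestamp to the GMT date quoted above and compares it with the date embedded in the name.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def mtime_to_gmt(mtime: int) -> str:
    """Format a Unix timestamp as an RFC 1123 style GMT date."""
    return format_datetime(datetime.fromtimestamp(mtime, tz=timezone.utc), usegmt=True)


def date_from_name(filename: str) -> Optional[str]:
    """Pull an 8-digit YYYYMMDD date out of names like *_20110409_1200.*"""
    match = re.search(r"_(\d{8})_", filename)
    return match.group(1) if match else None


name = "CFTotalWaterAccum_GC_24hr_20110409_1200.nc.gz.done"
mtime = 1367264641
print(mtime_to_gmt(mtime))  # Mon, 29 Apr 2013 19:44:01 GMT
print(datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y%m%d"))  # 20130429
print(date_from_name(name))  # 20110409
```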
okarmadillo commented 11 years ago

Thank you for taking the time to respond. I am starting to understand the difficulties of working with Glacier and of finding common ground on metadata formats. I also realized I need to read the existing documentation more closely. Thanks again...

5: My bad, the date in the name is the date of the data inside the file, not the creation date, hence the difference. It's weather data for that day.

vsespb commented 11 years ago

I think the easiest thing you can do now is to put each file into a ZIP (or other) archive, which supports filenames. This is probably the best solution, especially if filenames are extremely important.
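A minimal Python sketch of that workaround, using only the standard `zipfile` module: each file goes into its own ZIP, so the original name (and a modification time, which ZIP stores in local time at 2-second resolution) travels inside the archive and can be read by any unzip tool after download. `wrap_in_zip` and the `<name>.zip` naming are illustrative choices, not mtglacier features.

```python
import os
import zipfile


def wrap_in_zip(path: str, out_dir: str) -> str:
    """Put a single file into <out_dir>/<basename>.zip and return the ZIP path."""
    name = os.path.basename(path)
    zip_path = os.path.join(out_dir, name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ZipFile.write stores the entry under arcname and copies the
        # file's modification time into the ZIP entry header.
        zf.write(path, arcname=name)
    return zip_path
```

For data that is already compressed (such as the .gz files above), passing `zipfile.ZIP_STORED` instead avoids spending CPU on a second compression pass.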