uskudnik / amazon-glacier-cmd-interface

Command line interface for Amazon Glacier
MIT License
374 stars 100 forks source link

Multi-part upload resumption fails with SHA mismatch error #96

Open echamberlain opened 11 years ago

echamberlain commented 11 years ago

Hello,

All my uploads fail part way through with 408 errors. When I attempt to restart the upload, I get SHA mismatch errors.

Any idea what could be causing the problem?

Oct 28 18:26:12 DEBUG    glacier-cmd Found a matching upload id. Continuing upload resumption attempt.
Oct 28 18:26:12 DEBUG    glacier-cmd {u'MultipartUploadId': u'ecmSekGBlP0oL_3o7iFMoF0NnJ-v0AO8AwDYt-S0nk91ZbKpOXzlvUzXgmBUhq2FFGfKrqkb8ujJI5bH4A7jI2L4_Uy0', u'ArchiveDescription': u'2011-08-06 Bubba.zip', u'CreationDate': u'2012-10-29T01:09:43.595Z', u'PartSizeInBytes': 1048576, u'VaultARN': u'arn:aws:glacier:us-east-1:312324537546:vaults/TimeLapseArchive'}
Oct 28 18:26:13 ERROR    glacier-cmd ERROR: SHA256 hash mismatch.
Oct 28 18:26:13 INFO     glacier-cmd str: Received data does not match uploaded data; please check your uploadid and try again.
Caused by: 1 exception
  str: SHA256 hash mismatch.

Oct 28 18:26:13 DEBUG    glacier-cmd Traceback (most recent call last):
  File "/usr/local/bin/glacier-cmd", line 8, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 751, in main
    args.func(args)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 300, in upload
    args.name, args.partsize, args.uploadid, args.resume)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 62, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 204, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 62, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 257, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 62, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 1029, in upload
    code='ResumeError')
wvmarle commented 11 years ago

This is a known bug; fixed; needs merge. Check out my branch for the fixed code.

https://github.com/wvmarle/amazon-glacier-cmd-interface/tree/parallel_uploads

This will also attempt to retry those time-out errors. That happens now and then, I have no idea why, suspect it's either the http connection or the Glacier-side that's responsible. A simple retry should be enough to keep going.

echamberlain commented 11 years ago

When I run the parallel_uploads code, I get the following error:


glacier-cmd upload --uploadid ajxckUN1ffsMXTka-066bOFHBfVHvzFi3fvSLWQ2xtLBTk2mfZMBcUe4Ui6QNRyhr8EOwgxxgc4u_N1EuGdoZHut1ctA TimeLapseArchive 2012-06-05\ Transit.zip 
Traceback (most recent call last):Check done; resuming upload.                                                                                                                                                                       
  File "/usr/local/bin/glacier-cmd", line 8, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 811, in main
    args.func(args)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 302, in upload
    args.resume, args.sessions)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 211, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 264, in sdb_connect_wrap
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 1250, in upload
    while q.qsize() > 0: # wait for processing to finish
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 139, in qsize
    return self._maxsize - self._sem._semlock._get_value()
NotImplementedError
echamberlain commented 11 years ago

Also trying to abort the multipart upload generates an error:


glacier-cmd abortmultipart TimeLapseArchive ajxckUN1ffsMXTka-066bOFHBfVHvzFi3fvSLWQ2xtLBTk2mfZMBcUe4Ui6QNRyhr8EOwgxxgc4u_N1EuGdoZHut1ctA
Traceback (most recent call last):
  File "/usr/local/bin/glacier-cmd", line 8, in <module>
    load_entry_point('glacier==0.2dev', 'console_scripts', 'glacier-cmd')()
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 811, in main
    args.func(args)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/glacier.py", line 220, in abortmultipart
    response = glacier.abortmultipart(args.vault, args.uploadId)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 211, in glacier_connect_wrap
    return func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 68, in wrapper
    ret = fn(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/glacier-0.2dev-py2.7.egg/glacier/GlacierWrapper.py", line 813, in abortmultipart
    response = self.glacierconn.abort_multipart(vault_name, upload_id)
AttributeError: 'GlacierConnection' object has no attribute 'abort_multipart'
echamberlain commented 11 years ago

The first error looks to be an OS X issue.

From http://docs.python.org/2/library/multiprocessing.html:

qsize() Return the approximate size of the queue. Because of multithreading/multiprocessing semantics, this number is not reliable.

Note that this may raise NotImplementedError on Unix platforms like Mac OS X where sem_getvalue() is not implemented.

wvmarle commented 11 years ago

Mmm... Interesting... I may try to revive my old iBook G4 but no idea what version of Python can run on that, it's an oldie. Gonna be tricky to find a workaround for this.

Second one is a bug; got the wrong method name there. Fixed.

echamberlain commented 11 years ago

I did some research on work abounds for qsize, the best I could find is this: https://github.com/Fenixin/Minecraft-Region-Fixer/commit/6e559138a7fb5944d9f7b3e86a74525eef0687d6

I'm not sure if the workaround is applicable, my Python is a bit rusty.

echamberlain commented 11 years ago

Is there a difference between using qsize() > 0 and not empty()?

wvmarle commented 11 years ago

Should do the job as well. Will make amendments.

I just also noticed a hiatus in the docs: it is mentioned with the multiprocessing.Queue module but not with the Queue module itself. And iirc I'm using a Queue.Queue not a multiprocessing.Queue object. At least it was the docs of the second that I used for reference when implementing this :-)

wvmarle commented 11 years ago

I found an even better solution: it now waits until all upload processes are finished (or have crashed). The old way would cause the program to hang indefinitely when one of those processes would crash out, as the queue would never be fully emptied.

It still goes wrong in the end if all processes crash... still having serious issues with the ResponseException (i.e. credentials rejected).

wvmarle commented 11 years ago

It should work now, completely. Even if all processes crash during upload, in which case a new process will be created to finish up the queue.

echamberlain commented 11 years ago

Thank you for all your work.

I've tried an upload with the latest code, it runs for a while, then the network traffic stops and the following error is in the logs:


Oct 30 09:23:20 ERROR    glacier-cmd ERROR: Expected 204, got (403, code=InvalidSignatureException, message=The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

The Canonical String for this request should have been
'PUT
/-/vaults/TimeLapseArchive/multipart-uploads/5Q2wh0NoVF1suvIiJGaXQr2ixwBH78VH5uNZhu756voiYewlHnv551roHcCz5mlNTKlWCHoHhzuD3QWk-gevBWFIbm8H

host:glacier.us-east-1.amazonaws.com
x-amz-content-sha256:83b69d258c21c574fdbedf34f37d700053e2364bd55024b031fd7fd345e2a26c
x-amz-date:20121030T162320Z
x-amz-glacier-version:2012-06-01
x-amz-sha256-tree-hash:7e872017b16f6704986b7a11f551982c1abb05cebc2bc8f5eae8bbf263632991

host;x-amz-content-sha256;x-amz-date;x-amz-glacier-version;x-amz-sha256-tree-hash
83b69d258c21c574fdbedf34f37d700053e2364bd55024b031fd7fd345e2a26c'

The String-to-Sign should have been
'AWS4-HMAC-SHA256
20121030T162320Z
20121030/us-east-1/glacier/aws4_request
12180ca9fd27db08fddcba9c6e59d4914b8cff30d08d99b88011e6c105f2d03b'
)
Oct 30 09:23:20 INFO     glacier-cmd str: Error while uploading data to Amazon Glacier.
Caused by: 1 exception
  UnexpectedHTTPResponseError: Expected 204, got (403, code=InvalidSignatureException, message=The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

  The Canonical String for this request should have been
  'PUT
  /-/vaults/TimeLapseArchive/multipart-uploads/5Q2wh0NoVF1suvIiJGaXQr2ixwBH78VH5uNZhu756voiYewlHnv551roHcCz5mlNTKlWCHoHhzuD3QWk-gevBWFIbm8H

  host:glacier.us-east-1.amazonaws.com
  x-amz-content-sha256:83b69d258c21c574fdbedf34f37d700053e2364bd55024b031fd7fd345e2a26c
  x-amz-date:20121030T162320Z
  x-amz-glacier-version:2012-06-01
  x-amz-sha256-tree-hash:7e872017b16f6704986b7a11f551982c1abb05cebc2bc8f5eae8bbf263632991

  host;x-amz-content-sha256;x-amz-date;x-amz-glacier-version;x-amz-sha256-tree-hash
  83b69d258c21c574fdbedf34f37d700053e2364bd55024b031fd7fd345e2a26c'

  The String-to-Sign should have been
  'AWS4-HMAC-SHA256
  20121030T162320Z
  20121030/us-east-1/glacier/aws4_request
  12180ca9fd27db08fddcba9c6e59d4914b8cff30d08d99b88011e6c105f2d03b'
  )

The script doesn't recover from this condition it just hangs, until I manually cancel it.

wvmarle commented 11 years ago

Having the same error myself very often, particularly when using multiple sessions. It's something I don't understand really why it happens - I've tried various things and all give the same result. I can't produce that error reliably, making debugging really a matter or trial and error. When using a single session it usually churns along until it's finished.

And besides it is supposed to restart the process now, and continue until done - but it's tricky to recover reliably from crashing sub-processes.

echamberlain commented 11 years ago

I get the error every 4-5 GB of data uploaded and the process does not recover. I'm not specifying any session info on the command line.

echamberlain commented 11 years ago

I find if I set multiple sessions (10), then the processes slowly die, but the upload continues.

echamberlain commented 11 years ago

I'm still getting the InvalidSignatureException and after all my sessions except, the upload freezes. When I manually break out, the child processes are still hanging around and I have to manually kill them.

Even with 10 or 20 sessions, I can't get the upload to run for more than an hour.

offlinehacker commented 11 years ago

I'm rewriting whole upload routine making it fault proof and tested. Will be ready till the end of the week(or even sonner) if things will go right. On Nov 9, 2012 5:10 AM, "Eric Chamberlain" notifications@github.com wrote:

I'm still getting the InvalidSignatureException and after all my sessions except, the upload freezes. When I manually break out, the child processes are still hanging around and I have to manually kill them.

— Reply to this email directly or view it on GitHubhttps://github.com/uskudnik/amazon-glacier-cmd-interface/issues/96#issuecomment-10215100.