vsespb / mt-aws-glacier

Perl Multithreaded Multipart sync to Amazon Glacier
http://mt-aws.com/
GNU General Public License v3.0
535 stars 57 forks source link

Signature expired #8

Closed matthewboeckman-zz closed 11 years ago

matthewboeckman-zz commented 11 years ago

Trying to upload a 1.5 TB tar archive with the following options:

--concurrency=10 --partsize=256

Kicks off child pids, transfer begins and runs for ~15 minutes, then fails:

MT-AWS-Glacier, part of MT-AWS suite, Copyright (c) 2012 Victor Efimov http://mt-aws.com/ Version 0.72beta PID 10914 Started worker PID 10915 Started worker PID 10916 Started worker PID 10917 Started worker PID 10918 Started worker PID 10919 Started worker PID 10920 Started worker PID 10921 Started worker PID 10922 Started worker PID 10923 Started worker PID 10920 Created an upload_id mms0SKw4fYwRk7OrlOR1XBNYmm2uaGd0U25jaghYatulvQzL1y7t_xPatB25dZTOGN-5hDej7iIrz04iQVwSaoDmONTG PID 10920 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [2415919104] PID 10916 Uploaded part for /media/fppp/outgoing/xxxb.tar at offset [0] PID 10923 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [268435456] PID 10919 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [536870912] PID 10917 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [805306368] PID 10917 HTTP 500 This might be normal. Will retry PID 10915 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [1879048192] PID 10918 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [1610612736] PID 10922 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [1073741824] PID 10921 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [1342177280] PID 10914 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [2147483648] PID 10920 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [2684354560] PID 10919 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [3489660928] PID 10917 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [3758096384] PID 10914 Uploaded part for /media/fppp/outgoing/xxxb.tar at offset [5100273664] PID 10916 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [2952790016] PID 10923 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [3221225472] PID 10920 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [5368709120] PID 10922 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [4563402752] PID 10919 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [5637144576] PID 10921 Uploaded part for /media/fppp/outgoing/xxx.tar at offset [4831838208] PID 10915 HTTP 408 This might be normal. Will retry PID 10917 HTTP 408 This might be normal. Will retry Error: PUT http://glacier.us-east-1.amazonaws.com/-/vaults/video-test/multipart-uploads/mms0SKw4fYwRk7OrlOR1XBNYmm2uaGd0U25jaghYatulvQzL1y7t_xPatB25dZTOGN-5hDej7iIrz04iQVwSaoDmONTG Authorization: AWS4-HMAC-SHA256 Credential=AKIAJD3CQKG5ENGHYQUA/20121106/us-east-1/glacier/aws4_request, SignedHeaders=content-length;content-range;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-glacier-version;x-amz-sha256-tree-hash, Signature=2b1b637f2131173dbba62c68d8b14830b8bfe29ad2c3ce54e1dd1cf4595fa217 Host: glacier.us-east-1.amazonaws.com User-Agent: mt-aws-glacier/0.72beta (http://mt-aws.com/) libwww-perl/6.03 Content-Length: 268435456 Content-Range: bytes 4026531840-4294967295/* Content-Type: application/octet-stream X-Amz-Content-Sha256: f1c1d14f7b439d9365a4881bc29698912102c6647e8ea8598aeea0188c5a42d5 X-Amz-Date: 20121106T170833Z X-Amz-Glacier-Version: 2012-06-01 X-Amz-Sha256-Tree-Hash: 4441c69054ad0c84c6673efe877e6928e199b08b98fb99a74354061f429c0043

\xAF\xA8\xA2\xFE\xFF\xAE\xA7\xA1\30\xFF\xAD\xA6\xA1\xFF\xAE\xA6\xA1\xFF\xAF\xA7\xA2\xFF\xAF\xA8\xA2\xFF\xAE\xA7\xA1\xFF\xAD\xA7\xA1\xFF\xAE\xA7\xA1\xFF\xAD\xA8\xA1\xFF\xAE\xA9\xA2\xFF\xAD\xA7\xA2\xFF\xAD\xA6\xA1\xFF\xAE\xA7\xA3\xFF\xAF\xA8\xA4\xFF\xAE\xA7\xA3\xFF\xAD\xA6\xA2\xFF\xAE\xA8\xA4\xFF\xAE\xA7\xA3\xFF\xAE\xA8\xA4\xFF\xAE\xA9\xA5\xFF\xAD\xA8\xA4\xFF\xAC\xA7\xA4\xFF\xAD\xA7\xA4\xFF\xAE\xA8\xA5\xFF\xAE\xA8\xA4\xFB\xFF\xAD\xA8\xA4\xFE\xFF\xAD\xA8\xA3\xFD\xFF\xAD\xA8\xA4\r\xFF\xAD\xA8\xA3\xFF\xAD\xA7\xA3\xFF\xAD\xA8\xA3\xFF\xAE\xA8\xA4\xFF\xAD\xA8\xA4\xFF\xAC\xA8\xA4\xFF\xAC\xA8\xA3\xFF\xAD\xA8\xA3\xFF\xAE\xA9\xA4\xFF\xAF\xA8\xA4\xFF\xAE\xA8\xA4\xFF\xAF\xA8\xA4\xFF\xAE\xA8\xA4\xFE\xFF\xAE\xA7\xA4\xFC\xFF\xAD\xA7\xA3\2\xFF\xAC\xA7\xA2\xFF\xAD\xA7\xA3\xFE\xFF\xAC\xA8\xA3\xFE\xFF\xAD\xA8\xA4\t\xFF\xAE\xA9\xA5\xFF\xAE\xA8\xA5\xFF\xAE\xA7\xA4\xFF\xAE\xA8\xA4\xFF\xAF\xA8\xA5\xFF\xAF\xA8\xA4\xFF\xAE\xA7\xA4\xFF\xAE\xA8\xA4\xFF\xAE\xA7\xA3\xFE\xFF\xAD\xA7\xA3\3\xFF\xAE\xA8\xA4\xFF\xAF\xA8\xA4\xFF\xAE\xA7\xA4\xFE\xFF\xAE\xA7\xA3\1\xFF\xAE\xA7\xA4\xFD\xFF\xAE\xA7\xA3\6\xFF\xAE\xA8\xA4\xFF\xAF\xA8\xA4\xFF\xAF\xA9\xA5\xFF\xAF\xA8\xA5\xFF\xAD\xA8\xA4\xFF\xAD\xA7\xA3\xFC\xFF\xAD\xA8\xA4\1\xFF\xAE\xA8\xA4\xFD\xFF\xAD\xA8\xA3\3\xFF\xAE\xA9\xA4\xFF\xAF\xA9\xA5\xFF\xAD\xA7\xA3\xFE\xFF\xAD\xA8\xA4\10\xFF\xAE\xA8\xA4\xFF\xAD\xA8\xA3\xFF\xAD\xA7\xA3\xFF\xAD\xA8\xA3\xFF\xAE\xA8\xA3\xFF\xAE\xA8\xA4\xFF\xAE\xA9\xA4\xFF\xAD\xA8\xA4\xFE\xFF\xAC\xA7\xA3\5\xFF\xAD\xA7\xA4\xFF\xAD\xA8\xA4\xFF\xAD\xA8\xA5\xFF\xAD\xA7\xA4\xFF\xAD\xA8\xA4\xFE\xFF\xAE\xA8\xA5\xFD\xFF\xAD\xA8\xA4\5\xFF\xAD\xA8\xA3\xFF\xAD\xA8\xA4\xFF\xAE\xA9\xA5\xFF\xAF\xA9\xA5\xFF\xAE\xA9\xA5\xFC\xFF\xAE\xA8\xA4\2\xFF\xAD\xA8\xA4\xFF\xAC\xA7\xA3\xFE\xFF\xAD\xA8\xA4\t\xFF\xAE\xA8\xA4\xFF\xAD\xA8\xA3\xFF\xAD\xA7\xA3\xFF\xAE\xA8\xA4\xFF\xAC\xA7\xA3\xFF\xAD\xA8\xA4\xFF\xAE\xA9\xA4\xFF\xAD\xA8\xA4\xFF\xAD\xA8\xA3\xFD\xFF\xAD\xA8\xA4\xFE\xFF\xAD\xA7\xA5\2\xFF\xAD\xA8\xA6\xFF\xAE\xA8\xA6\xFD\xFF\xAD\xA8\xA6\2\xFF\xAE\xA9\xA7\xFF\xAC\xA7\xA5\xFE\xFF\xAC... (+ 268434944 more bytes not shown) HTTP/1.1 403 Forbidden Date: Tue, 06 Nov 2012 17:25:54 GMT Content-Length: 165 Content-Type: application/json Client-Date: Tue, 06 Nov 2012 17:25:54 GMT Client-Peer: 72.21.195.182:80 Client-Response-Num: 1 X-Amzn-RequestId: liig5Jr4BfYHLFEXowrdjY0mZn176MnqvrOIdhgmRbkZEIQ

{"message":"Signature expired: 20121106T170833Z is now earlier than 20121106T172054Z (20121106T172554Z - 5 min.)","code":"InvalidSignatureException","type":"Client"}

PARENT Exit

vsespb commented 11 years ago

Hello.

In request date is "X-Amz-Date: 20121106T170833Z" In Amazon's response: 20121106T172554Z

difference more than 5 mins indeed.

There can be several reasons for this:

1) You system clock are not accurate enough 2) Network is very slow (considering concurrency=10 and partsize=256Mb). So by the time it's uploaded signature will expire

3) Your system is under extremely high load (CPU 100% and some swapping). In my code current date calculated before calculation of SHA256 hash of chunk of data, so if this checksum calculation takes lot of time, date will expire. (normally it should calculate sha256 within just several seconds) 4) Amazon is slow - same result, signature will expire.

I believe most probably it's (1) or (2) or both. So, check your system clock accuracy. Reduce concurrency and/or partsize. I belive concurrency=5 is almost as good as 10.

I will try to handle (2) or (3),(4) situations somehow graceful in next versions.

vsespb commented 11 years ago

Yes, also "PID 10915 HTTP 408 This might be normal. Will retry" means timeout. It's symptom of (2),(4) too.

matthewboeckman-zz commented 11 years ago

Thanks - I'm trying to find the right balance - uploading a 1.5TB tarball, I'm on a 30mbps pipe - if I try to use the default partsize, I'm getting

Fatal Error: 0 With current partsize=16777216 we will exceed 10000 parts limit for the file /media/fppp/outgoing/xxx.tar (filesize 1604744837120) at FileCreateJob.pm line 50.

The network is not slow per se, but this transfer is flooding it and 5 256MB uploads seem to be enough to make it seem slow. Host system clock is accurate via NTP (verified) and no CPU load. I think I'm going to keep playing with concurrency and partsize until I can find a balance to get the parts uploaded within the timeframe required.

vsespb commented 11 years ago

I am thinking now..

When Amazon timeout happen (HTTP 408) due to high bandwidth usage, I retry that request. But in that retried request I use same signature/timestamp, like in previous.

So this can be reason why you receive "Signature expired" error.

This looks like bug indeed.

So, you still need to find right ballance between concurrency and partsize to avoid HTTP 408 (any 408 is a waste of bandwidth). But I will fix that bug with signature timestamp of retried requests in a day or two.

Thanks for reporting this.

p.s. also here https://forums.aws.amazon.com/thread.jspa?threadID=108795 people complainting about HTTP 408 timeouts with multipart uploads (they're using different tools), so this might be Amazon issue.

vsespb commented 11 years ago

Released version 0.73beta now. All HTTP 408 requests will be retried with new signature timestamp, so if you have sequence of HTTP 408 you won't get Signature Expired in the end.

However need to use lower concurrency to avoid timeout/expiration due to high network load.

vsespb commented 11 years ago

note: better use 0.74beta, there was a bug in 0.73beta.