ihartley-zz commented 10 years ago

Hi, I am using the release version on a QNAP NAS (ARM based). I uploaded 30GB of jpgs with concurrency 4 and partsize 8, and consistently got 400KB/sec over a 3Mb/sec upload link (i.e. using all bandwidth).

Now I am uploading large GB video files, and using concurrency 1 or 2 and partsize 32. I fail to get over 100KB/s and the CPU is constantly maxed out (which I would expect for hasing but not uploading). Am I doing something wrong?? (I don't have memory issues or other heavy processes running, and I tested that other processes can utilise all the bandwidth).

As a suitable point I will restart and go down to part 8 and con=4 as before to check if it improves things. But this limits the filesize I can upload.

It's STILL great software though (and am still researching my other issue). Thanks, H.

ihartley-zz commented 10 years ago

Update: I can confirm that with 8MB partsize and con=4 I get full bandwidth usage. Same with con=2 and almost with con=1 (slight drop). Doubling the partsize halves the upload to 200KB/s, and with partsize=16 = 100KB/sec. Something strange happening???

vsespb commented 10 years ago

Hi.

I fail to get over 100KB/s

There is something documented about this:

With high partsize*concurrency there is a risk of getting network timeouts HTTP 408/500.

So poor overall bandwidth utilization can be explained if you get HTTP 408/500 / Timeouts too often. Need to see program output to confirm this. Also if you have 100 timeouts at row, you'll get error and program will terminate (with SIGCHLD etc).

Why partsize*concurrency causes timeouts? I am not sure. I think Amazon issue.

They did not answer my question about this https://forums.aws.amazon.com/message.jspa?messageID=421321

In another thread they claim internet works that way https://forums.aws.amazon.com/thread.jspa?threadID=136432

If you are attempting to upload very large archives (100MB+), you could be seeing issues with packet loss across the internet. Large file transfer bandwidth typically degrades over time due to this. This is a general tcp over the internet problem,

(I cannot find scientific proof for this explanation too)

You also can play with tcp congestion control. I myself experience this too. I have ADSL 10Mb/1Mb connection, I can do upload/download over TCP with any concurrency to other server. But not to Amazon. partsize=8 and conc=2 is something optimal to me.

and the CPU is constantly maxed out (which I would expect for hasing but not uploading

That should not happen. Strange. Need to see program output for the beginning.

part 8 and con=4 as before to check if it improves things. But this limits the filesize I can upload.

part 8 is 80Gb file limit

Also, I think you first need set up logging on your NAS. Try create .sh file with a redirection

(something like mtglacier ... 2>&1 | tee -a /some/log/file (didn't test this line) )

then change your secret key to wrong one and run, you should be ablse to see errors in logs (this way you can test your logging and then run in real)

ihartley-zz commented 10 years ago

OK, did that, but the file still has just EXIT on SIGCHLD

I think I need to set more debugging info. FYI I also posted an issue at https://forums.aws.amazon.com/thread.jspa?threadID=141380 . It IS probably related to my IP stack, but if I'm having the problem I'm sure others are too. H.

vsespb commented 10 years ago

still has just EXIT on SIGCHLD

lets try my script:

#!/bin/sh
mtglacier --key xxxxxxxxxxxxxxxxxxxx --secret xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --region us-west-1 sync --dir . --vault somevault --journal /tmp/j.tmp 2>&1 | tee -a mylog.log

(dont change anything, vault is "somevault" it real, key and secret are wrong (repeat of "x" chars) intentionally. well, you can change dir to point to non-empty directory with few small files)

If I run this .sh file (from same directory where this file resides) I am getting (in mylog.log):

MT-AWS-Glacier, Copyright 2012-2013 Victor Efimov http://mt-aws.com/ Version 1.101

PID 25764 Started worker
PID 25765 Started worker
PID 25766 Started worker
PID 25767 Started worker
Error:
===REQUEST:
POST http://glacier.us-west-1.amazonaws.com/-/vaults/somevault/multipart-uploads
Authorization: AWS4-HMAC-SHA256 Credential=***REMOVED***/20131209/us-west-1/glacier/aws4_request, SignedHeaders=host;x-amz-archive-description;x-amz-date;x-amz-glacier-version;x-amz-part-size, Signature=***REMOVED***
Host: glacier.us-west-1.amazonaws.com
User-Agent: mt-aws-glacier/1.101 (http://mt-aws.com/) libwww-perl/6.05
X-Amz-Archive-Description: mt2 eyJmaWxlbmFtZSI6IjEuc2giLCJtdGltZSI6IjIwMTMxMjA5VDExNTAyOFoifQ
X-Amz-Date: 20131209T115035Z
X-Amz-Glacier-Version: 2012-06-01
X-Amz-Part-Size: 16777216

===RESPONSE:
HTTP/1.1 403 Forbidden
Date: Mon, 09 Dec 2013 11:50:35 GMT
Content-Length: 121
Content-Type: application/json
Client-Date: Mon, 09 Dec 2013 11:50:36 GMT
Client-Peer: 204.246.160.247:80
Client-Response-Num: 1
X-Amzn-RequestId: ShKM-eWWWvcnlJOEn_pZ5-YZEhZFZJABQEgOfHIXUUQS0hg

{"message":"The security token included in the request is invalid.","code":"UnrecognizedClientException","type":"Client"}

ERROR (child 25767): Unexpected reply from remote server

EXIT on SIGCHLD

you should get something like this too.

ihartley-zz commented 10 years ago

Yes, got that....

My command was mtglacier sync --config camcorder.cfg --new --replace-modified --exclude '.@_thumb/' --exclude '.symform/' --exclude 'symform/' --exclude 'iPod Photo Cache/' --filter '-.ini -.FCS -.txt -.db -.info -.INFO-.THM -.thm' --filter '-.IDX -.ZBD -.ZST -.ithmb -.tmp' --filter '-.modd -.moff -_.mta' 2>&1 | tee ~/backups/error.log

Maybe something wrong. I don't get any error on stdout or in the log?? Sorry for being stupid English bad Linux guy :-)

On Monday, 9 December 2013, 11:55, Victor Efimov notifications@github.com wrote:

still has just EXIT on SIGCHLD

lets try my script:

!/bin/sh

mtglacier --key xxxxxxxxxxxxxxxxxxxx --secret xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --region us-west-1 sync --dir . --vault somevault --journal /tmp/j.tmp 2>&1 | tee -a mylog.log (dont change anything, vault is "somevault" it real, key and secret are wrong (repeat of "x" chars) intentionally. well, you can change dir to point to non-empty directory with few small files) If I run this .sh file (from same directory where this file resides) I am getting (in mylog.log): MT-AWS-Glacier, Copyright 2012-2013 Victor Efimov http://mt-aws.com/ Version 1.101 PID 25764 Started worker PID 25765 Started worker PID 25766 Started worker PID 25767 Started worker Error: ===REQUEST: POST http://glacier.us-west-1.amazonaws.com/-/vaults/somevault/multipart-uploads Authorization: AWS4-HMAC-SHA256 Credential=_REMOVED_/20131209/us-west-1/glacier/aws4_request, SignedHeaders=host;x-amz-archive-description;x-amz-date;x-amz-glacier-version;x-amz-part-size, Signature=_REMOVED_ Host: glacier.us-west-1.amazonaws.com User-Agent: mt-aws-glacier/1.101 (http://mt-aws.com/) libwww-perl/6.05 X-Amz-Archive-Description: mt2 eyJmaWxlbmFtZSI6IjEuc2giLCJtdGltZSI6IjIwMTMxMjA5VDExNTAyOFoifQ X-Amz-Date: 20131209T115035Z X-Amz-Glacier-Version: 2012-06-01 X-Amz-Part-Size: 16777216 ===RESPONSE: HTTP/1.1 403 Forbidden Date: Mon, 09 Dec 2013 11:50:35 GMT Content-Length: 121 Content-Type: application/json Client-Date: Mon, 09 Dec 2013 11:50:36 GMT Client-Peer: 204.246.160.247:80 Client-Response-Num: 1 X-Amzn-RequestId: ShKM-eWWWvcnlJOEn_pZ5-YZEhZFZJABQEgOfHIXUUQS0hg {"message":"The security token included in the request is invalid.","code":"UnrecognizedClientException","type":"Client"} ERROR (child 25767): Unexpected reply from remote server EXIT on SIGCHLD you should get something like this too. — Reply to this email directly or view it on GitHub.

vsespb commented 10 years ago

Maybe something wrong. I don't get any error on stdout or in the log?

even if I use wrong command:

#!/bin/sh
totally-broken--nonexistant-command 2>&1 | tee -a mylog.log

I get something in logs:

./2.sh: 2: ./2.sh: totally-broken--nonexistant-command: not found

so try first fix your logging script. try first the command I posted above (with "xxx" in keys and secret) (i.e. don't debug two complex thing at once. debug simple things first - like empty commands with predictable output, then debug complex thing - your real mtglacier command)

or even use something very simple

#!/bin/sh
echo Hello_World 2>&1 | tee -a mylog.log

I am not sure how cron etc should work on QNAP NAS try ask on QNAS forum for help getting something very simple to work (like Hello_World above)

ihartley-zz commented 10 years ago

Hi, Sorry but I don't get any further info. I even tried "use diagnostics;" on mtglacier

All I get is: PID 11040 Started worker PID 11041 Started worker PID 11042 Started worker PID 11043 Started worker Found 1000 local files PID 11042 Created an upload_id sO4ePIAw0gBVDSld3Cq2D6tkXrfdQW9agw5oTEeITKcAF2IEaL0cE6eNoXgTyVeDIJoX1DMJhAql7PbitiQDlgsNStQS PID 11040 Created an upload_id hXhuQMcCAvLx6wG1uQGTXsBuMOO_LTdVkc-U6IzZgsaJk3UhVbqGKfKzZI0kk2hHqkzsw6Qa9HYpK8stSwjdky5PlBtx PID 11041 Created an upload_id 0Du2W44SfvLHg_Y_SOdx-cmO2sDdx05XRunVwqiAam3Jh74gCt8rs4ACfmGkn63IS-8VkRVYd1hkfljfmKR2RuEOyMUE PID 11043 Created an upload_id vK3rxkf3rlLM7JaK6UBI2rRsAkQI0a0HFzMRSBG8Ai-MHw9rrLiuzt3D9R8Yijd13GgBAwxpUCEO1wlrvudBEXKZT6Vx PID 11041 HTTP 408 This might be normal. Will retry (239 seconds spent for request) PID 11040 HTTP 408 This might be normal. Will retry (312 seconds spent for request) PID 11042 Uploaded part for 14-02-2012/20111224194152.mpg at offset [0] PID 11043 Uploaded part for 20070602223110.mpg at offset [33554432] PID 11042 HTTP 408 This might be normal. Will retry (15 seconds spent for request) PID 11043 HTTP 408 This might be normal. Will retry (73 seconds spent for request) PID 11043 HTTP 408 This might be normal. Will retry (82 seconds spent for request) PID 11041 Uploaded part for 20070602223110.mpg at offset [16777216] PID 11040 Uploaded part for 20070602223110.mpg at offset [0] PID 11041 HTTP 408 This might be normal. Will retry (112 seconds spent for request) PID 11042 Uploaded part for 20070602223110.mpg at offset [50331648] PID 11043 Uploaded part for 20070602223110.mpg at offset [67108864]

EXIT on SIGCHLD

I can see it's a timeout, but I can't see why... Am happy to try any debugging if you can give instructions. My perl is a bit limited (well, very limited). Will move back to 8MB partsize for the moment.

vsespb commented 10 years ago

hi. ok at least I see now some output.

and did you tried:

#!/bin/sh
mtglacier --key xxxxxxxxxxxxxxxxxxxx --secret xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --region us-west-1 sync --dir . --vault somevault --journal /tmp/j.tmp 2>&1 | tee -a mylog.log

?

vsespb commented 10 years ago

also, did you check logs/dmesg for OOM killer ?

http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer

egrep -i 'killed process' /var/log/messages
dmesg | egrep -i 'killed process'
grep "Killed process" /var/log/syslog

ihartley-zz commented 10 years ago

Sorry yes, I get same output as you - more detailed info. But with SIGCHLD no extra info.

I will hammer Amazon to fix their API if I need, but I first need to know what's happening. And for that I need your help. I'm currently uploading a 7GB file with partsize 8 so I will see if that finishes, then I'll update you. But of course it would be better with partsize 128 :-)

The "double partsize, half bandwidth" looks like some kind of bug somewhere. I know a lot about how the internet works, and that's not it - blimey when I first started using the Internet even Vint Cerf didn't have a beard :-)

vsespb commented 10 years ago

test for OOM killer (see post above)
try this:

perl -e 'my $s = "x" x 100_000_000; sleep 10;' && echo OK

(this should take 100Mb of RAM. sleep 10 seconds and print OK; otherwise you have memory limit somewhere)

know a lot about how the internet works, and that's not it

I agree. Today I modified mtglacier to use curl (binary program, not library). I got 408's with hight partsize/concurrency too.

* About to connect() to glacier.us-east-1.amazonaws.com port 80 (#0)
*   Trying 72.21.195.182...   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0connected
> PUT /-/vaults/test1/multipart-uploads/O8c5LN9yXIaxcy0iZrEV3Mmg8HNFMqvRwWwh42LvnHDrF4nE-7dSeww9hYjrpJDekilJZKukH18mUqZzy6MCKTIZfiFj HTTP/1.0
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Accept: */*
> Host: glacier.us-east-1.amazonaws.com
> x-amz-glacier-version: 2012-06-01
> Content-Type: application/octet-stream
> Content-Length: 4241619
> x-amz-content-sha256: 900f4beaf903ba7e35501d1ed46597e94029c5c741c9696718dc1690aed17c6c
> x-amz-sha256-tree-hash: a220cd3f298da3a20b9aa38ebb6deb7152b38376566bd991990a8b7a4aacdb56
> Content-Range: bytes 0-4241618/*
> x-amz-date: 20131210T113504Z
> Authorization: AWS4-HMAC-SHA256 Credential=**REMOVED**20131210/us-east-1/glacier/aws4_request, SignedHeaders=content-length;content-range;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-glacier-version;x-amz-sha256-tree-hash, Signature=**REMOVED**
> 
} [data not shown]
  1 4142k    0     0    1 63904      0   5140  0:13:45  0:00:12  0:13:33     0< HTTP/1.1 408 Request Timeout
< x-amzn-RequestId: JBvTo3-SV2NYQi8Hyq9q6zOr4OgOHoEHo4T94aJ9dL1sOwE
< Content-Type: application/json
< Content-Length: 81
< Date: Tue, 10 Dec 2013 11:35:16 GMT
< Connection: keep-alive
* HTTP error before end of send, stop sending
< 
{ [data not shown]
  1 4142k  100    81    1 63904      6   5135  0:13:46  0:00:12  0:13:34     0
* Closing connection #0
{"code":"RequestTimeoutException","message":"Request timed out.","type":"Client"}

so it's related to amazon, imho.

also, some time ago I tried on Amazon EC2 servers, network issues present too (although more concurrency allowed, of course)

vsespb commented 10 years ago

I posted question about timeout to Amazon forums again: https://forums.aws.amazon.com/thread.jspa?threadID=115858&tstart=0

ihartley-zz commented 10 years ago

Hi, please don't spend too long on this issue: I suspect it's either Amazon or my IP stack.

No memory issues (already checked and used your script). As soon as I set 8MB partsize I get constant 400KB/sec. I have excellent DSL/cable service so that's not it. And I don't get any problems with other apps on my platform....

Of course, I'd like it resolved. So am happy to blast Amazon if needed. Thanks.

vsespb commented 10 years ago

I think there are two issues:

1) many http 408 timeouts 2) exit on SIGCHLD

(1) is we suspect Amazon related

(2) should not happen because (1). not sure why it's happening. never seen before. only real reason for this is OOM killer or other OS killer with signals.

if I knew why (2) happening, I would fix it. But I have no idea yet.

ihartley-zz commented 10 years ago

If you can tell me how to run more debugging I will!

On Tuesday, 10 December 2013, 13:06, Victor Efimov notifications@github.com wrote:

I think there are two issues:

1) many http 408 timeouts 2) exit on SIGCHLD (1) is we suspect Amazon related (2) should not happen because (1). not sure why it's happening. never seen before. only real reason for this is OOM killer or other OS killer with signals. if I knew why (2) happening, I would fix it. But I have no idea yet. — Reply to this email directly or view it on GitHub.

vsespb commented 10 years ago

need output from mtglacier --version. it will print all modules versions and mtglacier version. also need output from perl -V (capital "V")

vsespb commented 10 years ago

Also here is version https://github.com/vsespb/mt-aws-glacier/tree/stability_check (i.e. branch stability_check) which will print additional info when processes terminate. this would help if you reproduce issue again with this version, and send output to me

ihartley-zz commented 10 years ago

mt-aws-glacier version: 1.059 Perl Version: 5.010000 AutoLoader 5.63 /opt/lib/perl5/5.10.0/AutoLoader.pm Carp 1.08 /opt/lib/perl5/5.10.0/Carp.pm Carp::Heavy undef /opt/lib/perl5/5.10.0/Carp/Heavy.pm Class::Struct 0.63 /opt/lib/perl5/5.10.0/Class/Struct.pm Config undef /opt/lib/perl5/5.10.0/arm-linux/Config.pm Cwd 3.2501 /opt/lib/perl5/5.10.0/arm-linux/Cwd.pm Digest::SHA 5.45 /opt/lib/perl5/5.10.0/arm-linux/Digest/SHA.pm Digest::base 1.00 /opt/lib/perl5/5.10.0/Digest/base.pm DynaLoader 1.08 /opt/lib/perl5/5.10.0/arm-linux/DynaLoader.pm Encode 2.23 /opt/lib/perl5/5.10.0/arm-linux/Encode.pm Encode::Alias 2.07 /opt/lib/perl5/5.10.0/arm-linux/Encode/Alias.pm Encode::Config 2.04 /opt/lib/perl5/5.10.0/arm-linux/Encode/Config.pm Encode::Encoding 2.05 /opt/lib/perl5/5.10.0/arm-linux/Encode/Encoding.pm Errno 1.1 /opt/lib/perl5/5.10.0/arm-linux/Errno.pm Exporter 5.62 /opt/lib/perl5/5.10.0/Exporter.pm Exporter::Heavy 5.62 /opt/lib/perl5/5.10.0/Exporter/Heavy.pm Fcntl 1.06 /opt/lib/perl5/5.10.0/arm-linux/Fcntl.pm File::Basename 2.76 /opt/lib/perl5/5.10.0/File/Basename.pm File::Find 1.12 /opt/lib/perl5/5.10.0/File/Find.pm File::Path 2.09 /opt/lib/perl5/5.10.0/File/Path.pm File::Spec 3.2501 /opt/lib/perl5/5.10.0/File/Spec.pm File::Spec::Unix 3.2501 /opt/lib/perl5/5.10.0/File/Spec/Unix.pm File::Temp 0.2304 /opt/lib/perl5/5.10.0/File/Temp.pm File::stat 1.00 /opt/lib/perl5/5.10.0/File/stat.pm FileHandle 2.01 /opt/lib/perl5/5.10.0/FileHandle.pm Getopt::Long 2.37 /opt/lib/perl5/5.10.0/Getopt/Long.pm HTTP::Date 6.02 /opt/lib/perl5/site_perl/5.10.0/HTTP/Date.pm HTTP::Headers 6.05 /opt/lib/perl5/site_perl/5.10.0/HTTP/Headers.pm HTTP::Message 6.06 /opt/lib/perl5/site_perl/5.10.0/HTTP/Message.pm HTTP::Request 6.00 /opt/lib/perl5/site_perl/5.10.0/HTTP/Request.pm HTTP::Response 6.04 /opt/lib/perl5/site_perl/5.10.0/HTTP/Response.pm HTTP::Status 6.03 /opt/lib/perl5/site_perl/5.10.0/HTTP/Status.pm I18N::Langinfo 0.02 /opt/lib/perl5/5.10.0/arm-linux/I18N/Langinfo.pm IO 1.23_01 /opt/lib/perl5/5.10.0/arm-linux/IO.pm IO::File 1.14 /opt/lib/perl5/5.10.0/arm-linux/IO/File.pm IO::Handle 1.27 /opt/lib/perl5/5.10.0/arm-linux/IO/Handle.pm IO::Pipe 1.13 /opt/lib/perl5/5.10.0/arm-linux/IO/Pipe.pm IO::Seekable 1.1 /opt/lib/perl5/5.10.0/arm-linux/IO/Seekable.pm IO::Select 1.17 /opt/lib/perl5/5.10.0/arm-linux/IO/Select.pm JSON::XS 1.5 /opt/lib/perl5/site_perl/5.10.0/arm-linux/JSON/XS.pm LWP 6.05 /opt/lib/perl5/site_perl/5.10.0/LWP.pm LWP::MemberMixin undef /opt/lib/perl5/site_perl/5.10.0/LWP/MemberMixin.pm LWP::Protocol 6.00 /opt/lib/perl5/site_perl/5.10.0/LWP/Protocol.pm LWP::UserAgent 6.05 /opt/lib/perl5/site_perl/5.10.0/LWP/UserAgent.pm List::Util 1.19 /opt/lib/perl5/5.10.0/arm-linux/List/Util.pm MIME::Base64 3.07_01 /opt/lib/perl5/5.10.0/arm-linux/MIME/Base64.pm POSIX 1.13 /opt/lib/perl5/5.10.0/arm-linux/POSIX.pm PerlIO::encoding 0.10 /opt/lib/perl5/5.10.0/arm-linux/PerlIO/encoding.pm Scalar::Util 1.19 /opt/lib/perl5/5.10.0/arm-linux/Scalar/Util.pm SelectSaver 1.01 /opt/lib/perl5/5.10.0/SelectSaver.pm Storable 2.18 /opt/lib/perl5/5.10.0/arm-linux/Storable.pm Symbol 1.06 /opt/lib/perl5/5.10.0/Symbol.pm Tie::Hash 1.02 /opt/lib/perl5/5.10.0/Tie/Hash.pm Time::Local 1.18 /opt/lib/perl5/5.10.0/Time/Local.pm Time::localtime 1.02 /opt/lib/perl5/5.10.0/Time/localtime.pm Time::tm 1.00 /opt/lib/perl5/5.10.0/Time/tm.pm URI 1.35 /opt/lib/perl5/site_perl/5.10.0/URI.pm URI::Escape 3.28 /opt/lib/perl5/site_perl/5.10.0/URI/Escape.pm XSLoader 0.08 /opt/lib/perl5/5.10.0/arm-linux/XSLoader.pm base 2.13 /opt/lib/perl5/5.10.0/base.pm bytes 1.03 /opt/lib/perl5/5.10.0/bytes.pm constant 1.13 /opt/lib/perl5/5.10.0/constant.pm integer 1.00 /opt/lib/perl5/5.10.0/integer.pm overload 1.06 /opt/lib/perl5/5.10.0/overload.pm parent 0.228 /opt/lib/perl5/site_perl/5.10.0/parent.pm re 0.08 /opt/lib/perl5/5.10.0/arm-linux/re.pm strict 1.04 /opt/lib/perl5/5.10.0/strict.pm utf8 1.07 /opt/lib/perl5/5.10.0/utf8.pm vars 1.01 /opt/lib/perl5/5.10.0/vars.pm warnings 1.06 /opt/lib/perl5/5.10.0/warnings.pm warnings::register 1.01 /opt/lib/perl5/5.10.0/warnings/register.pm OK DONE

This is what I am running right now.

vsespb commented 10 years ago

Thanks, but I need other data (I posted above)

ihartley-zz commented 10 years ago

Duh - silly me. See below. I've uploaded about 70GB over the last few days. Increasing my IP connection from 3Mb/s to 12Mb/s didn't have any impact, though speedtest.net consistently reports 10-12Mb/s upload speed. I have no issues with OOM, even with 0.5GB RAM + the same swap it's

Perl -V:

perl -V

Summary of my perl5 (revision 5 version 10 subversion 0) configuration: Platform: osname=linux, osvers=2.6.26-1-orion5x, archname=arm-none-linux-gnueabi uname='linux mv2120 2.6.26-1-orion5x #1 sat aug 9 21:19:51 utc 2008 armv5tel gnulinux ' config_args='-Dcc=gcc -Dprefix=/opt -Duseshrplib -Dd_dlopen -de' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='arm-none-linux-gnueabi-gcc', ccflags ='-fno-strict-aliasing -pipe -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2', cppflags='-fno-strict-aliasing' ccversion='', gccversion='4.3.1', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='arm-none-linux-gnueabi-ld', ldflags =' -L/home/slug/optware/cs08q1armel/staging/opt/lib -Wl,-rpath,/opt/lib -Wl,-rpath-link,/home/slug/optware/cs08q1armel/staging/opt/lib -Wl,-rpath,/opt/lib/perl5/5.10.0/arm-linux/CORE' libpth=/opt/lib /lib /usr/lib libs=-lnsl -ldl -lm -lcrypt -lutil -lc perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc -lgcc_s $$($(CC) -print-libgcc-file-name) libc=/lib/libc-2.7.so, so=so, useshrplib=true, libperl=libperl.so gnulibc_version='2.7' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/opt/lib/perl5/5.10.0/arm-linux/CORE' cccdlflags='-fPIC', lddlflags='-shared -O2 -L/opt/lib -L/opt/local/lib'

Characteristics of this binary (from libperl): Compile-time options: PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP USE_LARGE_FILES USE_PERLIO Built under linux Compiled at May 2 2012 20:50:46 @INC: /opt/lib/perl5/5.10.0/arm-linux /opt/lib/perl5/5.10.0 /opt/lib/perl5/site_perl/5.10.0/arm-linux /opt/lib/perl5/site_perl/5.10.0 .

vsespb commented 10 years ago

And did you try https://github.com/vsespb/mt-aws-glacier/tree/stability_check (see I posted above)

vsespb commented 10 years ago

also I need output of perl -MNet::HTTP -e 'print $Net::HTTP::VERSION'

ihartley-zz commented 10 years ago

Update: I ran the software under Windows/Cygwin and got consistent 11.7Mb/s upload. So it is a platform/QNAP issue. I don't think it's related to the network if as I can easily get 100MB/s transfer rate transferring files over a Gb LAN. I suspect it might be maxing out the CPU, and this is causing the issue, but don't know why?? I'll keep trying... :-)

vsespb commented 10 years ago

Need perl -MNet::HTTP -e 'print $Net::HTTP::VERSION' for both Cygwin and QNAP
Also, don't use mtglacier under Cygwin for production, it should be broken under cygwin.
Need distinc three different issues - a) EXIT with SIGCHLD b) 100% CPU Usage c) Lots of HTTP 408/500

for (a) I have branch https://github.com/vsespb/mt-aws-glacier/tree/stability_check for (b) I need versions of perl -MNet::HTTP -e 'print $Net::HTTP::VERSION' for (c).. Well, your testing under Cygwin does not prove much. The thing is I think that Amazon servers have short socket timeouts (like 10 seconds or so) and Linux default TCP cognition control works bad with it. I.e. if there is no enough bandwidth, delays for each individual connection increase (and they really do). Windows probably not really affected by this problem. And I still think that it's Amazon issue because they set too short timeouts which works bad under linux with default network config.

ihartley-zz commented 10 years ago

OK, I think we have solved the problem. CPU needs another hamster! :-)

The ARM CPU is overloaded doing encryption for for https. Changing to http gives full bandwidth upload. My bad :-( You might want to put something in your readme - lots of NAS have poor processors, and the NAS mfrs like to sell all the bells and whistles, so enable by default media transcoding, web server, DLNA, anti-virus, etc, etc.

I'm not saying this is the answer to similar problems, but it solves my problem.

FYI - my ARM in a QNAP 419+ (currently model) is obviously maxing on encryption - Windows on an AMD X3 425 was hitting about 10% CPU util....

YOUR SOFTWARE ROCKS!! Thank you so much. Must earn more money to buy better NAS :-) P.S. Cygwin does give a BIG warning "don't use this" when running script :-)

vsespb commented 10 years ago

Ok, so that fixed 100% CPU problem.

But what about too many HTTP 408/500 ? How about SIGCHLD ?

P.S. Cygwin does give a BIG warning "don't use this" when running script :-) ah yes I added that already.

ihartley-zz commented 10 years ago

I will create/upload a new vault tonight so will let you know tomorrow. Under cygwin (just because no CPU issue; but no https as my install didn't have t right packages) I see a few 408s but I was running conc=10 and partsize=64, so expected behvaiour. Still worked OK.

Will check re SIGCHLD - it could be timeout from CPU issue, but not from OOM / bandwidth. Even with all your suggestions I never had any other details/errors - just "exit with SIGCHLD". So again I guess a platform issue.

I'm a bit dumb I think! :-) Had lots of issues with Symform and others because of "encrypt locally before send", never thought that https would have the same overhead....... Duh.

vsespb commented 10 years ago

For sigchild testing use this branch https://github.com/vsespb/mt-aws-glacier/tree/stability_check

HTTPS encryption performance - who knows, maybe only perl HTTPS library is slow, or perl HTTPS library contains 100% CPU bug.

ihartley-zz commented 10 years ago

Currently running conc=10, partsize=16 with no probs and full bandwidth usage :-) CPU running <30% with no encryption

vsespb commented 10 years ago

So, what are outstanding issues now?

vsespb commented 10 years ago

Maybe you have kind of task killer on your system which kills processes which uses too much CPU.

Try to run

perl -e 'while(){}' || echo $?

same way you run mtglacier

this will cause infinite loop with 100% one core usage. wait 10 mins. maybe OS will kill it? otherwise you'll have to kill it yourself.

be careful don't overheat your system.

ihartley-zz commented 10 years ago

Hi Victor,

I think all my problems are solved:

408s seem to be caused if the system can't upload fast enough. For me this was a combination of: a. 100% CPU utilisation, causing processes to wait , and b. setting partsize * conc too high when uploading videos.
the SIGCHD was caused by 100% CPU usage. I don't have any task killer but they must have timed out (maybe in network / SSL software?). There was no error indication from mtglacier
the 100% CPU utilisation was caused by using SSL - the ARM processor isn't capable or encrypting fast enough.

I have uploaded over 100GB, retrieved a little (to test), and deleted 50GB of archives oriignally created with fastglacier. Everything is fantastic I think. The flexibility it gives means you can filter the data, and back up what you want when you want without any real software limitations.

So thank you. Very much.

H.

On Saturday, 14 December 2013, 19:16, Victor Efimov notifications@github.com wrote:

Maybe you have kind of task killer on your system which kills processes which uses too much CPU.

Try to run perl -e 'while(){}' || echo $? same way you run mtglacier this will cause infinite loop with 100% one core usage. wait 10 mins. maybe OS will kill it? otherwise you'll have to kill it yourself. be careful don't overheat your system. — Reply to this email directly or view it on GitHub.

vsespb commented 10 years ago

Hello. Ok, cool. So I am closing it for now.

There was no error indication from mtglacier

that should be a bug somewhere, if I ever reproduce it I'll try to fix/workaround

vsespb commented 10 years ago

hey, @ihartley I saw this thread http://forum.qnap.com/viewtopic.php?uid=178719&f=24&t=62952&start=105 on qnap forums.

If you want, I am ready to help making instruction for QTS install. I don't have access to QTS. But if yo'd have good description (with program output) of problems during install, I might advice something.

let's either start new Github issue or switch to email - vs at vs-dev.com

vsespb / mt-aws-glacier

High CPU / slow upload for large file/partsize #55

!/bin/sh

perl -V