Closed by klokan 11 years ago
We may also need to parallelise the deletion of files in a bucket, because this must be as fast as possible too.
There is an operation in the Amazon S3 REST API that can delete multiple objects (up to 1000) in a single HTTP request:
http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html
This service is not currently implemented by webstor, but I can add it.
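To make the multi-object delete idea concrete, here is a minimal sketch of how a key list could be split into batches of at most 1000 and turned into request bodies. `xmlEscape` and `buildDeleteBodies` are hypothetical helpers, not part of webstor; each body would then be sent as a `POST /?delete` request (S3 also expects a `Content-MD5` header on it), which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Escape the five XML special characters so arbitrary keys are safe in the body.
std::string xmlEscape(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// S3 accepts at most 1000 keys per Multi-Object Delete request.
constexpr std::size_t kMaxKeysPerDelete = 1000;

// Hypothetical helper: build one XML request body per batch of keys.
std::vector<std::string> buildDeleteBodies(const std::vector<std::string>& keys,
                                           std::size_t batchSize = kMaxKeysPerDelete) {
    assert(batchSize > 0 && batchSize <= kMaxKeysPerDelete);
    std::vector<std::string> bodies;
    for (std::size_t i = 0; i < keys.size(); i += batchSize) {
        const std::size_t end = std::min(i + batchSize, keys.size());
        // Quiet mode asks S3 to report only the keys that failed to delete.
        std::string body =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Delete><Quiet>true</Quiet>";
        for (std::size_t j = i; j < end; ++j) {
            body += "<Object><Key>" + xmlEscape(keys[j]) + "</Key></Object>";
        }
        body += "</Delete>";
        bodies.push_back(std::move(body));
    }
    return bodies;
}
```

With 2500 keys this yields three bodies, so a whole bucket listing can be deleted in a handful of requests that can themselves be issued in parallel.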
Let's finish the uploading part properly first, please... delete can be added later on.
Related to performance: http://www.oblaksoft.com/high-performance-amazon-s3-api-webstor/ They run the tests with wsperf and a 1 MB file (I think). We may need even more threads for smaller files. I don't know what happens if you have too many threads on a slow upload line; probably nothing bad, so we could probably set the default pretty high. 64 or more?
Do you have a timeframe for running these tests, Vasek?
I raised the limit on the maximum number of connections to 256 in this commit:
https://github.com/klokan/webstor/commits/testbcba2cf6e5956302f7c5c6c5604dc6ab69a1ca3b
Preliminary results for the grandcanyon collection on a 100 Mbit upload line:

| R  | time |
|----|------|
| 4  | 3:55 |
| 8  | 1:44 |
| 16 | 0:59 |
| 32 | 0:28 |
| 64 | 0:19 |
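To compare these runs more easily, the timings can be converted to seconds and expressed as a speedup over the first row. The sketch below assumes the times are in minutes:seconds; `toSeconds` and `speedup` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert an "m:ss" string such as "3:55" to seconds (assumes minutes:seconds).
int toSeconds(const std::string& mmss) {
    const std::size_t colon = mmss.find(':');
    assert(colon != std::string::npos);
    return std::stoi(mmss.substr(0, colon)) * 60 + std::stoi(mmss.substr(colon + 1));
}

// Speedup of a run relative to a baseline run (baseline time / run time).
double speedup(const std::string& baseline, const std::string& run) {
    return static_cast<double>(toSeconds(baseline)) / toSeconds(run);
}
```

For example, `speedup("3:55", "0:19")` is 235 s / 19 s, roughly 12.4, for a 16 times larger R.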
I will do more testing tomorrow and report the standard deviation and a confidence interval (there is a big difference between subsequent runs).
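One way such repeated runs could be summarised is sketched below: the mean, the sample standard deviation, and the half-width of a confidence interval around the mean. The default critical value of 1.96 is an assumption (a normal approximation for a 95% interval); with only a handful of runs, a Student t value would be the more appropriate thing to pass in. `Summary` and `summarise` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Summary {
    double mean;
    double stddev;     // sample standard deviation (n - 1 in the denominator)
    double halfWidth;  // half-width of the confidence interval around the mean
};

// Summarise repeated timings (in seconds) of the same configuration.
Summary summarise(const std::vector<double>& seconds, double critical = 1.96) {
    assert(seconds.size() >= 2);  // a spread needs at least two runs
    const double n = static_cast<double>(seconds.size());

    double sum = 0.0;
    for (double s : seconds) sum += s;
    const double mean = sum / n;

    double squares = 0.0;
    for (double s : seconds) squares += (s - mean) * (s - mean);
    const double stddev = std::sqrt(squares / (n - 1.0));

    // Standard error of the mean, scaled by the chosen critical value.
    return {mean, stddev, critical * stddev / std::sqrt(n)};
}
```

The interval is then `mean ± halfWidth`; if the intervals of two settings overlap heavily, the difference between them may just be run-to-run noise.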
Webstor uses asynchronous I/O, so you don't need a lot of threads:
Please also implement the directory walk under Microsoft Windows.
Please also make delAll more efficient. Thanks.
I expect this ticket is closed with 208e5ed038090f89d1d20238bed157868885019f. Correct, Vasek?
Vasek, could you please update the documentation in the README in master and close this ticket if all the work on it is finished?
Implement efficient upload (PUT) of a supplied directory containing thousands of small (~1 KB) files, with maximised transfer speed and full saturation of the available network line.
Implementation should extend the wscmd command:
A directory could be supplied instead of 'image.jpg'; in that case a recursive walk would upload all the files within it, using multiple asynchronous managers (as demonstrated in the wsperf.cpp code).
Run tests comparing the performance of such a solution with https://github.com/klokan/s3-parallel-put
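The recursive walk itself could look roughly like the sketch below. It assumes a C++17 compiler and uses `std::filesystem`, which also covers Microsoft Windows; `collectUploads` and `readFile` are hypothetical helpers, not webstor functions, and the actual asynchronous PUT calls are left out.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Pair each regular file under `root` with an object key relative to `root`.
std::vector<std::pair<fs::path, std::string>> collectUploads(
        const fs::path& root, const std::string& keyPrefix = "") {
    assert(fs::is_directory(root));
    std::vector<std::pair<fs::path, std::string>> uploads;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;  // skip directories and special files
        // generic_string() uses '/' separators on every platform, including Windows.
        const std::string rel = entry.path().lexically_relative(root).generic_string();
        uploads.emplace_back(entry.path(), keyPrefix + rel);
    }
    return uploads;
}

// Read a whole file into memory; acceptable for the ~1 KB files discussed here.
std::string readFile(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Each `(path, key)` pair can then be handed to whichever upload call is used, spread across the available connections; collecting the list first also makes it easy to report progress against a known total.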