klokan / webstor

High performance C++ API for Amazon S3
Apache License 2.0

Directory upload #2

Closed: klokan closed this issue 11 years ago

klokan commented 11 years ago

Implement efficient upload (PUT) of a supplied directory containing thousands of small (~1 KB) files, maximising transfer speed and fully saturating the available network line.

Implementation should extend the wscmd command:

* upload a file:                                                              
   wscmd -i WS_ACCESS_KEY -s WS_SECRET_KEY -a put -n mybucket                  
   -f image.jpg -p folder/image.jpg -v  

A directory could be supplied instead of 'image.jpg'. In that case a recursive walk will upload all files within it, using multiple asynchronous managers (as demonstrated in the wsperf.cpp code).
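The recursive walk described above could be sketched roughly as follows. This is only an illustration using C++17 `std::filesystem`, not webstor code: `UploadItem`, `s3KeyFor` and `planDirectoryUpload` are hypothetical names, and the mapping of the `-p` value to a key prefix is an assumption. The sketch only builds the list of (local file, object key) pairs; handing them to the asynchronous managers is left out.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One planned upload: a local file and the S3 object key it maps to.
struct UploadItem {
    fs::path localPath;
    std::string key;
};

// Hypothetical helper (not part of webstor): build the object key for `file`
// as "<keyPrefix>/<path of file relative to root>".
std::string s3KeyFor(const fs::path& root, const fs::path& file,
                     const std::string& keyPrefix) {
    std::string prefix = keyPrefix;
    if (!prefix.empty() && prefix.back() != '/') prefix += '/';
    // generic_string() uses '/' separators on both POSIX and Windows.
    return prefix + file.lexically_relative(root).generic_string();
}

// Hypothetical helper (not part of webstor): walk `root` recursively and
// plan one upload per regular file found.
std::vector<UploadItem> planDirectoryUpload(const fs::path& root,
                                            const std::string& keyPrefix) {
    assert(fs::is_directory(root));  // precondition: root must be a directory
    std::vector<UploadItem> items;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;  // skip subdirectory entries etc.
        items.push_back({entry.path(), s3KeyFor(root, entry.path(), keyPrefix)});
    }
    return items;
}
```

With `-f photos -p folder`, a file `photos/sub/b.txt` would map to the key `folder/sub/b.txt` under these assumptions. Because `std::filesystem` is portable, the same walk would also work on Windows.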

Run tests comparing the performance of this solution with https://github.com/klokan/s3-parallel-put

klokan commented 11 years ago

We may also need to parallelise the deletion of files in a bucket, because this must be as fast as possible too.

xrosecky commented 11 years ago

The Amazon S3 REST API has an operation that can delete multiple objects (up to 1000) in a single HTTP request:

http://docs.aws.amazon.com/AmazonS3/latest/API/multiobjectdeleteapi.html

This service is not currently implemented by webstor, but I can add it.
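To illustrate what adding it might involve, here is a minimal sketch that splits a key list into batches of at most 1000 and builds the XML request body for each batch. `xmlEscape` and `buildDeleteBodies` are hypothetical helpers, not webstor functions. Each body would be sent as a POST to the bucket's `?delete` subresource together with a Content-MD5 header; signing and sending the request are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Escape XML special characters so arbitrary object keys are safe in the body.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical helper (not part of webstor): one <Delete> XML body per batch
// of at most `batchSize` keys. S3 accepts up to 1000 keys per request.
std::vector<std::string> buildDeleteBodies(const std::vector<std::string>& keys,
                                           std::size_t batchSize = 1000) {
    assert(batchSize > 0 && batchSize <= 1000);
    std::vector<std::string> bodies;
    for (std::size_t i = 0; i < keys.size(); i += batchSize) {
        const std::size_t end = std::min(i + batchSize, keys.size());
        // Quiet mode asks S3 to report only the keys that failed to delete.
        std::string body =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Delete><Quiet>true</Quiet>";
        for (std::size_t k = i; k < end; ++k) {
            body += "<Object><Key>" + xmlEscape(keys[k]) + "</Key></Object>";
        }
        body += "</Delete>";
        bodies.push_back(body);
    }
    return bodies;
}
```

Deleting 2500 keys this way would take three requests instead of 2500 individual DELETE calls.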

klokan commented 11 years ago

Let's finish the uploading part properly first, please. Delete can be added later on.

klokan commented 11 years ago

Related to performance: http://www.oblaksoft.com/high-performance-amazon-s3-api-webstor/ They ran their tests with wsperf and a 1 MB file (I think). We may need even more threads for smaller files. I don't know what happens if you have too many threads on a slow upload line; probably nothing bad, so we can probably set the default pretty high. 64 or more?

Do you have a timeframe for running these tests, Vasek?

xrosecky commented 11 years ago

I raised the limit on the maximum number of connections to 256 in this commit:

https://github.com/klokan/webstor/commits/testbcba2cf6e5956302f7c5c6c5604dc6ab69a1ca3b

Preliminary results for the grandcanyon collection on a 100 Mbit upload line:

R     time
4     3:55
8     1:44
16    0:59
32    0:28
64    0:19

I will do more testing tomorrow and report the standard deviation and confidence interval (there is a big difference between subsequent runs).
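For reference, the statistics mentioned above could be computed from repeated timings with something like the sketch below. `RunStats` and `summarizeRuns` are hypothetical names. The default critical value of 1.96 assumes a normal approximation for a 95% interval; with only a handful of runs, a Student-t critical value would be more appropriate and can be passed in instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of repeated timing runs (all values in seconds).
struct RunStats {
    double mean;
    double stddev;  // sample standard deviation (divides by n - 1)
    double ciLow;   // lower bound of the confidence interval for the mean
    double ciHigh;  // upper bound of the confidence interval for the mean
};

// Hypothetical helper: mean, sample standard deviation and confidence
// interval. `critical` = 1.96 is the normal-approximation value for 95%
// (an assumption; use a t-distribution value for small samples).
RunStats summarizeRuns(const std::vector<double>& seconds,
                       double critical = 1.96) {
    assert(seconds.size() >= 2);  // sample deviation needs at least two runs
    const double n = static_cast<double>(seconds.size());

    double sum = 0.0;
    for (double s : seconds) sum += s;
    const double mean = sum / n;

    double squares = 0.0;
    for (double s : seconds) squares += (s - mean) * (s - mean);
    const double stddev = std::sqrt(squares / (n - 1.0));

    // Half-width of the interval: critical value times the standard error.
    const double half = critical * stddev / std::sqrt(n);
    return {mean, stddev, mean - half, mean + half};
}
```

For example, `summarizeRuns({18, 19, 20, 21, 22})` summarizes five made-up timings (not measurements from this thread) with a mean of 20 seconds.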

xrosecky commented 11 years ago

Webstor uses asynchronous I/O, so you don't need a lot of threads:

http://en.wikipedia.org/wiki/Asynchronous_I/O

klokan commented 11 years ago

Please also implement the directory walk under Microsoft Windows.

klokan commented 11 years ago

Please also make delAll more efficient. Thanks.

klokan commented 11 years ago

I expect this ticket is resolved by 208e5ed038090f89d1d20238bed157868885019f. Correct, Vasek?

klokan commented 11 years ago

Vasek, could you please update the documentation in the README on master and close this ticket if all of this work is finished?