mboutet opened this issue 1 year ago
Hi, there is a --part-size flag for the cp command. You can adjust the chunk size as you wish.
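For example, an illustrative invocation (bucket and file names are placeholders; the part size is given in MB):

```sh
# Upload with 10 concurrent parts of 100 MB each (illustrative values)
s5cmd cp --concurrency 10 --part-size 100 bigfile s3://my-bucket/bigfile
```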
@denizsurmeli, unfortunately --part-size didn't help.
I tested combinations of the concurrency and part_size parameters. concurrency = 25 with part_size = 10 gave the best throughput (around 20 MB/s), while most other combinations yielded only 2-5 MB/s. Even 20 MB/s is still well below what awscli can do. For small objects (less than around 20 MB), s5cmd wins, but only because it has no startup overhead, whereas awscli spends about 6-7 s before it actually starts doing anything.
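A sweep of that kind can be scripted along these lines (a rough sketch; the file, bucket, and value grids are illustrative, not the exact ones used here):

```sh
#!/bin/sh
# Illustrative sweep: time a single-file upload for each
# concurrency/part-size combination (part size in MB).
FILE=/tmp/testfile            # placeholder test file
DEST=s3://my-bucket/testfile  # placeholder destination
for c in 1 5 10 25 50; do
  for p in 5 10 50 100; do
    echo "concurrency=$c part_size=$p"
    time s5cmd cp --concurrency "$c" --part-size "$p" "$FILE" "$DEST"
  done
done
```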
Just for reference: the problem seems to be related to #418.
At the time I tried to tackle it, but I couldn't :( I made a few attempts to optimize write requests to increase throughput without using the storage-optimized instances, but I couldn't find a viable solution.
https://github.com/peak/s5cmd/issues/418#issuecomment-1249494581
See also https://github.com/peak/s5cmd/issues/418#issuecomment-1208262659
Uploading a single file of around 152M is significantly slower with s5cmd than with awscli: awscli achieves a throughput of ~55 MiB/s, whereas s5cmd only reaches ~4.4 MiB/s. (At those rates, the 152M upload takes roughly half a minute with s5cmd versus under 3 seconds with awscli.) I tested various concurrency settings (1, 5, 10, 25, 50), always with 1 worker (since it's a single file), and it makes close to no difference. I also tested various file sizes (36M, 152M, 545M, 2.6G, 6.9G) and observed the same low throughput.
Here's a screenshot of a network capture I made comparing awscli (left) and s5cmd (right) using a concurrency setting of 5:
It seems like s5cmd is transferring the file in many smaller chunks instead of fewer, bigger chunks, as awscli does.
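For comparison, awscli's multipart behavior is controlled through its s3 configuration; the values below are awscli's documented defaults (8 MB chunks, 10 concurrent requests), the rough counterparts of s5cmd's --part-size and --concurrency:

```sh
# awscli counterparts to s5cmd's --part-size/--concurrency
# (the values shown are awscli's defaults)
aws configure set default.s3.multipart_chunksize 8MB
aws configure set default.s3.max_concurrent_requests 10
```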
The command I'm using is:
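A representative form, with placeholder paths and the single-worker setup described above (not necessarily the exact flags used):

```sh
# Hypothetical reconstruction: one worker for the single file,
# with concurrency varied per test as described above
s5cmd --numworkers 1 cp --concurrency 5 /path/to/file s3://my-bucket/file
```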
Versions:
I'm using Ceph S3 and I'm able to reproduce the issue when running the same upload command on other servers.