peak / s5cmd

Parallel S3 and local filesystem execution tool.

Very low throughput when uploading single file compared to using awscli #667

Open mboutet opened 1 year ago

mboutet commented 1 year ago

Uploading a single file of around 152M is significantly slower with s5cmd than with awscli. awscli achieves a throughput of around ~55MiB/s, whereas s5cmd only reaches ~4.4MiB/s. I tested with various concurrency settings (1, 5, 10, 25, 50) and always 1 worker (since it's a single file), and it makes close to no difference. I also tested with various file sizes (36M, 152M, 545M, 2.6G, 6.9G) and observed the same low throughput.
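(For reference, a single-transfer throughput figure like the ones above can be obtained by timing one upload; a minimal sketch, assuming GNU stat/date and the same placeholder paths used in the command further down:)

file="${temp_dir}/archive.tar.lz4"
size_bytes=$(stat -c%s "$file")   # file size in bytes (GNU stat)
start=$(date +%s.%N)
s5cmd --profile my_profile --endpoint-url=https://mycephs3endpoint \
    cp "$file" "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"
end=$(date +%s.%N)
echo "throughput: $(echo "$size_bytes / 1048576 / ($end - $start)" | bc -l) MiB/s"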

Here's a screenshot of a network capture I made comparing awscli (left) and s5cmd (right) using a concurrency setting of 5:

[Screenshot: network capture, 2023-09-22]

It seems like s5cmd is transferring the file in many smaller chunks instead of the fewer, bigger chunks that awscli uses.

The command I'm using is:

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=5 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"

Versions:

❯ aws --version
aws-cli/2.11.5 Python/3.11.2 Linux/5.4.0-163-generic exe/x86_64.ubuntu.20 prompt/off

❯ s5cmd version
v2.2.2-48f7e59

I'm using Ceph S3 and I'm able to reproduce the issue when running the same upload command on other servers.
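(For comparison, awscli's multipart transfer behaviour is governed by its s3 configuration keys, which default to roughly 8MB parts and up to 10 concurrent requests; a minimal sketch of raising them for the default profile, with illustrative values:)

# awscli s3 transfer tuning; chunk size and request count are illustrative
aws configure set default.s3.multipart_chunksize 64MB
aws configure set default.s3.max_concurrent_requests 20
aws --endpoint-url=https://mycephs3endpoint \
    s3 cp "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"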

denizsurmeli commented 1 year ago

Hi, there is a --part-size flag for the cp command. You can use it to adjust the chunk size as you wish.
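(A sketch of that suggestion applied to the original command, assuming --part-size takes a value in megabytes; the part size of 50 and concurrency of 10 are only illustrative:)

s5cmd --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=10 --part-size=50 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"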

mboutet commented 1 year ago

@denizsurmeli, unfortunately --part-size didn't help.

I tested all the combinations of several concurrency and part-size values.

concurrency = 25 with part_size = 10 gave the best throughput (around 20 MB/s), while most of the other combinations yielded throughputs of 2-5 MB/s. 20 MB/s is still way below what awscli achieves. For small objects of less than around 20MB, s5cmd wins, but only because it has no startup overhead, whereas awscli takes around 6-7s before it actually starts doing anything.
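(A sweep like the one described above can be scripted; a minimal sketch, with illustrative concurrency and part-size values and the same placeholder paths as before:)

# time each concurrency / part-size combination for a single-file upload
for c in 1 5 10 25 50; do
  for p in 5 10 50 100; do
    start=$(date +%s)
    s5cmd --profile my_profile --numworkers=1 \
        --endpoint-url=https://mycephs3endpoint \
        cp --concurrency="$c" --part-size="$p" \
        "${temp_dir}/archive.tar.lz4" \
        "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"
    echo "concurrency=$c part-size=$p took $(( $(date +%s) - start ))s"
  done
done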

kucukaslan commented 11 months ago

Just for reference:

The problem seems to be related to #418

At the time I tried to tackle it, but I couldn't :(

I made a few attempts to optimize write requests to increase throughput without using storage-optimized instances, but I couldn't find a viable solution.

https://github.com/peak/s5cmd/issues/418#issuecomment-1249494581

See also https://github.com/peak/s5cmd/issues/418#issuecomment-1208262659