rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

S3 (Minio) default parameters to improve performance #4255

Open darkdragon-001 opened 4 years ago

darkdragon-001 commented 4 years ago

What is your current rclone version (output from rclone version)?

rclone v1.50.2
- os/arch: linux/amd64
- go version: go1.13.6

What problem are you trying to solve?

Good default performance.

How do you think rclone should be changed to solve that?

Calculate the default value for --s3-chunk-size automatically based on file size. The MinIO client (mc) uses a default of 64M and does this auto-calculation; you should be able to find it in their source code.

ncw commented 4 years ago

Rclone will raise the chunk size automatically to stay within the 10,000 parts limit.
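To make the 10,000 parts limit concrete, here is a minimal sketch of that kind of adjustment. It is not rclone's actual code: the function name `chunkSizeFor` and the 5 MiB starting value are assumptions for illustration, and the real implementation may round the result differently.

```go
package main

import "fmt"

const (
	maxParts         = 10000           // S3 multipart upload part-count limit
	defaultChunkSize = 5 * 1024 * 1024 // assumed starting chunk size (5 MiB)
)

// chunkSizeFor returns the smallest chunk size that is at least `preferred`
// and still lets a file of `fileSize` bytes fit into maxParts parts.
// Hypothetical helper, not taken from rclone.
func chunkSizeFor(fileSize, preferred int64) int64 {
	chunk := preferred
	// ceil(fileSize / maxParts): the minimum chunk size that stays within the limit.
	if minChunk := (fileSize + maxParts - 1) / maxParts; minChunk > chunk {
		chunk = minChunk
	}
	return chunk
}

func main() {
	for _, size := range []int64{1 << 30, 100 << 30} { // 1 GiB and 100 GiB
		fmt.Printf("file %d bytes -> chunk %d bytes\n", size, chunkSizeFor(size, defaultChunkSize))
	}
}
```

With these assumptions a 1 GiB file keeps the preferred chunk size, while a 100 GiB file forces a larger one so that the upload needs no more than 10,000 parts.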

You can set --s3-chunk-size to 64MB and rclone will use that just fine.

Note that rclone buffers chunks in memory so we don't want them too big.

Rclone also uploads --s3-upload-concurrency chunks in parallel by default.
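Because chunks are buffered in memory and several are in flight at once, buffer memory grows roughly as transfers × concurrency × chunk size. The sketch below only does that arithmetic; the function name is hypothetical, and the values of 4 transfers and 4 concurrent chunks are assumed for the example.

```go
package main

import "fmt"

const MiB = int64(1024 * 1024)

// uploadBufferBytes estimates the memory held by multipart upload buffers:
// one chunk-sized buffer per concurrent chunk, per concurrent file transfer.
// Hypothetical helper for illustration only.
func uploadBufferBytes(transfers, concurrency int, chunkSize int64) int64 {
	return int64(transfers) * int64(concurrency) * chunkSize
}

func main() {
	// Assumed: 4 parallel transfers, 4 chunks in flight per transfer.
	for _, chunk := range []int64{5 * MiB, 64 * MiB} {
		total := uploadBufferBytes(4, 4, chunk)
		fmt.Printf("chunk %d MiB -> about %d MiB of buffers\n", chunk/MiB, total/MiB)
	}
}
```

Under these assumptions, moving from 5 MiB to 64 MiB chunks takes the buffer estimate from about 80 MiB to about 1 GiB, which is the trade-off behind keeping the default small.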

What do you think rclone should be doing?

darkdragon-001 commented 4 years ago

I am saying that the chunk size should be determined more intelligently than by a constant. It should take into account the number of files, the number of transfers, the file size, the remote types (local, S3, MinIO, ...), ...
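One possible reading of this proposal, purely as an illustration: start from a larger target chunk size, cap it by a memory budget, and raise it again only when the part limit requires it. Everything here is an assumption (the name `adaptiveChunkSize`, the memory-budget parameter, the 64 MiB target); neither rclone nor mc is claimed to work this way.

```go
package main

import "fmt"

const (
	mib         = int64(1024 * 1024)
	partLimit   = 10000    // S3 multipart upload part-count limit
	minChunk    = 5 * mib  // S3 minimum part size
	targetChunk = 64 * mib // assumed target, matching the 64M mentioned for mc
)

// adaptiveChunkSize is a hypothetical heuristic. transfers and concurrency
// must be positive. The part limit takes priority over the memory budget,
// because exceeding it would make the upload impossible.
func adaptiveChunkSize(fileSize, memBudget int64, transfers, concurrency int) int64 {
	chunk := targetChunk
	// Cap the chunk so that all in-flight buffers fit in the memory budget.
	if perBuffer := memBudget / int64(transfers*concurrency); perBuffer < chunk {
		chunk = perBuffer
	}
	// Never go below the minimum part size.
	if chunk < minChunk {
		chunk = minChunk
	}
	// Raise the chunk if the file would otherwise need too many parts.
	if need := (fileSize + partLimit - 1) / partLimit; need > chunk {
		chunk = need
	}
	return chunk
}

func main() {
	fmt.Println(adaptiveChunkSize(1<<30, 256*mib, 4, 4))  // tight memory budget
	fmt.Println(adaptiveChunkSize(1<<30, 4096*mib, 4, 4)) // generous memory budget
	fmt.Println(adaptiveChunkSize(1<<40, 256*mib, 4, 4))  // very large file
}
```

Even this small sketch has to decide which constraint wins when memory and the part limit disagree, so any automatic default embeds a choice of goal.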

ncw commented 4 years ago

Presumably the user would have some goal in mind?

All those are conflicting so the user would have to choose which one was their goal.

The defaults in rclone are middle of the road for each of these: efficient without being too resource intensive.

Rclone has all the parameters for tuning available for the user to twiddle with.

darkdragon-001 commented 4 years ago

IMO the default options of both the AWS CLI and mc are more in line with what I would expect the defaults to be: optimizing for maximum transfer speed while maintaining sufficient data integrity within the available resources (memory, CPU).