s3tools / s3cmd

Official s3cmd repo -- Command-line tool for managing S3-compatible storage services (including Amazon S3 and CloudFront).
https://s3tools.org/s3cmd
GNU General Public License v2.0

Sync memory consumption #206

Open · yuvadm opened this issue 11 years ago

yuvadm commented 11 years ago

I'm trying to sync a pretty large directory structure (~100GB, ~500K files, roughly 3-4 directory levels deep) using the standard command:

$ s3cmd sync /dir/to/sync s3://bucket-name

But memory usage is very high, and after a while the s3cmd process is killed by the kernel (out-of-memory messages appear in dmesg).
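A quick way to confirm it's the kernel's OOM killer at work (the exact message wording varies by kernel version):

$ dmesg | grep -iE 'out of memory|killed process'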

Any thoughts on workarounds I can use to sync this directory?

mdomsch commented 11 years ago

The only workaround I know of is to sync subsets of the tree separately.
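For example, a rough sketch of that approach (assuming each top-level subdirectory of /dir/to/sync is small enough to sync on its own, and that bucket-name is your bucket):

for d in /dir/to/sync/*/; do
    # Mirror each subdirectory under a matching prefix in the bucket;
    # the trailing slash on "$d" makes s3cmd sync the directory's contents.
    s3cmd sync "$d" "s3://bucket-name/$(basename "$d")/"
done

Files sitting directly in /dir/to/sync would still need one more pass, e.g. with --exclude/--include filters.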

I know of no other workaround for a kernel OOM-kill situation at this time, except adding RAM.

If you have a 32-bit Python process, it will often raise a Python MemoryError exception when it runs out of 32-bit address space. This can be worked around by using a 64-bit Python process on a machine with enough RAM (more than 2x the RAM you had on the 32-bit machine, because Python's memory handling is poor in this regard). I'd recommend >8GB; 12-16GB may be sufficient for this process to succeed.
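If you're not sure which you're running, printing the interpreter's pointer size will tell you (64 on a 64-bit build, 32 on a 32-bit one):

$ python -c 'import struct; print(struct.calcsize("P") * 8)'
64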

Thanks, Matt
