s3tools / s3cmd

Official s3cmd repo -- Command line tool for managing S3-compatible storage services (including Amazon S3 and CloudFront).
https://s3tools.org/s3cmd
GNU General Public License v2.0

s3cmd leaks memory trying to sync large files #740

Open julian1 opened 8 years ago

julian1 commented 8 years ago

Uses 93% of 16 GB of RAM attempting to sync 2 large files.

Swap: 15629308k total,  9556472k used,  6072836k free,    26660k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                           
10135 root      20   0 17336 1372  984 R    1  0.0   0:00.08 top                                                
32195 root      20   0 23.5g  14g  920 D    1 92.8  59:19.44 s3cmd                                              
    1 root      20   0 24464  752   96 S    0  0.0   0:02.08 init                                               
    2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd     
# invoked with,
$ s3cmd -v sync xxx s3://mybucket/
...

$ ls -lh $( find  xxx -type f ) 
-rw-r--r-- 1 root root 22G May 10 12:22 xxx/yyy.tgz
-rw-r--r-- 1 root root 20G May 10 13:34 xxx/zzz.tgz

$ s3cmd --version
s3cmd version 1.6.0

$ uname -a
Linux ppppp 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I searched and found other issues raised regarding memory use, but in those cases the problem seems to be the size of the key listing.

Given that there are only two files here, I think this may be a separate issue.
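
A rough way to confirm that resident memory grows steadily during the sync is to sample VmRSS from /proc while s3cmd runs. This is a minimal sketch, not part of s3cmd itself; the PID argument and the 30-second interval are arbitrary choices:

    import sys, time

    # Hypothetical helper: poll the resident set size of a running process.
    # Run as:  python rss_watch.py $(pgrep s3cmd)
    pid = int(sys.argv[1])

    def vmrss_kb(pid):
        # /proc/<pid>/status reports VmRSS in kB on Linux
        with open('/proc/%d/status' % pid) as status:
            for line in status:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])
        return 0

    while True:
        print('%s  VmRSS = %d kB' % (time.strftime('%H:%M:%S'), vmrss_kb(pid)))
        time.sleep(30)  # steady, unbounded growth across samples suggests a leak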

mdomsch commented 8 years ago

That's odd. We only ever read files in small(ish) chunks. I wonder if the garbage collector isn't getting run frequently enough. I assume you didn't change the multipart chunk size to be enormous...
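
For reference, the bounded-memory pattern being described looks roughly like the sketch below. This is not the actual s3cmd upload code; the 15 MB chunk size and the send_part callback are illustrative assumptions:

    # Simplified sketch of chunked reading for a multipart upload: only one
    # chunk is alive at a time, so memory use should stay near the chunk size
    # no matter how large the file is.  CHUNK_SIZE and send_part are
    # illustrative names, not s3cmd's.
    CHUNK_SIZE = 15 * 1024 * 1024  # e.g. one 15 MB part

    def iter_chunks(path, chunk_size=CHUNK_SIZE):
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk  # the previous chunk becomes garbage once replaced

    def upload_file(path, send_part):
        for part_number, chunk in enumerate(iter_chunks(path), start=1):
            send_part(part_number, chunk)
            # If send_part (or a retry/buffering layer) holds references to
            # every chunk, memory grows with the file instead of staying flat.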

julian1 commented 8 years ago

This is the ~/.s3cfg. The recv and send chunk sizes are set to 4096; I don't know whether that's bytes or MB.

[default]
access_key = AAAAAAAAAAAAA
bucket_location = US
cloudfront_host = cloudfront.amazonaws.com
cloudfront_resource = /dddddddd/distribution
default_mime_type = binary/octet-stream
delete_removed = False
dry_run = False
encoding = ANSI_X3.4-1968
encrypt = False
follow_symlinks = False
force = False
get_continue = False
gpg_command = /usr/bin/gpg
gpg_decrypt = %(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_encrypt = %(gpg_command)s -c --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s
gpg_passphrase =
guess_mime_type = True
host_base = s3-nnnnnnnnnn.amazonaws.com
host_bucket = %(bucket)s.nnnnnnnnnnn.amazonaws.com
human_readable_sizes = False
list_md5 = False
log_target_prefix =
preserve_attrs = True
progress_meter = True
proxy_host =
proxy_port = 0
recursive = False
recv_chunk = 4096
reduced_redundancy = False
secret_key = SSSSSSSSSSSSSSS
send_chunk = 4096
simpledb_host = sdb.amazonaws.com
skip_existing = False
socket_timeout = 100
urlencoding_mode = normal
use_https = False
verbosity = WARNING
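
For context, recv_chunk and send_chunk appear to be socket I/O sizes in bytes, while the multipart part size is controlled by a separate multipart_chunk_size_mb setting in MB; neither is confirmed in this thread, so treat both as assumptions. A rough back-of-the-envelope check on that basis suggests bounded chunking should come nowhere near the RSS seen in top:

    # Back-of-the-envelope check, assuming recv_chunk/send_chunk are in bytes
    # and the default multipart part size is 15 MB (multipart_chunk_size_mb);
    # both are assumptions, not confirmed in this thread.
    file_size = 22 * 1024 ** 3         # the 22 GB yyy.tgz
    part_size = 15 * 1024 ** 2         # assumed 15 MB multipart parts
    print(file_size // part_size + 1)  # ~1502 parts

    # Even holding one full part plus per-part bookkeeping in memory is tens
    # of MB, nowhere near the ~14 GB RES shown in top -- consistent with
    # chunks or response buffers being retained rather than freed.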