xiongxu / s3fs

Automatically exported from code.google.com/p/s3fs
GNU General Public License v2.0

leak somewhere #104

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It seems there is a leak somewhere when using 'https://': memory is consumed
little by little. Situation: listing a lot of small files.

What steps will reproduce the problem?
1. Mount a bucket with "https://" (url=https://s3.amazonaws.com)
2. Copy something with a lot of small files
3. Use this mount point (eg., "find .")

What version of the product are you using? On what operating system?
s3fs 1.0
fuse 2.8.5
libcurl 7.19.7-15.16.amzn1
openssl 1.0.0a-1.2.amzn1
Amazon linux (ami) 2.6.34.7-56.40.amzn1.x86_64

Please provide any additional information below.

I tried mounting from the command line and from fstab - no difference. I also
tried turning "use_cache" on and off - no difference either.

Command line: 
s3fs bucket -o allow_other -o use_cache=/tmp -o url=https://s3.amazonaws.com 
/mnt/acmcms/

fstab entry:
s3fs#bucket /mnt/bucket fuse 
allow_other,use_cache=/tmp,url=https://s3.amazonaws.com 0 0

Original issue reported on code.google.com by telem...@gmail.com on 27 Oct 2010 at 4:09

GoogleCodeExporter commented 9 years ago
Hello Telememo,

Thank you for filing a bug report :) Can you also describe the memory behavior 
you are seeing? You provided a helpful list of reproduction steps, but not the 
actual/expected outcomes. I'm curious how much memory use grew if the files are 
"small".

Thanks!

Original comment by apetresc on 27 Oct 2010 at 4:24

GoogleCodeExporter commented 9 years ago
I don't know a better way to measure this, so here are the numbers from "ps axu".
Just after I mount - 6356
'ls ./folder | wc -w' 23 files - 11520
same again - 12144
same again - 12260
same again - 13104
'for ITEM in $(seq 5); do ls features | wc -w; done' - 22348
same again - 23652
same again - 25216
'for ITEM in $(seq 50); do ls features | wc -w; done' - 40360
same again - 55892
same again - 70784

After that I noticed that the speed became really slow, and I checked the logs:
Oct 27 05:00:05 ip-*removed* s3fs: ###35 SSL connect error
Oct 27 05:00:05 ip-*removed* s3fs: ###28 Timeout was reached
Oct 27 05:00:05 ip-*removed* last message repeated 11 times
but "ls ./folder" still working

after another 50 - 86380
and another 50 - 102028

Original comment by telem...@gmail.com on 27 Oct 2010 at 5:06

GoogleCodeExporter commented 9 years ago
The comment above was with "use_cache" enabled, but I believe it's the same
without it; I can try later.

Original comment by telem...@gmail.com on 27 Oct 2010 at 5:08

GoogleCodeExporter commented 9 years ago
Thanks a lot. I will take a look as well.

Original comment by apetresc on 27 Oct 2010 at 5:12

GoogleCodeExporter commented 9 years ago
In addition, I made a "valgrind" report - the only memory tool I found.
Maybe it will be helpful; I don't really understand what is in it.
http://pastebin.com/gYQQcTTq

Original comment by telem...@gmail.com on 27 Oct 2010 at 5:51

GoogleCodeExporter commented 9 years ago
That's actually perfect. I will be taking a look at this within the next day or 
two.

Original comment by apetresc on 27 Oct 2010 at 5:57

GoogleCodeExporter commented 9 years ago
Here's some output of valgrind:

==1449== 1,408 bytes in 11 blocks are definitely lost in loss record 500 of 509
==1449==    at 0x40243B0: operator new[](unsigned int) (vg_replace_malloc.c:299)
==1449==    by 0x80507D8: my_curl_easy_perform(void*, _IO_FILE*) (s3fs.cpp:309)
==1449==    by 0x805729C: s3fs_getattr(char const*, stat*) (s3fs.cpp:910)
==1449==    by 0x404218E: fuse_fs_getattr (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404391E: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x4045A1B: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404D6F0: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x448B954: start_thread (pthread_create.c:300)
==1449==    by 0x456AE7D: clone (clone.S:130)
==1449== 
==1449== 2,432 bytes in 19 blocks are definitely lost in loss record 504 of 509
==1449==    at 0x40243B0: operator new[](unsigned int) (vg_replace_malloc.c:299)
==1449==    by 0x80507D8: my_curl_easy_perform(void*, _IO_FILE*) (s3fs.cpp:309)
==1449==    by 0x805803A: s3fs_readdir(char const*, void*, int (*)(void*, char 
const*, stat const*, long long), long long, fuse_file_info*) (s3fs.cpp:1512)
==1449==    by 0x404125B: fuse_fs_readdir (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x4045E95: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404C816: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449==    by 0x448B954: start_thread (pthread_create.c:300)
==1449==    by 0x456AE7D: clone (clone.S:130)

This points to a "new" that does not have a corresponding "delete" in these 
lines of code:

static int my_curl_easy_perform(CURL* curl, FILE* f = 0) {
  char* url = new char[128];
  curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL , &url);
  if(debug) syslog(LOG_DEBUG, "connecting to URL %s", url);
  ...

"url"  is only used in these first few lines and not the rest of the function.  
I've tried and tried 
different things to try to address this and either I get a compile errors or it 
won't make it through
"make check" -- I'm baffled.

Valgrind points to more issues, but this one seemed to be the low-hanging fruit.
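
For reference, here is a small, self-contained sketch (not the actual s3fs
code) of how CURLINFO_EFFECTIVE_URL is meant to be used: per the libcurl
documentation, curl_easy_getinfo() stores a pointer to a string that libcurl
itself owns, so the caller should pass an uninitialized char* rather than
allocate a buffer, and must not free the result. The allocation in
my_curl_easy_perform is simply overwritten and leaked.

#include <curl/curl.h>
#include <stdio.h>

int main(void) {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  curl_easy_setopt(curl, CURLOPT_URL, "https://s3.amazonaws.com/");
  curl_easy_perform(curl);

  char* url = NULL;   /* no "new char[128]" needed */
  curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL, &url);
  if (url)
    printf("effective URL: %s\n", url);

  curl_easy_cleanup(curl);   /* libcurl frees the string behind 'url' */
  curl_global_cleanup();
  return 0;
}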

Original comment by dmoore4...@gmail.com on 10 Dec 2010 at 9:59

GoogleCodeExporter commented 9 years ago
I can confirm the memory leak here on CentOS 5 with curl-7.15.5-9.el5.

I am transferring a large number of small files (a few KB to a few MB).
Now that r279 doesn't exit hourly on errors (thanks Dan!), s3fs runs until
it crashes with an error like:

Dec 12 12:44:20 xxxx out of memory [30275]
Dec 12 12:46:42 xxxx last message repeated 10 times
Dec 12 12:46:42 xxxx last message repeated 6 times
Dec 12 12:46:48 xxxx s3fs: init $Rev: 279 $

This happens after about 150,000 files.

Original comment by stephent...@gmail.com on 13 Dec 2010 at 6:43

GoogleCodeExporter commented 9 years ago
Changes in r281 took care of several memory leaks. In a small test only two 
leaks remain, as reported by valgrind:

==17569== HEAP SUMMARY:
==17569==     in use at exit: 321,230 bytes in 2,516 blocks
==17569==   total heap usage: 76,918 allocs, 74,402 frees, 5,223,573 bytes 
allocated
==17569== 
==17569== 55 (16 direct, 39 indirect) bytes in 1 blocks are definitely lost in 
loss record 415 of 490
==17569==    at 0x4024046: realloc (vg_replace_malloc.c:525)
==17569==    by 0x404DE73: fuse_opt_add_arg (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x404DF8F: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x404E6EE: fuse_opt_parse (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x8053FF1: main (s3fs.cpp:2945)
==17569== 
==17569== 285,197 (103,608 direct, 181,589 indirect) bytes in 3 blocks are 
definitely lost in loss record 490 of 490
==17569==    at 0x402328F: calloc (vg_replace_malloc.c:467)
==17569==    by 0x408DAA5: Curl_open (in /usr/lib/libcurl.so.4.2.0)
==17569==    by 0x40A0B92: curl_easy_init (in /usr/lib/libcurl.so.4.2.0)
==17569==    by 0x804CE3C: alloc_curl_handle() (s3fs.cpp:114)
==17569==    by 0x80585CF: s3fs_readdir(char const*, void*, int (*)(void*, char 
const*, stat const*, long long), long long, fuse_file_info*) (s3fs.cpp:1734)
==17569==    by 0x404125B: fuse_fs_readdir (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x4045E95: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x404C816: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569==    by 0x448B954: start_thread (pthread_create.c:300)
==17569== 
==17569== LEAK SUMMARY:
==17569==    definitely lost: 103,624 bytes in 4 blocks
==17569==    indirectly lost: 181,628 bytes in 80 blocks
==17569==      possibly lost: 0 bytes in 0 blocks
==17569==    still reachable: 35,978 bytes in 2,432 blocks
==17569==         suppressed: 0 bytes in 0 blocks
==17569== Reachable blocks (those to which a pointer was found) are not shown.
==17569== To see them, rerun with: --leak-check=full --show-reachable=yes
==17569== 
==17569== For counts of detected and suppressed errors, rerun with: -v
==17569== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 74 from 9)

The big leak is in the s3fs_readdir function. Apparently memory allocations
made by the curl_multi code are not being released.

Original comment by dmoore4...@gmail.com on 17 Dec 2010 at 7:46

GoogleCodeExporter commented 9 years ago
I've been working on this and have reduced the memory leaks (as reported by 
valgrind) to what I think may be a minimum:

==7397== HEAP SUMMARY:
==7397==     in use at exit: 82,714 bytes in 2,455 blocks
==7397==   total heap usage: 155,394 allocs, 152,939 frees, 38,214,087 bytes 
allocated
==7397== 
==7397== 116 (32 direct, 84 indirect) bytes in 1 blocks are definitely lost in 
loss record 414 of 474
==7397==    at 0x4C245E2: realloc (vg_replace_malloc.c:525)
==7397==    by 0x4E3E86D: fuse_opt_add_arg (in /usr/lib/libfuse.so.2.8.4)
==7397==    by 0x4E3F096: fuse_opt_parse (in /usr/lib/libfuse.so.2.8.4)
==7397==    by 0x40BFE7: main (s3fs.cpp:3239)
==7397== 
==7397== LEAK SUMMARY:
==7397==    definitely lost: 32 bytes in 1 blocks
==7397==    indirectly lost: 84 bytes in 3 blocks
==7397==      possibly lost: 0 bytes in 0 blocks
==7397==    still reachable: 82,598 bytes in 2,451 blocks
==7397==         suppressed: 0 bytes in 0 blocks
==7397== Reachable blocks (those to which a pointer was found) are not shown.
==7397== To see them, rerun with: --leak-check=full --show-reachable=yes
==7397== 
==7397== For counts of detected and suppressed errors, rerun with: -v
==7397== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)

This was after a short test that performed a variety of file system
operations (previously, valgrind always reported more than just this one leak).

Admittedly, I'm not a big fan of C++ and prefer lower-level language
constructs. In order to eliminate the leak, I had to understand what was
going on behind the scenes.

The major culprit was curl handles not being cleaned up
(curl_easy_cleanup), and in particular the multitude of curl handles
associated with a curl multi handle created during directory reads.
There is a specific order needed to clean up the curl multi handles
(sketched just below):

  - remove each curl handle from its associated curl multi handle
  - clean up the individual curl handles
  - clean up the individual curl multi handles
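
As an illustration only (not the actual s3fs code; multi_handle,
easy_handles and num_handles are placeholder names), the teardown
order looks roughly like this:

  /* 1. detach each easy handle from the multi handle,
     2. clean up each easy handle,
     3. clean up the multi handle itself */
  for (int i = 0; i < num_handles; i++) {
    curl_multi_remove_handle(multi_handle, easy_handles[i]);
    curl_easy_cleanup(easy_handles[i]);
  }
  curl_multi_cleanup(multi_handle);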

Randy had some of this implemented, but not all of it. He was also using
C++ memory structures and classes, and in order to fix this I would have
needed to unravel that and understand it completely. I ended up
"duplicating" this machinery using plain C structures and linked lists.
Yes, more overhead, but at least I understand it AND the memory leaks
are resolved.

Look for an upcoming release -- there's some more cleanup needed.

Original comment by dmoore4...@gmail.com on 18 Dec 2010 at 8:09

GoogleCodeExporter commented 9 years ago
Issue 84 has been merged into this issue.

Original comment by dmoore4...@gmail.com on 19 Dec 2010 at 1:25

GoogleCodeExporter commented 9 years ago
This issue was closed by revision r285.

Original comment by dmoore4...@gmail.com on 19 Dec 2010 at 1:34