Closed GoogleCodeExporter closed 8 years ago
Hello Telememo,
Thank you for filing a bug report :) Can you also describe the memory behavior
you are seeing? You provided a helpful list of reproduction steps, but not the
actual/expected outcomes. I'm curious how much memory use grew if the files are
"small".
Thanks!
Original comment by apetresc
on 27 Oct 2010 at 4:24
I don't know how to do it better, so I use "ps axu" data.
Just after I mount - 6356
'ls ./folder | wc -w' 23 files - 11520
same again - 12144
same again - 12260
same again - 13104
'for ITEM in $(seq 5); do ls features | wc -w; done' - 22348
same again - 23652
same again - 25216
'for ITEM in $(seq 50); do ls features | wc -w; done' - 40360
same again - 55892
same again - 70784
after that I mentioned that speed became really slow, and I check logs:
Oct 27 05:00:05 ip-*removed* s3fs: ###35 SSL connect error
Oct 27 05:00:05 ip-*removed* s3fs: ###28 Timeout was reached
Oct 27 05:00:05 ip-*removed* last message repeated 11 times
but "ls ./folder" still working
after another 50 - 86380
and another 50 - 102028
Original comment by telem...@gmail.com
on 27 Oct 2010 at 5:06
upper comment was with
with "use_cache"
but I believe that it's the same without, I can try later
Original comment by telem...@gmail.com
on 27 Oct 2010 at 5:08
Thanks a lot. I will take a look as well.
Original comment by apetresc
on 27 Oct 2010 at 5:12
In addition I made "valgrind" report, just one memory tool that I found.
Maybe it would be helpful, I don't really understand what is there.
http://pastebin.com/gYQQcTTq
Original comment by telem...@gmail.com
on 27 Oct 2010 at 5:51
That's actually perfect. I will be taking a look at this within the next day or
two.
Original comment by apetresc
on 27 Oct 2010 at 5:57
Here's some output of valgrind:
==1449== 1,408 bytes in 11 blocks are definitely lost in loss record 500 of 509
==1449== at 0x40243B0: operator new[](unsigned int) (vg_replace_malloc.c:299)
==1449== by 0x80507D8: my_curl_easy_perform(void*, _IO_FILE*) (s3fs.cpp:309)
==1449== by 0x805729C: s3fs_getattr(char const*, stat*) (s3fs.cpp:910)
==1449== by 0x404218E: fuse_fs_getattr (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404391E: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x4045A1B: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404D6F0: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x448B954: start_thread (pthread_create.c:300)
==1449== by 0x456AE7D: clone (clone.S:130)
==1449==
==1449== 2,432 bytes in 19 blocks are definitely lost in loss record 504 of 509
==1449== at 0x40243B0: operator new[](unsigned int) (vg_replace_malloc.c:299)
==1449== by 0x80507D8: my_curl_easy_perform(void*, _IO_FILE*) (s3fs.cpp:309)
==1449== by 0x805803A: s3fs_readdir(char const*, void*, int (*)(void*, char
const*, stat const*, long long), long long, fuse_file_info*) (s3fs.cpp:1512)
==1449== by 0x404125B: fuse_fs_readdir (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x4045E95: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404C816: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==1449== by 0x448B954: start_thread (pthread_create.c:300)
==1449== by 0x456AE7D: clone (clone.S:130)
This points to a "new" that does not have a corresponding "delete" in these
lines of code:
static int my_curl_easy_perform(CURL* curl, FILE* f = 0) {
char* url = new char[128];
curl_easy_getinfo(curl, CURLINFO_EFFECTIVE_URL , &url);
if(debug) syslog(LOG_DEBUG, "connecting to URL %s", url);
...
"url" is only used in these first few lines and not the rest of the function.
I've tried and tried
different things to try to address this and either I get a compile errors or it
won't make it through
"make check" -- I'm baffled.
valgrind points to more issues, but this one seemed to be low hanging fruit
Original comment by dmoore4...@gmail.com
on 10 Dec 2010 at 9:59
I can confirm the memory leak here on centos 5 curl-7.15.5-9.el5.
I am transferring a large number of small files (a few K to a few meg). Now
that r279 doesn't exit hourly on errors (Thanks Dan!) s3fs runs until it
crashes with error like:
Dec 12 12:44:20 xxxx out of memory [30275]
Dec 12 12:46:42 xxxx last message repeated 10 times
Dec 12 12:46:42 xxxx last message repeated 6 times
Dec 12 12:46:48 xxxx s3fs: init $Rev: 279 $
This happens after about 150,000 files.
Original comment by stephent...@gmail.com
on 13 Dec 2010 at 6:43
Changes in r281 took care of several memory leaks. In a small test only two
leaks remain, as reported by valgrind:
==17569== HEAP SUMMARY:
==17569== in use at exit: 321,230 bytes in 2,516 blocks
==17569== total heap usage: 76,918 allocs, 74,402 frees, 5,223,573 bytes
allocated
==17569==
==17569== 55 (16 direct, 39 indirect) bytes in 1 blocks are definitely lost in
loss record 415 of 490
==17569== at 0x4024046: realloc (vg_replace_malloc.c:525)
==17569== by 0x404DE73: fuse_opt_add_arg (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x404DF8F: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x404E6EE: fuse_opt_parse (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x8053FF1: main (s3fs.cpp:2945)
==17569==
==17569== 285,197 (103,608 direct, 181,589 indirect) bytes in 3 blocks are
definitely lost in loss record 490 of 490
==17569== at 0x402328F: calloc (vg_replace_malloc.c:467)
==17569== by 0x408DAA5: Curl_open (in /usr/lib/libcurl.so.4.2.0)
==17569== by 0x40A0B92: curl_easy_init (in /usr/lib/libcurl.so.4.2.0)
==17569== by 0x804CE3C: alloc_curl_handle() (s3fs.cpp:114)
==17569== by 0x80585CF: s3fs_readdir(char const*, void*, int (*)(void*, char
const*, stat const*, long long), long long, fuse_file_info*) (s3fs.cpp:1734)
==17569== by 0x404125B: fuse_fs_readdir (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x4045E95: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x404C816: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x404BFB9: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x404E8F5: fuse_session_process (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x4049E32: ??? (in /usr/lib/libfuse.so.2.8.4)
==17569== by 0x448B954: start_thread (pthread_create.c:300)
==17569==
==17569== LEAK SUMMARY:
==17569== definitely lost: 103,624 bytes in 4 blocks
==17569== indirectly lost: 181,628 bytes in 80 blocks
==17569== possibly lost: 0 bytes in 0 blocks
==17569== still reachable: 35,978 bytes in 2,432 blocks
==17569== suppressed: 0 bytes in 0 blocks
==17569== Reachable blocks (those to which a pointer was found) are not shown.
==17569== To see them, rerun with: --leak-check=full --show-reachable=yes
==17569==
==17569== For counts of detected and suppressed errors, rerun with: -v
==17569== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 74 from 9)
The big leak is in the s3fs_readdir function. Apparently memory allocations
done by the curl_multi stuff are not getting released.
Original comment by dmoore4...@gmail.com
on 17 Dec 2010 at 7:46
I've been working on this and have reduced the memory leaks (as reported by
valgrind) to what I think may be a minimum:
==7397== HEAP SUMMARY:
==7397== in use at exit: 82,714 bytes in 2,455 blocks
==7397== total heap usage: 155,394 allocs, 152,939 frees, 38,214,087 bytes
allocated
==7397==
==7397== 116 (32 direct, 84 indirect) bytes in 1 blocks are definitely lost in
loss record 414 of 474
==7397== at 0x4C245E2: realloc (vg_replace_malloc.c:525)
==7397== by 0x4E3E86D: fuse_opt_add_arg (in /usr/lib/libfuse.so.2.8.4)
==7397== by 0x4E3F096: fuse_opt_parse (in /usr/lib/libfuse.so.2.8.4)
==7397== by 0x40BFE7: main (s3fs.cpp:3239)
==7397==
==7397== LEAK SUMMARY:
==7397== definitely lost: 32 bytes in 1 blocks
==7397== indirectly lost: 84 bytes in 3 blocks
==7397== possibly lost: 0 bytes in 0 blocks
==7397== still reachable: 82,598 bytes in 2,451 blocks
==7397== suppressed: 0 bytes in 0 blocks
==7397== Reachable blocks (those to which a pointer was found) are not shown.
==7397== To see them, rerun with: --leak-check=full --show-reachable=yes
==7397==
==7397== For counts of detected and suppressed errors, rerun with: -v
==7397== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
This was after a small duration test that did a variety of file system
operations. (previously this always returned more
than just this one).
Admittedly, I'm not a big fan of cpp and prefer lower language constructs. In
order to eliminate the leak, I had to understand what was going on behind the
scenes.
The major culprit in the leaks was due to curl handles not being cleaned
up (curl_easy_cleanup). And in particular, the multitude of curl
handles associated with a curl_multi_handle created during directory
reads. There is a specific order needed to cleanup the
curl_multi_handles:
- remove each curl_handles from its associated curl_multi_handle
- cleanup the individual curl_handle's
- cleanup the individual curl_multi_handle's
Randy had the some of this, but not all, implemented. He also was using
c++ memory structures and classes. In order to fix this, I would need
to unravel this and understand it completely. I ended up "duplicating"
this stuff using plain C structures and linked lists. Yeah, more
overhead, but at least I understand it AND the memory leaks are
resolved.
Look for an upcoming release -- there's some more cleanup needed.
Original comment by dmoore4...@gmail.com
on 18 Dec 2010 at 8:09
Issue 84 has been merged into this issue.
Original comment by dmoore4...@gmail.com
on 19 Dec 2010 at 1:25
This issue was closed by revision r285.
Original comment by dmoore4...@gmail.com
on 19 Dec 2010 at 1:34
Original issue reported on code.google.com by
telem...@gmail.com
on 27 Oct 2010 at 4:09