niuys / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

Does tcmalloc use recursion? (Stack overflow with fragmented memory suspected) #151

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
This is more of a question than a defect report, but I'm stumped and
would greatly appreciate some feedback.

I am running a multithreaded (pthreads) program under Linux with hundreds
of threads, each with a 1.5MB stack. Without tcmalloc the
program runs fine, but slows dramatically after a while, I think because of
memory fragmentation and contention between threads in the
malloc/realloc/free calls.

So, I tried tcmalloc and the program runs much faster, however, after a
while, instead of slowing down like with the libc's standard
malloc/realloc/free, it sometimes crashes with random stack traces and in
random places. By the way, if I increase the stack size to 3MB I can't run
as many threads (due to memory limitations) but the crashing happens less
often.

I ran it through valgrind for days (without tcmalloc) and see no memory
handling errors, so I'm strongly suspecting stack overflow
(stack-smashing?) causing memory corruption.

So, if the program has no errors in valgrind (memcheck), and runs fine
(albeit more slowly) with libc's malloc/free, why does it crash randomly
once tcmalloc is introduced?

I suspect that because the memory is so fragmented, and there are so many
calls to malloc/free/realloc in my program, that after a while tcmalloc
requires a lot of stack space to operate, smashes the stack on one thread
and causes memory corruption resulting in the random crashes.

Do you have any suggestions here? How can I detect whether tcmalloc is
overflowing the stack?

Regards, Dan...

Original issue reported on code.google.com by donavanb...@gmail.com on 26 Jun 2009 at 8:10

GoogleCodeExporter commented 9 years ago
Hmm, this is a tricky one.  tcmalloc doesn't recurse, and doesn't provide any
inlineable API in normal usage, so I don't think tcmalloc is contributing to the
stack overflow, if indeed that's what's happening here.  However, it's entirely
plausible that tcmalloc would handle stack overflow differently than libc, and
corruptions that are a big problem in one implementation may be a smaller
problem in another (and vice versa).

However, it would be strange that a stack overflow would cause a program to
slow down but not crash, so it may be something else going on entirely.

I think there are tools to help detect stack overflow, some built into gcc (if
you're using that): -fstack-protector-all, and also a flag that will warn at
compile time if the stack is too big (I don't know how it handles alloca and
equivalent).  See, e.g.,
   http://www.linuxfromscratch.org/hints/downloads/files/ssp.txt
You may also want to check out mudflap:
   http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging
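
Another way to test the stack-overflow theory is to make an overflow fault loudly instead of silently scribbling on a neighbour. What follows is only a minimal sketch, assuming POSIX threads on Linux: it prepares thread attributes with an explicit stack size and an enlarged guard region, so a thread that runs off the end of its stack is more likely to hit the guard and die with SIGSEGV at the point of overflow. The helper name make_guarded_attr and the sizes used with it are hypothetical and not taken from the program discussed here.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: initialise *attr with an explicit stack size and a
 * guard region of guard_bytes at the end of the stack.
 * Returns 0 on success or a pthread error code on failure. */
int make_guarded_attr(pthread_attr_t *attr, size_t stack_bytes, size_t guard_bytes)
{
    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Same idea as the 1.5MB stacks in the report: the size is explicit. */
    rc = pthread_attr_setstacksize(attr, stack_bytes);

    /* A larger guard makes it harder for one big frame to jump over it. */
    if (rc == 0)
        rc = pthread_attr_setguardsize(attr, guard_bytes);

    /* Do not leave a half-configured attribute object behind. */
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

Pass the initialised attributes to pthread_create() and call pthread_attr_destroy() afterwards. A single frame larger than the guard region can still step over it, so a clean run narrows the search rather than ruling stack overflow out.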

Sounds like a very frustrating problem. :-(  Good luck tracking it down!

Original comment by csilv...@gmail.com on 26 Jun 2009 at 3:25

GoogleCodeExporter commented 9 years ago
Hi,

I think I've tracked it down. The problem was in a seemingly completely
unrelated external dynamically linked library. First of all, I found a case
where the heap corruption was repeatable. I then started taking bits of code
out, trial-and-error style, to see what was causing the problem.

It turns out that linking with *name deleted to save embarrassment*'s
anti-virus API caused heap corruption in such a way that, later down the line,
either glibc's malloc/free or tcmalloc's crashed and burned.

There must be something in this library's init() function causing memory
corruption (probably in the heap) somehow, as I don't even need to execute any
of this API's functions. It is enough to link with it at compile time and,
obviously, include function calls in the code to make it actually link
properly.
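
For anyone wondering how a library can do damage without being called: initialisation code can run at load time, before main(). The sketch below is an illustration only, assuming GCC or Clang (the constructor attribute is a compiler extension). The names vendor_lib_init and vendor_init_has_run are hypothetical stand-ins and have nothing to do with the real library's internals.

```c
/* Hypothetical stand-in for a third-party library's load-time initialiser.
 * In a real shared library this would live inside the .so and run when the
 * dynamic loader maps it, whether or not the program calls its API. */
static int init_ran = 0;

__attribute__((constructor))
static void vendor_lib_init(void)
{
    /* Anything done here happens before main() starts. A heap bug at this
     * point would already be in place before the first malloc() call made
     * by the application itself. */
    init_ran = 1;
}

/* Reports whether the initialiser has already executed. */
int vendor_init_has_run(void)
{
    return init_ran;
}
```

Calling vendor_init_has_run() as the first statement of main() returns 1, which is why merely linking against a library can be enough to trigger its bugs.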

Sorry for the red herring, and keep up the good work!

Regards, Dan....

Original comment by donavanb...@gmail.com on 3 Jul 2009 at 12:33

GoogleCodeExporter commented 9 years ago
Great, I'm glad you managed to figure out what's going on!

Original comment by csilv...@gmail.com on 6 Jul 2009 at 5:30