wdv4758h / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

[freebsd] Process size rapidly grows on multi-threaded process with high memory turnaround #539

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have a process with 200 threads on FreeBSD-9.1. Each of the threads is 
actively working with memory. They allocate/deallocate a lot, but total 
actually allocated memory stays low.

Process size pretty rapidly grows to some insane value, like 750GB, while total 
allocated size stays at only 500MB. I am confident that allocated size is low 
because the same perftools library shows this number in heap profiles, and also 
because memory stays low with the native FreeBSD allocator.

With the native FreeBSD allocator, memory also grows but stays at ~2.5GB. There 
is an option there to reduce the garbage collection interval (e.g. 
MALLOC_OPTIONS=3g). In that case, 2.5GB shrinks to ~1GB with some performance 
penalty.

I suspect that this case fails with perftools for a similar reason: maybe 
garbage collection isn't keeping up with the multi-threaded process? If so, 
the default garbage collection frequency in perftools isn't reasonable.

google-perftools-2.0

Original issue reported on code.google.com by yuriv...@gmail.com on 10 Jun 2013 at 1:48

GoogleCodeExporter commented 9 years ago
It would be really nice if you could post a test program.

Original comment by alkondratenko on 6 Jul 2013 at 11:38

GoogleCodeExporter commented 9 years ago

Original comment by alkondratenko on 14 Jul 2013 at 3:10

GoogleCodeExporter commented 9 years ago
I tried to reproduce this with a simple test program, but I couldn't. Looks 
like some special unknown pattern of use causes the issue. This happens on a 
large closed-source production app.

So, through the API, I get a very low allocated memory size, but the process 
grows to some insane size.

Is there a way I can at some point print the detailed memory use as seen by 
perftools?
What I mean is this: perftools allocates memory either by moving the heap 
limit, or by mmapping. Apparently, some mmapped regions become nearly empty 
after a while. I would like to get the list of all such blocks and what 
perftools sees as the usage number in each of them.

Any way to get such info? Any other suggestion on how to troubleshoot this?

Original comment by yuriv...@gmail.com on 14 Jul 2013 at 7:47
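One way to get at least a coarse breakdown is the MallocExtension interface in gperftools/malloc_extension.h. The sketch below is only an illustration: the USE_TCMALLOC macro, the HeapSnapshot struct and the helper names are made up for this example, and it assumes the program is built with -DUSE_TCMALLOC and linked against tcmalloc. It reports totals, not a per-mmap list.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical opt-in macro: build with -DUSE_TCMALLOC -ltcmalloc.
#if defined(USE_TCMALLOC)
#include <gperftools/malloc_extension.h>
#endif

// Counters relevant to "allocated is small, process is huge".
struct HeapSnapshot {
  std::size_t allocated = 0;          // generic.current_allocated_bytes
  std::size_t heap_size = 0;          // generic.heap_size
  std::size_t thread_caches = 0;      // tcmalloc.current_total_thread_cache_bytes
  std::size_t pageheap_free = 0;      // tcmalloc.pageheap_free_bytes
  std::size_t pageheap_unmapped = 0;  // tcmalloc.pageheap_unmapped_bytes
};

// Bytes the allocator has obtained from the system beyond what the
// application currently has allocated.
std::size_t OverheadBytes(const HeapSnapshot& s) {
  return s.heap_size > s.allocated ? s.heap_size - s.allocated : 0;
}

// Fills `out` from tcmalloc. Returns false when built without tcmalloc
// or when a property is not supported.
bool ReadHeapSnapshot(HeapSnapshot* out) {
#if defined(USE_TCMALLOC)
  MallocExtension* ext = MallocExtension::instance();
  return ext->GetNumericProperty("generic.current_allocated_bytes",
                                 &out->allocated) &&
         ext->GetNumericProperty("generic.heap_size", &out->heap_size) &&
         ext->GetNumericProperty("tcmalloc.current_total_thread_cache_bytes",
                                 &out->thread_caches) &&
         ext->GetNumericProperty("tcmalloc.pageheap_free_bytes",
                                 &out->pageheap_free) &&
         ext->GetNumericProperty("tcmalloc.pageheap_unmapped_bytes",
                                 &out->pageheap_unmapped);
#else
  (void)out;
  return false;
#endif
}

// Prints the counters, followed by tcmalloc's own human-readable report.
void DumpHeapReport() {
  HeapSnapshot s;
  if (!ReadHeapSnapshot(&s)) {
    std::puts("tcmalloc statistics unavailable");
    return;
  }
  std::printf("allocated=%zu heap=%zu overhead=%zu\n", s.allocated,
              s.heap_size, OverheadBytes(s));
  std::printf("thread_caches=%zu pageheap_free=%zu pageheap_unmapped=%zu\n",
              s.thread_caches, s.pageheap_free, s.pageheap_unmapped);
#if defined(USE_TCMALLOC)
  char buf[16384];
  MallocExtension::instance()->GetStats(buf, sizeof(buf));
  std::fputs(buf, stdout);
#endif
}
```

Calling DumpHeapReport() periodically (for example from a signal handler thread or an admin command) should show whether the gap between allocated bytes and heap size sits in thread caches or in the page heap.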

GoogleCodeExporter commented 9 years ago
Have you looked at malloc_extension.h?

My guess is that tcmalloc's quite brave design with purely per-thread caches can 
indeed make some workloads waste tons of memory in thread caches. I'm thinking 
about threads that may go idle for long periods of time.

There are in fact two methods in malloc_extension.h that may help in that case. 
And a ton of diagnostic methods too.

Original comment by alkondratenko on 13 Sep 2013 at 7:42
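The comment above does not say which two methods are meant. Plausible candidates in malloc_extension.h are MarkThreadIdle() and ReleaseFreeMemory(), and the "tcmalloc.max_total_thread_cache_bytes" property can cap the combined thread cache size. A minimal sketch, assuming a build with the hypothetical -DUSE_TCMALLOC macro and tcmalloc linked in. The helper names and the per-thread budget are invented for illustration, and whether any of this fixes the reported growth is untested:

```cpp
#include <cstddef>

// Hypothetical opt-in macro: build with -DUSE_TCMALLOC -ltcmalloc.
#if defined(USE_TCMALLOC)
#include <gperftools/malloc_extension.h>
#endif

// Hypothetical sizing rule: a fixed cache budget per thread.
// The numbers passed in are illustrative, not recommendations.
std::size_t TotalCacheCap(std::size_t per_thread_bytes, std::size_t threads) {
  return per_thread_bytes * threads;
}

// Caps the combined size of all thread caches. Returns false when built
// without tcmalloc or when the property is rejected.
bool CapThreadCaches(std::size_t total_bytes) {
#if defined(USE_TCMALLOC)
  return MallocExtension::instance()->SetNumericProperty(
      "tcmalloc.max_total_thread_cache_bytes", total_bytes);
#else
  (void)total_bytes;
  return false;
#endif
}

// Call from a worker thread that is about to block for a long time, so
// its cached free objects go back to the shared free lists.
void BeforeLongIdle() {
#if defined(USE_TCMALLOC)
  MallocExtension::instance()->MarkThreadIdle();
#endif
}

// Call when the worker resumes allocating.
void AfterLongIdle() {
#if defined(USE_TCMALLOC)
  MallocExtension::instance()->MarkThreadBusy();
#endif
}

// Asks tcmalloc to return as much free page-heap memory as possible
// to the operating system.
void ReleaseUnusedPages() {
#if defined(USE_TCMALLOC)
  MallocExtension::instance()->ReleaseFreeMemory();
#endif
}
```

For the 200-thread process from the report, CapThreadCaches(TotalCacheCap(256 * 1024, 200)) would request a 50 MiB combined cap. tcmalloc may clamp the value to its own limits.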