pprof results are low compared with top

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Profile a program (with HEAP_PROFILE_MMAP=true)
2. Compare results with /usr/bin/top
3.

What is the expected output? What do you see instead?
The inuse space reported using pprof is about half of that in the DATA 
reported by top.

What version of the product are you using? On what operating system?
1.3 and 1.4. 
2.6.24-etchnhalf.1-amd64 #1 SMP Sat Aug 15 20:38:41 UTC 2009 x86_64 GNU/
Linux
export CFLAGS="-m32 -g"
export CXXFLAGS="-m32 -g"
configure --enable-static --enable-frame-pointers --host=i686

Please provide any additional information below.
Top was configured to show extra columns for SWAP, CODE and DATA. Added 
columns with f  p r s <enter> and then W to write to .toprc. 

Some disparity is expected. 
1. TCmalloc does not report usage by stack, initialized data and BSS.
2. TCMalloc has overheads of a few MB 
3. TCMalloc reports bytes allocated, top will see these rounded up to 
pages 
4. TCMalloc does not return freed memory

I'm requesting an enhancement. Add counters for low level calls from 
TCMalloc to sbrk, mmap, etc. which can then be reported along with the 
program totals.  These new counter results should tally better with top 
and provide more confidence that TCMalloc is logging all program usage.

Original issue reported on code.google.com by pet...@gmail.com on 29 Oct 2009 at 5:01

GoogleCodeExporter commented 9 years ago

In our experience, (4) is often the reason for discrepancies with top. 
Occassionally, you'll also see discrepancies due to internal fragmentation 
within
tcmalloc.

The way to diagnose these is to emit the output of
MallocExtension::instance()->GetStats() (from malloc_extension.h).  This will 
return
a lot of details about memory use, and should cover all the cases you're 
suggesting
counters for.  In any case, give it a try and see how well it works for you.

Original comment by csilv...@gmail.com on 3 Nov 2009 at 3:30

GoogleCodeExporter commented 9 years ago

Original comment by csilv...@gmail.com on 3 Nov 2009 at 3:30

Added labels: Type-Defect, Priority-Medium

GoogleCodeExporter commented 9 years ago

Reviving this thread, as this is still happening and we're trying to
collect more insight.  Version being used is 1.4 release.
Line from 'top':

  PID USER  PR  NI  VIRT  RES S %CPU %MEM   TIME+  P nFLT COMMAND
 5681 root      15   0 8938m 7.7g S  233 12.2   2008:09 7    0 adindexer

Final part of the  summary from 'getStats()' follows; note that the
full 'getStats()' output is attached.

>255   large *     52 spans ~  869.2 MB; 1322.9 MB cum; unmapped:  863.6 MB;  
875.4 MB cum
------------------------------------------------
DevMemSysAllocator: failed_=0
SbrkSysAllocator: failed_=1
MmapSysAllocator: failed_=0
------------------------------------------------
MALLOC:   6282067968 ( 5991.0 MB) Heap size
MALLOC:   3685432760 ( 3514.7 MB) Bytes in use by application
MALLOC:   1387171840 ( 1322.9 MB) Bytes free in page heap
MALLOC:   1195182432 ( 1139.8 MB) Bytes free in central cache
MALLOC:       514816 (    0.5 MB) Bytes free in transfer cache
MALLOC:     13766120 (   13.1 MB) Bytes free in thread caches
MALLOC:      1085167              Spans in use
MALLOC:           20              Thread heaps in use
MALLOC:     75366400 (   71.9 MB) Metadata allocated
------------------------------------------------

As you can see, about 2GB of RSS are completely "missing".  Together
with the page heap and the central cache, we're using about 7.9GB of
RSS for a 3.5GB total heap -- a little wasteful :)

I'll be collecting a lot more data and updating the thread, but I'd
love any pointers or guesses.

Current suspicions:
-  I'm curious if the 'failed_ = 1' for the SbrkSysAllocator is normal
- The PageHeap summary line is also suspicious:

        >255   large *     52 spans ~  869.2 MB; 1322.9 MB cum;
                                   unmapped:  863.6 MB;  875.4 MB cum
  Perhaps the 'unmapped' stuff isn't showing up in the final heap
size, and didn't actually get unmapped properly?  I suspect
  that the 'madvise()' call by TCMalloc isn't unmapping properly on our system.

Original comment by mrab...@gmail.com on 22 Jan 2010 at 1:07

Attachments:

adindexer.mem.stats

GoogleCodeExporter commented 9 years ago

FYI, this is preventing the usage of TCMalloc in production in 90% of our cases 
(almost 
every long-running server is affected by this same issue).  We really want to 
use it 
though :)

Original comment by mrab...@gmail.com on 22 Jan 2010 at 1:11

GoogleCodeExporter commented 9 years ago

I'll ask some experts here if they have time to take a look at this data and 
see if
they can figure out what's going on.

Original comment by csilv...@gmail.com on 3 Feb 2010 at 11:20

GoogleCodeExporter commented 9 years ago

I'm also doing more experimentation and am having trouble reproducing it in a 
tiny 
binary that I can ship over, but it happens in every single one of our complex 
servers.

I'm going to do a test with tcmalloc_minimal to see if it has something to do 
with 
collecting samples and stacks.  Out of curiosity, the heap sample data -- where 
is it 
stored?  Does it show up anywhere in this MallocInstance::getStats() report?  
if so, in 
what field?

Original comment by mrab...@gmail.com on 3 Feb 2010 at 11:42

GoogleCodeExporter commented 9 years ago

The heap sample data does not show up in getStats, as far as I know.  And I 
believe
-- don't quote me on this -- it's stored in memory.

Original comment by csilv...@gmail.com on 4 Feb 2010 at 12:24

GoogleCodeExporter commented 9 years ago

Any update from the experts?  Please let me know if there's any other data or 
statistics  
I can capture and provide, because we can easily reproduce this situation every 
single 
time on our servers.

Original comment by mrab...@gmail.com on 8 Feb 2010 at 7:19

GoogleCodeExporter commented 9 years ago

No, no word from anyone (I was hoping they'd respond directly to this bug 
report).

What happens if you call MallocExtension::instance()->ReleaseFreeMemory() right
before (and/or after) dumping the stats?  Do the numbers change?  Is it 
possible that
the issue is with your kernel not recognizing MADV_DONTNEED?

Original comment by csilv...@gmail.com on 8 Feb 2010 at 11:24

GoogleCodeExporter commented 9 years ago

OK, we recently discovered a problem with accounting that may explain some of 
the 
results you're seeing here.  Basically, some parts of the accounting system are 
using 
the bytes requested, rather than the actual bytes allocated.

I don't really understand when this happens, well enough to know if it might 
explain 
the problem you're seeing.  But it's easy to check: at the top of do_malloc (in 
tcmalloc.cc), add this code:
---
if (size <= kMaxSize) {
  size = Static::sizemap()->ByteSizeForClass(Static::sizemap()->SizeClass(size));
} else {
  size = (size + kPageSize - 1) & ~(kPageSize - 1);
}
---

Does this affect the accuracy of the stats you see?

We should have a more principled fix for this problem in the next release.

Original comment by csilv...@gmail.com on 7 Apr 2010 at 6:32

GoogleCodeExporter commented 9 years ago

I've submitted our fix for sampling error to svn.  Feel free to compile 
perftools from 
svn-head and see if that helps with the problems you've been seeing.  If you do 
that, 
let us know how it works out!

Original comment by csilv...@gmail.com on 12 Apr 2010 at 9:20

GoogleCodeExporter commented 9 years ago

Any more word on this?

Original comment by csilv...@gmail.com on 7 Jun 2010 at 10:47

GoogleCodeExporter commented 9 years ago

unfortunately, this has since dropped from my radar.  You can close for now.

Original comment by mrab...@gmail.com on 8 Jun 2010 at 12:47

GoogleCodeExporter commented 9 years ago

OK, hopefully the sampling fix was the issue.  If you ever get back to this, 
feel free to reopen the bug.

Original comment by csilv...@gmail.com on 8 Jun 2010 at 6:49

Changed state: CannotReproduce

GoogleCodeExporter commented 9 years ago

I see that the fix above is now in tcmalloc.cc

Anyone else having this problem? I see a huge discrepancy between top and 
results from GetStat ```MallocExtension::instance()->GetStats```

Original comment by kjell.he...@gmail.com on 28 Sep 2013 at 3:24

GoogleCodeExporter commented 9 years ago

My understanding is that ticket is about memory profiling, not GetStats() 
method of MallocExtension.

I suggest you to open new ticket with all details you have.

Original comment by alkondratenko on 29 Sep 2013 at 2:39

thegooglecodearchive / gperftools

pprof results are low compared with top #181