heapcheck locks up in pprof

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

Invoking like this, will provide addresses with no symbols, as expected, since 
gperftools doesn't look for pprof in the path for some reason:
env LD_PRELOAD=libtcmalloc.so HEAPCHECK=normal ./my_app

Invoking like this, will hang on program termination, just after printing "The 
N largest leaks:":
env LD_PRELOAD=libtcmalloc.so PPROF_PATH=$(which pprof) HEAPCHECK=normal 
./my_app

What version of the product are you using? On what operating system?
2.3rc, Ubuntu 10.04

Original issue reported on code.google.com by bryce.schober on 10 Nov 2014 at 7:33

GoogleCodeExporter commented 9 years ago

Thanks for reporting it. Couple questions:

* do you see that hang with earlier versions of gperftools. I'm asking to know 
if it's regression introduced recently (which would be easier to track down) or 
it's something that happens with older versions.

* Can you post simple program that reproduces the hang?

* if you cannot, please attach gdb to hang process and give us backtraces of 
all threads. So that we can see exactly where it hangs.

Original comment by alkondratenko on 10 Nov 2014 at 7:50

GoogleCodeExporter commented 9 years ago

For some reason, I have two pids, and gdb doesn't list the other pid as a 
thread... which may be indicative of us doing something strange. For the first 
PID:

(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0xb4e16730 (LWP 4413) 0xb7754430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7754430 in __kernel_vsyscall ()
#1  0xb7097f5b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7716d33 in read (__nbytes=<optimized out>, __buf=<optimized out>, 
__fd=<optimized out>) at /usr/include/bits/unistd.h:45
#3  SymbolTable::Symbolize (this=0xbfe8a444) at src/symbolize.cc:249
#4  0xb771dcca in HeapProfileTable::Snapshot::ReportLeaks (this=0xae528360, 
checker_name=0x121265b8 "_main_", 
    filename=0xae52c7f0 "/tmp/ampm_pc_debug.4413._main_-end.heap", should_symbolize=true) at src/heap-profile-table.cc:587
#5  0xb770cd8b in HeapLeakChecker::DoNoLeaks (this=0x121235c0, 
should_symbolize=HeapLeakChecker::SYMBOLIZE) at src/heap-checker.cc:1857
#6  0xb770d27c in HeapLeakChecker::NoGlobalLeaksMaybeSymbolize 
(should_symbolize=HeapLeakChecker::SYMBOLIZE) at src/heap-checker.cc:2147
#7  0xb770d367 in HeapLeakChecker::DoMainHeapCheck () at 
src/heap-checker.cc:2169
#8  0xb770d535 in HeapLeakChecker_AfterDestructors () at 
src/heap-checker.cc:2317
#9  0xb771017c in HeapLeakCheckerGlobalPrePost::~HeapLeakCheckerGlobalPrePost 
(this=0xb7736ab0, __in_chrg=<optimized out>) at src/heap-checker-bcad.cc:82
#10 0xb6b87318 in __cxa_finalize () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7702024 in __do_global_dtors_aux () from /usr/local/lib/libtcmalloc.so
#12 0xb772b760 in _fini () from /usr/local/lib/libtcmalloc.so
#13 0xb7763266 in ?? () from /lib/ld-linux.so.2
#14 0xb6b86f4f in ?? () from /lib/tls/i686/cmov/libc.so.6
#15 0xb6b86fbf in exit () from /lib/tls/i686/cmov/libc.so.6
#16 0x0805b3ef in jump_ship(<anonymous enum>, bool) 
(reset_or_powerdown=PACKET_OUT_SYSTEM_POWERDOWN, is_host_kb_exit=true) at 
main.cpp:96
#17 0x0805b65d in main (argc=1, argv=0xbfe8b0c4, envp=0x12176000) at 
main.cpp:222

For the second PID:

(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0xb4e16730 (LWP 4424) 0xb77254df in sys_futex (t=<optimized out>, 
v=<optimized out>, o=<optimized out>, a=<optimized out>)
    at ./src/base/linux_syscall_support.h:2097
(gdb) bt
#0  0xb77254df in sys_futex (t=<optimized out>, v=<optimized out>, o=<optimized 
out>, a=<optimized out>) at ./src/base/linux_syscall_support.h:2097
#1  base::internal::SpinLockDelay (w=0xb773e2a0, value=2, loop=2088) at 
./src/base/spinlock_linux-inl.h:88
#2  0xb77251e0 in SpinLock::SlowLock (this=0xb773e2a0) at 
src/base/spinlock.cc:133
#3  0xb7713819 in Lock (this=<optimized out>) at src/base/spinlock.h:71
#4  SpinLockHolder (this=<optimized out>, l=<optimized out>) at 
src/base/spinlock.h:136
#5  tcmalloc::CentralFreeList::InsertRange (this=0xb773e2a0, start=0x1216b950, 
end=0x1216ad20, N=106) at src/central_freelist.cc:232
#6  0xb7717730 in tcmalloc::ThreadCache::ReleaseToCentralCache 
(this=0x113a38e4, src=0x113a3930, cl=4, N=2) at src/thread_cache.cc:236
#7  0xb77177a0 in tcmalloc::ThreadCache::ListTooLong (this=0x113a38e4, 
list=0x113a3930, cl=4) at src/thread_cache.cc:197
#8  0xb7729c2f in Deallocate (cl=<optimized out>, ptr=<optimized out>, 
this=<optimized out>) at src/thread_cache.h:390
#9  do_free_helper (heap_must_be_valid=<optimized out>, heap=<optimized out>, 
invalid_free_fn=<optimized out>, ptr=<optimized out>) at src/tcmalloc.cc:1195
#10 do_free_with_callback (invalid_free_fn=<optimized out>, ptr=<optimized 
out>) at src/tcmalloc.cc:1225
#11 do_free (ptr=<optimized out>) at src/tcmalloc.cc:1234
#12 tc_free (ptr=0x1216b950) at src/tcmalloc.cc:1585
#13 0xb7463e39 in ?? () from /usr/lib/nvidia-current/libGL.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Original comment by bryce.schober on 11 Nov 2014 at 1:18

GoogleCodeExporter commented 9 years ago

I tried gpertools 2.2.1 and 2.1 and had the same behavior. I tried strict and 
draconian modes, and that made no difference.

Then out of curiousity, I tried some apps other than mine. Since I notice the 
opengl library in the backtrace, I tried glxinfo, and it had the same behavior:

$ env LD_PRELOAD=/usr/local/lib/libtcmalloc.so PPROF_PATH=/usr/local/bin/pprof 
HEAPCHECK=normal glxinfo
WARNING: Perftools heap leak checker is active -- Performance may suffer
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig, 
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control, 
    GLX_EXT_swap_control, GLX_EXT_swap_control_tear, 
    GLX_EXT_texture_from_pixmap, GLX_ARB_create_context, 
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es_profile, 
    GLX_EXT_create_context_es2_profile, GLX_ARB_create_context_robustness, 
    GLX_ARB_multisample, GLX_NV_float_buffer, GLX_ARB_fbconfig_float, 
    GLX_EXT_framebuffer_sRGB, GLX_NV_multisample_coverage
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_visual_info, 
    GLX_EXT_visual_rating, GLX_EXT_import_context, GLX_SGI_video_sync, 
    GLX_NV_swap_group, GLX_NV_video_out, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGI_swap_control, GLX_EXT_swap_control, GLX_EXT_swap_control_tear, 
    GLX_ARB_create_context, GLX_ARB_create_context_profile, 
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, 
    GLX_EXT_fbconfig_packed_float, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_framebuffer_sRGB, GLX_NV_present_video, GLX_NV_copy_image, 
    GLX_NV_multisample_coverage, GLX_NV_video_capture, 
    GLX_EXT_create_context_es2_profile, GLX_ARB_create_context_robustness
GLX version: 1.4
GLX extensions:
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_SGIX_fbconfig, 
    GLX_SGIX_pbuffer, GLX_SGI_video_sync, GLX_SGI_swap_control, 
    GLX_EXT_swap_control, GLX_EXT_swap_control_tear, 
    GLX_EXT_texture_from_pixmap, GLX_ARB_create_context, 
    GLX_ARB_create_context_profile, GLX_EXT_create_context_es2_profile, 
    GLX_ARB_create_context_robustness, GLX_ARB_multisample, 
    GLX_NV_float_buffer, GLX_ARB_fbconfig_float, GLX_EXT_framebuffer_sRGB, 
    GLX_NV_multisample_coverage, GLX_ARB_get_proc_address
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GT 430/PCIe/SSE2
OpenGL version string: 4.2.0 NVIDIA 304.84
OpenGL shading language version string: 4.20 NVIDIA via Cg compiler
<snipped all the rest of the glxinfo output>

Have memory regions w/o callers: might report false leaks
Leak check _main_ detected leaks of 120 bytes in 1 objects
The 1 largest leaks:
<spinning forever>

I see a similar situation with multiple PIDs for glxinfo:
$ ps --forest -g 3240
  PID TTY          TIME CMD
 3240 pts/0    00:00:12 bash
 4493 pts/0    00:00:00  \_ glxinfo
 4495 pts/0    00:00:00      \_ glxinfo

The backtrace for PID 4493:
(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0xb54cf700 (LWP 4493) 0xb7740430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7740430 in __kernel_vsyscall ()
#1  0xb748ef33 in __read_nocancel () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7702d33 in read (__nbytes=<optimized out>, __buf=<optimized out>, 
__fd=<optimized out>) at /usr/include/bits/unistd.h:45
#3  SymbolTable::Symbolize (this=0xbfc39e44) at src/symbolize.cc:249
#4  0xb7709cca in HeapProfileTable::Snapshot::ReportLeaks (this=0xb5352180, 
checker_name=0x94d0030 "_main_", filename=0xb5356610 
"/tmp/glxinfo.4493._main_-end.heap", 
    should_symbolize=true) at src/heap-profile-table.cc:587
#5  0xb76f8d8b in HeapLeakChecker::DoNoLeaks (this=0x94cc2c0, 
should_symbolize=HeapLeakChecker::SYMBOLIZE) at src/heap-checker.cc:1857
#6  0xb76f927c in HeapLeakChecker::NoGlobalLeaksMaybeSymbolize 
(should_symbolize=HeapLeakChecker::SYMBOLIZE) at src/heap-checker.cc:2147
#7  0xb76f9367 in HeapLeakChecker::DoMainHeapCheck () at 
src/heap-checker.cc:2169
#8  0xb76f9535 in HeapLeakChecker_AfterDestructors () at 
src/heap-checker.cc:2317
#9  0xb76fc17c in HeapLeakCheckerGlobalPrePost::~HeapLeakCheckerGlobalPrePost 
(this=0xb7722ab0, __in_chrg=<optimized out>) at src/heap-checker-bcad.cc:82
#10 0xb734f318 in __cxa_finalize () from /lib/tls/i686/cmov/libc.so.6
#11 0xb76ee024 in __do_global_dtors_aux () from /usr/local/lib/libtcmalloc.so
#12 0xb7717760 in _fini () from /usr/local/lib/libtcmalloc.so
#13 0xb774f266 in ?? () from /lib/ld-linux.so.2
#14 0xb734ef4f in ?? () from /lib/tls/i686/cmov/libc.so.6
#15 0xb734efbf in exit () from /lib/tls/i686/cmov/libc.so.6
#16 0xb7336bfe in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#17 0x08048e11 in ?? ()
Backtrace stopped: Not enough registers or memory available to unwind further

The backtrace for PID 4495:
(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0xb54cf700 (LWP 4495) 0xb77114df in sys_futex (t=<optimized out>, 
v=<optimized out>, o=<optimized out>, a=<optimized out>)
    at ./src/base/linux_syscall_support.h:2097
(gdb) bt
#0  0xb77114df in sys_futex (t=<optimized out>, v=<optimized out>, o=<optimized 
out>, a=<optimized out>) at ./src/base/linux_syscall_support.h:2097
#1  base::internal::SpinLockDelay (w=0xb772a520, value=2, loop=4624) at 
./src/base/spinlock_linux-inl.h:88
#2  0xb77111e0 in SpinLock::SlowLock (this=0xb772a520) at 
src/base/spinlock.cc:133
#3  0xb76ff819 in Lock (this=<optimized out>) at src/base/spinlock.h:71
#4  SpinLockHolder (this=<optimized out>, l=<optimized out>) at 
src/base/spinlock.h:136
#5  tcmalloc::CentralFreeList::InsertRange (this=0xb772a520, start=0x94d8800, 
end=0x94d8f80, N=13) at src/central_freelist.cc:232
#6  0xb7703730 in tcmalloc::ThreadCache::ReleaseToCentralCache (this=0x8d4d8e4, 
src=0x8d4d93c, cl=5, N=2) at src/thread_cache.cc:236
#7  0xb77037a0 in tcmalloc::ThreadCache::ListTooLong (this=0x8d4d8e4, 
list=0x8d4d93c, cl=5) at src/thread_cache.cc:197
#8  0xb7715c2f in Deallocate (cl=<optimized out>, ptr=<optimized out>, 
this=<optimized out>) at src/thread_cache.h:390
#9  do_free_helper (heap_must_be_valid=<optimized out>, heap=<optimized out>, 
invalid_free_fn=<optimized out>, ptr=<optimized out>) at src/tcmalloc.cc:1195
#10 do_free_with_callback (invalid_free_fn=<optimized out>, ptr=<optimized 
out>) at src/tcmalloc.cc:1225
#11 do_free (ptr=<optimized out>) at src/tcmalloc.cc:1234
#12 tc_free (ptr=0x94d8800) at src/tcmalloc.cc:1585
#13 0xb7664e39 in ?? () from /usr/lib/nvidia-current/libGL.so.1
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I also tried a non-opengl app and instead of locking up on termination, it 
ended with this:
WARNING: Perftools heap leak checker is active -- Performance may suffer
<app output snipped>
Have memory regions w/o callers: might report false leaks
Leak check _main_ detected leaks of 104 bytes in 11 objects
The 3 largest leaks:
Using remote profile at netstat.
Use of uninitialized value $line in substitution (s///) at /usr/local/bin/pprof 
line 3214.
http://netstat/pprof/symbol doesn't exist
Leak of 60 bytes in 5 objects allocated from:
    @ 80505d6 
    @ 8049fd1 
    @ 804a72d 
    @ 804decc 
    @ b75c1bf6 
    @ 8049811 
<etc...>

What caught my eye was the remote profile message and wierd http URL, what's 
going on there?

Original comment by bryce.schober on 11 Nov 2014 at 1:38

GoogleCodeExporter commented 9 years ago

I was unable to reproduce your problem.

But I there's probably something with pprof. Symbolize function which is seen 
in one of stack traces is spawning pprof to convert addresses to function 
names. And given that it's stuck in read it appears that pprof might be still 
alive.

In order to diagnose I propose you to report output of ps aux --forest here.

Also maybe it might make sense try linking your app to libtcmalloc.so instead 
of LD_PRELOAD-ing it.

Original comment by alkondratenko on 29 Nov 2014 at 7:16

stellar2012wxg / gperftools

heapcheck locks up in pprof #656