stellar2012wxg / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

tcmalloc 2.1 hangs in ListAllProcessThreads #560

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
After upgrading our system to gperftools 2.1 today, we experienced a hang on our
integration test machine (stats below). The test that hung is very simple: a
single-threaded, short-running unit test that doesn't allocate much memory.

ps shows two pids:

jenkins   4627  0.0  0.0  36500  8200 ?        T    18:02   0:00 /.../string_case-test
jenkins   4628  0.0  0.0      0     0 ?        Z    18:02   0:00 [string_case-tes] <defunct>
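
For readers less familiar with the STAT column in the output above: in `ps`, `T` marks a stopped process (by a job control signal or because it is being traced) and `Z` marks a zombie, i.e. a process that has exited but has not yet been reaped by its parent. The following is a minimal sketch, not part of the original report, that decodes the two lines; the helper name `describe_ps_line` and the `STATE_MEANINGS` table are hypothetical and assume the `ps aux` column layout.

```python
# Hypothetical helper (not from the report): interpret the STAT column
# of a `ps aux`-style line.
STATE_MEANINGS = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped (job control signal or tracing)",
    "Z": "zombie (exited, not yet reaped by its parent)",
}


def describe_ps_line(line):
    """Return (pid, state letter, meaning) for one line of `ps aux` output."""
    # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = line.split(None, 10)
    pid = int(fields[1])
    state = fields[7][0]  # first character is the main state; any others are modifiers
    return pid, state, STATE_MEANINGS.get(state, "unknown")


lines = [
    "jenkins   4627  0.0  0.0  36500  8200 ?        T    18:02   0:00 /.../string_case-test",
    "jenkins   4628  0.0  0.0      0     0 ?        Z    18:02   0:00 [string_case-tes] <defunct>",
]
for line in lines:
    print(describe_ps_line(line))
```

Read this way, pid 4627 is stopped and pid 4628 has already exited and is waiting to be reaped.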

pid 4627's gdb backtrace is as follows:
#0  0x0000000000488743 in wait4 (parameter=<value optimized out>, callback=<value optimized out>) at src/base/linux_syscall_support.h:2028
#1  waitpid (parameter=<value optimized out>, callback=<value optimized out>) at src/base/linux_syscall_support.h:2030
#2  ListAllProcessThreads (parameter=<value optimized out>, callback=<value optimized out>) at src/base/linuxthreads.cc:644
#3  0x000000000047f588 in HeapLeakChecker::IgnoreAllLiveObjectsLocked (self_stack_top=0x7fff23d09584) at src/heap-checker.cc:1311
#4  0x000000000047fa05 in HeapLeakChecker::DoNoLeaks (this=0x13e0700, should_symbolize=HeapLeakChecker::SYMBOLIZE) at src/heap-checker.cc:1762
#5  0x000000000048013c in HeapLeakChecker::DoMainHeapCheck () at src/heap-checker.cc:2162
#6  0x00000000004802bd in HeapLeakChecker_AfterDestructors () at src/heap-checker.cc:2309
#7  0x0000003adf035d92 in exit () from /lib64/libc.so.6
#8  0x0000003adf01ece4 in __libc_start_main () from /lib64/libc.so.6
#9  0x00000000004416b9 in _start ()

pid 4628 is not attachable via gdb. It seems to be stuck in a syscall:
[todd@a1221 centos6-kudu]$ sudo cat /proc/4628/stack
[<ffffffff8106ec27>] do_exit+0x5a7/0x860
[<ffffffff8106efe7>] sys_exit+0x17/0x20
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

gperftools 2.0 has been running fine on this same test suite for months, so it 
seems to be a regression.

[jenkins@a1221 centos6]$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.2 (Final)
Release:        6.2
Codename:       Final
[jenkins@a1221 centos6]$ uname -r
2.6.32-220.el6.x86_64

Original issue reported on code.google.com by tlip...@gmail.com on 3 Aug 2013 at 1:47

GoogleCodeExporter commented 9 years ago
Thanks for reporting it.

Original comment by alkondratenko on 3 Aug 2013 at 9:44

GoogleCodeExporter commented 9 years ago
If you can test from git, please do. You can grab the fix from
https://github.com/alk/gperftools/tree/wip-issue-560

Alternatively, you can apply the attached patch (with patch -p1 <THE_PATCH)

Original comment by alkondratenko on 4 Aug 2013 at 5:58


GoogleCodeExporter commented 9 years ago
Uploaded the patch to Rietveld too: https://codereview.appspot.com/12445043/

Original comment by alkondratenko on 4 Aug 2013 at 6:12

GoogleCodeExporter commented 9 years ago
Thanks. Will give that a try in our build environment and report back.

Original comment by tlip...@gmail.com on 4 Aug 2013 at 6:55

GoogleCodeExporter commented 9 years ago
I've merged a fix. Thanks for pointing this out.

Original comment by alkondratenko on 17 Aug 2013 at 4:20