koolhazz / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

NetBSD: self tests don't finish #609

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I've tried running the self tests, but they stop:
gmake[2]: Entering directory '/archive/foreign/gperftools'
PASS: low_level_alloc_unittest
PASS: atomicops_unittest
PASS: stacktrace_unittest
PASS: tcmalloc_minimal_unittest
PASS: tcmalloc_minimal_large_unittest
PASS: tcmalloc_minimal_large_heap_fragmentation_unittest
PASS: addressmap_unittest
PASS: system_alloc_unittest
PASS: packed_cache_test
PASS: frag_unittest
PASS: markidle_unittest
PASS: current_allocated_bytes_test
PASS: malloc_hook_test
PASS: malloc_extension_test
PASS: memalign_unittest
PASS: page_heap_test
PASS: pagemap_unittest
PASS: realloc_unittest
PASS: stack_trace_table_test
PASS: thread_dealloc_unittest
PASS: tcmalloc_minimal_debug_unittest
PASS: malloc_extension_debug_test
PASS: memalign_debug_unittest
PASS: realloc_debug_unittest
rm -f debugallocation_test.sh
cp -p ./src/tests/debugallocation_test.sh debugallocation_test.sh

After this point, nothing happens for hours (>4).

Original issue reported on code.google.com by tk@giga.or.at on 25 Feb 2014 at 8:51

GoogleCodeExporter commented 9 years ago
I think I'm seeing something similar on ARM with current master. It might be an
issue with stack trace grabbing.

May I have backtraces from the process that's stuck?

Also, which hardware architecture are you running on?

Original comment by alkondratenko on 25 Feb 2014 at 9:26

GoogleCodeExporter commented 9 years ago
Sorry for not giving details: it's on NetBSD-6.99.32/x86_64.
It seems to get stuck already in the first death test.
gdb debugallocation_test <pid>
shows:

(gdb) thread apply all bt

Thread 1 (LWP 1):
#0  0x00007f7ff603b48a in _sys___nanosleep50 () from /usr/lib/libc.so.12
#1  0x00007f7ff70070dc in __nanosleep50 (rqtp=0x7f7fffffbc90, rmtp=0x0) at /archive/foreign/src/lib/libpthread/pthread_cancelstub.c:398
#2  0x00007f7ff782af27 in base::internal::SpinLockDelay (w=<optimized out>, value=<optimized out>, loop=<optimized out>) at ./src/base/spinlock_posix-inl.h:54
#3  0x00007f7ff782ae46 in SpinLock::SlowLock (this=0x7f7ff7a4f560 <tcmalloc::Static::pageheap_lock_>) at src/base/spinlock.cc:133
#4  0x00007f7ff7823b4d in Lock (this=<optimized out>) at src/base/spinlock.h:71
#5  SpinLockHolder (l=<optimized out>, this=<optimized out>) at src/base/spinlock.h:136
#6  tcmalloc::ThreadCache::InitModule () at src/thread_cache.cc:314
#7  0x00007f7ff782ec97 in tcmalloc::ThreadCache::GetCache () at src/thread_cache.h:420
#8  0x00007f7ff78369b6 in do_malloc_no_errno (size=20752) at src/tcmalloc.cc:1102
#9  do_malloc (size=20752) at src/tcmalloc.cc:1109
#10 Allocate (type=-271733872, size=20704) at src/debugallocation.cc:529
#11 DebugAllocate (type=-271733872, size=20704) at src/debugallocation.cc:1015
#12 do_debug_malloc_or_debug_cpp_alloc (size=20704) at src/debugallocation.cc:1215
#13 tc_malloc (size=20704) at src/debugallocation.cc:1221
#14 0x00007f7ff6407a7f in start_fde_sort (count=<optimized out>, accu=<optimized out>) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:399
#15 init_object (ob=0x7f7ff6ef6788) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:743
#16 search_object (ob=0x7f7ff6ef6788, pc=0x7f7ff640472e <_Unwind_Backtrace+48>) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:933
#17 0x00007f7ff640885d in _Unwind_Find_registered_FDE (bases=0x7f7fffffc348, pc=0x7f7ff640472e <_Unwind_Backtrace+48>) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde.c:997
#18 _Unwind_Find_FDE (pc=0x7f7ff640472e <_Unwind_Backtrace+48>, bases=0x7f7fffffc348) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2-fde-glibc.c:421
#19 0x00007f7ff6403afa in uw_frame_state_for (context=0x7f7fffffc2a0, fs=0x7f7fffffbf60) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2.c:1130
#20 0x00007f7ff6403fb2 in uw_init_context_1 (context=0x7f7fffffc2a0, outer_cfa=0x7f7fffffc3d0, outer_ra=0x7f7ff7400fb8 <backtrace+40>) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind-dw2.c:1449
#21 0x00007f7ff640472f in _Unwind_Backtrace (trace=0x7f7ff7400ff0 <tracer>, trace_argument=0x7f7fffffc3d8) at /archive/foreign/src/external/gpl3/gcc/dist/gcc/unwind.inc:283
#22 0x00007f7ff7400fb8 in backtrace (arr=<optimized out>, len=<optimized out>) at /archive/foreign/src/lib/libexecinfo/unwind.c:67
#23 0x00007f7ff782b404 in GetStackTrace_generic (result=0x622c10, max_depth=30, skip_count=3) at src/stacktrace_generic-inl.h:68
#24 0x00007f7ff782b6c6 in GetStackTrace (result=<optimized out>, max_depth=<optimized out>, skip_count=<optimized out>) at src/stacktrace.cc:228
#25 0x00007f7ff78216b6 in RecordGrowth (growth=1048576) at src/page_heap.cc:498
#26 tcmalloc::PageHeap::GrowHeap (this=0x662c00, n=<optimized out>) at src/page_heap.cc:524
#27 0x00007f7ff78219ca in tcmalloc::PageHeap::New (this=0x662c00, n=2) at src/page_heap.cc:155
#28 0x00007f7ff7820585 in tcmalloc::CentralFreeList::Populate (this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>) at src/central_freelist.cc:329
#29 0x00007f7ff782078f in tcmalloc::CentralFreeList::FetchFromOneSpansSafe (this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>, N=1, start=0x7f7fffffc858, end=0x7f7fffffc850) at src/central_freelist.cc:284
#30 0x00007f7ff782081d in tcmalloc::CentralFreeList::RemoveRange (this=0x7f7ff7a52160 <tcmalloc::Static::central_cache_+7296>, start=0x7f7fffffc858, end=0x7f7fffffc850, N=1) at src/central_freelist.cc:264
#31 0x00007f7ff78230eb in tcmalloc::ThreadCache::FetchFromCentralCache (this=0x6edc98, cl=6, byte_size=80) at src/thread_cache.cc:165
#32 0x00007f7ff78369c1 in do_malloc_no_errno (size=80) at src/tcmalloc.cc:1102
#33 do_malloc (size=80) at src/tcmalloc.cc:1109
#34 Allocate (type=-271733872, size=32) at src/debugallocation.cc:529
#35 DebugAllocate (type=-271733872, size=32) at src/debugallocation.cc:1015
#36 do_debug_malloc_or_debug_cpp_alloc (size=32) at src/debugallocation.cc:1215
#37 tc_malloc (size=32) at src/debugallocation.cc:1221
#38 0x00007f7ff6107052 in atexit_handler_alloc (dso=0x7f7ff6ef6780 <__dso_handle>) at /archive/foreign/src/lib/libc/stdlib/atexit.c:112
#39 __cxa_atexit (func=0x7f7ff6cc503a <__eh_globals_init::~__eh_globals_init()>, arg=0x7f7ff6f0bef0, dso=0x7f7ff6ef6780 <__dso_handle>) at /archive/foreign/src/lib/libc/stdlib/atexit.c:145
#40 0x00007f7ff6c65904 in ?? () from /usr/lib/libstdc++.so.7
#41 0x00007f7ff7ff97e0 in ?? ()
#42 0x00007f7ff6c62c99 in _init () from /usr/lib/libstdc++.so.7
#43 0x0000000000000000 in ?? ()

If I continue and press CTRL-C a short while later, the backtrace doesn't look
visibly different.

Original comment by tk@giga.or.at on 25 Feb 2014 at 9:32

GoogleCodeExporter commented 9 years ago
Thanks. That's close to what I expected. It's in a deadlock state: it's trying
to grab a lock that's already taken.
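
To make the deadlock concrete: in the backtrace above, frame #3 is spinning on pageheap_lock_, frames #13-#22 show backtrace() calling tc_malloc, and frames #23-#28 show that backtrace() was itself reached from the heap-growth path in the same thread. Assuming that outer path already holds the same non-recursive lock (the trace does not show where it was taken), the nested request can never succeed. The following is a minimal sketch of that pattern, not gperftools code; ToySpinLock, heap_lock, OuterAllocation and NestedAllocation are hypothetical names, and TryLock is used instead of a blocking lock so the example terminates.

```cpp
#include <atomic>

// Hypothetical stand-in for a non-recursive spinlock (not gperftools' SpinLock).
class ToySpinLock {
 public:
  // Returns true if the lock was free and is now held by the caller.
  bool TryLock() { return !held_.exchange(true, std::memory_order_acquire); }
  void Unlock() { held_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> held_{false};
};

ToySpinLock heap_lock;  // plays the role of pageheap_lock_ in the backtrace

// Plays the role of the nested tc_malloc -> ThreadCache::InitModule path.
// Returns false when the lock is unavailable; a blocking Lock() would
// spin forever at this point, which is the hang seen in the self test.
bool NestedAllocation() {
  if (!heap_lock.TryLock()) return false;
  heap_lock.Unlock();
  return true;
}

// Plays the role of the outer allocation that captures a stack trace while
// (by assumption) still holding the heap lock.
bool OuterAllocation() {
  if (!heap_lock.TryLock()) return false;
  bool nested_ok = NestedAllocation();  // backtrace() re-enters the allocator
  heap_lock.Unlock();
  return nested_ok;
}
```

Called on its own, NestedAllocation() succeeds; called from inside OuterAllocation() it fails, because the same thread already owns the lock and the lock has no notion of recursion.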

Which is slightly weird. AFAIK, for a long time now libunwind has not used
malloc directly; it does mmap and its own memory management instead. At least
that's what makes things work on GNU/Linux.

What is your libunwind version, BTW?

Original comment by alkondratenko on 25 Feb 2014 at 9:35

GoogleCodeExporter commented 9 years ago
Correction: on GNU/Linux and x86. GNU/Linux on ARM by default does fopen from
within libunwind, which internally calls malloc and causes a very similar
deadlock.

And I believe that's a fixable situation.

Original comment by alkondratenko on 25 Feb 2014 at 9:37

GoogleCodeExporter commented 9 years ago
I'm not using libunwind, I'm using backtrace() from NetBSD's libexecinfo:

# ldd .libs/debugallocation_test
.libs/debugallocation_test:
        -ltcmalloc_debug.5 => not found
        -lexecinfo.0 => /usr/lib/libexecinfo.so.0
        -lelf.0 => /usr/lib/libelf.so.0
        -lc.12 => /usr/lib/libc.so.12
        -lpthread.1 => /usr/lib/libpthread.so.1
        -lstdc++.7 => /usr/lib/libstdc++.so.7
        -lgcc_s.1 => /usr/lib/libgcc_s.so.1
        -lm.0 => /usr/lib/libm.so.0

Does it need its own workaround, or are changes needed in it?

Original comment by tk@giga.or.at on 25 Feb 2014 at 9:40

GoogleCodeExporter commented 9 years ago
OK. Then it's the same as GNU libc's backtrace. The code mentions that it'll
call malloc, causing trouble.

I'm thinking about detecting the malloc-called-from-malloc situation using a
thread-local variable. When it's detected, I plan to use a separate, simplistic
"emergency" allocator. I believe it'll work, and it should fix backtrace() on
GNU/Linux as well as, hopefully, on the BSDs.
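
A rough sketch of how such a scheme could look, under stated assumptions: this is not the gperftools implementation, and every name here (sketch::Allocate, EmergencyAllocate, in_allocator, arena, CaptureStackTrace) is hypothetical. std::malloc stands in for the normal allocation path, and the emergency allocator is a bump pointer over a fixed buffer that never reclaims memory and is not thread-safe.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace sketch {

// Hypothetical fixed arena serving nested requests; memory is never reclaimed.
alignas(std::max_align_t) static unsigned char arena[64 * 1024];
static std::size_t arena_used = 0;

// True while the current thread is already inside Allocate().
static thread_local bool in_allocator = false;

// Pointer returned by the most recent nested request (kept for illustration).
static void* nested_result = nullptr;

// Bump-pointer allocation from the arena. Not thread-safe: a real version
// would need its own synchronization that never calls back into malloc.
void* EmergencyAllocate(std::size_t size) {
  const std::size_t align = alignof(std::max_align_t);
  if (size > sizeof(arena)) return nullptr;
  const std::size_t rounded = (size + align - 1) / align * align;
  if (rounded > sizeof(arena) - arena_used) return nullptr;
  void* p = arena + arena_used;
  arena_used += rounded;
  return p;
}

// Reports whether p points into the emergency arena.
bool IsEmergencyPointer(const void* p) {
  const auto addr = reinterpret_cast<std::uintptr_t>(p);
  const auto base = reinterpret_cast<std::uintptr_t>(arena);
  return addr >= base && addr < base + sizeof(arena);
}

void* Allocate(std::size_t size);

// Stand-in for stack-trace capture, which may itself allocate.
void CaptureStackTrace() { nested_result = Allocate(32); }

void* Allocate(std::size_t size) {
  // Nested call on this thread: bypass the normal path and its locks.
  if (in_allocator) return EmergencyAllocate(size);
  in_allocator = true;
  CaptureStackTrace();          // may re-enter Allocate()
  void* p = std::malloc(size);  // stands in for the normal allocation path
  in_allocator = false;
  return p;
}

void Free(void* p) {
  if (IsEmergencyPointer(p)) return;  // arena memory is never reclaimed here
  std::free(p);
}

}  // namespace sketch
```

The free path has to recognize emergency pointers as well, since memory handed out during a nested call may later be released through the regular entry point.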

Right now those are just my plans. I cannot promise you any dates. Feel free to
take on this work yourself.

Alternatively, if all you need is just malloc, then simply build with 
--enable-minimal and it will not have any code for stacktrace capturing.

Original comment by alkondratenko on 25 Feb 2014 at 9:43

GoogleCodeExporter commented 9 years ago
This sounds like it should work.
I was mostly interested in using it for profiling, so I guess I should be fine 
even with the current version.

Thanks for working on gperftools!

Original comment by tk@giga.or.at on 25 Feb 2014 at 9:50