Closed GoogleCodeExporter closed 9 years ago
Interesting. I actually have a 10.6.3 system to test on, and these tests pass just
fine there.
Is there any way for you to run this in the debugger, and maybe put a breakpoint at
the point in the code that sets the malloc hooks (in malloc_hook.h)? Maybe that will
give some insight as to what's going on for you. You'll probably want to reconfigure
with
./configure CXXFLAGS=-g
Original comment by csilv...@gmail.com
on 20 May 2010 at 11:33
For some reason, it passes all 38 tests with CXXFLAGS=-g. However,
debugallocation_test throws up a crash dialog, despite passing all tests. The crash
seems to only be triggered when running it through ./debugallocation_test.sh (or via
'make check'). Running ./debugallocation_test manually does not crash. And the test
passes, so that's weird. Stack trace attached.
I'm really confused why those three tests fail without -g, though...
Original comment by neunon
on 21 May 2010 at 12:33
Also tried a rebuild after doing 'ccache -zC'. I get the same weird result.
CXXFLAGS=-g fixes everything but
debugallocation_test, and things still fail without CXXFLAGS=-g.
Original comment by neunon
on 21 May 2010 at 12:44
} However, debugallocation_test throws up a crash dialog, despite passing all
} tests.
That's expected. That test does several things which are supposed to crash. The
driver .sh script tests that they crash as expected.
See what happens if you run with CXXFLAGS="-g -O1". It seems to be some trouble due
to optimization, but I'm not sure what it might be.
Original comment by csilv...@gmail.com
on 21 May 2010 at 1:10
The same tests fail with "-g -O1".
Original comment by neunon
on 21 May 2010 at 5:38
Good to know. Is it possible for you to run one of the failing tests under gdb, and
try to figure out where the code is setting the malloc hooks? Hopefully it will be
possible to get ok debugging results with -O1 (-O2 can be very difficult to debug).
Original comment by csilv...@gmail.com
on 21 May 2010 at 6:02
Not being familiar with the perftools internals, where would be the most
revealing locations to set breakpoints? I
assume MallocHook_SetNewHook and MallocHook_SetDeleteHook are the only places
that ever set hooks for
new/delete, right?
Original comment by neunon
on 21 May 2010 at 11:48
Exactly right.
Original comment by csilv...@gmail.com
on 22 May 2010 at 12:15
I doubt this helps at all:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools-read-only/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Starting tracking the heap
Breakpoint 1, MallocHook_SetNewHook (hook=0x1000d38ab <NewHook(void const*,
unsigned long)>) at
src/malloc_hook.cc:156
156 return new_hook_.Exchange(hook);
(gdb) bt
#0 MallocHook_SetNewHook (hook=0x1000d38ab <NewHook(void const*, unsigned
long)>) at
src/malloc_hook.cc:156
#1 0x00000001000c9f62 in HeapProfilerStart (prefix=0x100584018
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at
src/heap-profiler.cc:511
#2 0x00000001000016ba in std::string::_M_rep () at
/usr/include/c++/4.2.1/bits/basic_string.h:88
#3 0x00000001000016ba in ~basic_string [inlined] () at
/usr/include/c++/4.2.1/bits/basic_string.h:493
#4 ~basic_string [inlined] () at /usr/include/c++/4.2.1/bits/basic_string.h:88
#5 TestHeapProfilerStartStopIsRunning [inlined] () at
/Users/tycho/Development/google-perftools-read-only/src/tests/heap-profiler_unittest.cc:493
#6 0x00000001000016ba in main (argc=<value temporarily unavailable, due to
optimizations>,
argv=0x7fff5fbfde20) at src/tests/heap-profiler_unittest.cc:130
(gdb) cont
Continuing.
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use
--heap_check="" to avoid
this conflict.
Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb) bt
#0 0x00007fff80b9b886 in __kill ()
#1 0x00007fff80c3beae in abort ()
#2 0x00000001000c9a04 in LogPrintf [inlined] () at
/Users/tycho/Development/google-perftools-read-only/src/base/logging.h:198
#3 0x00000001000c9a04 in RAW_LOG (lvl=-4, pat=<value temporarily unavailable,
due to optimizations>)
at logging.h:217
#4 0x00000001000c9f7d in MallocHook::SetDeleteHook () at
/Users/tycho/Development/google-perftools-read-only/src/google/malloc_hook.h:514
#5 0x00000001000c9f7d in HeapProfilerStart (prefix=0x100584018
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at
src/heap-profiler.cc:516
#6 0x00000001000016ba in std::string::_M_rep () at
/usr/include/c++/4.2.1/bits/basic_string.h:88
#7 0x00000001000016ba in ~basic_string [inlined] () at
/usr/include/c++/4.2.1/bits/basic_string.h:493
#8 ~basic_string [inlined] () at /usr/include/c++/4.2.1/bits/basic_string.h:88
#9 TestHeapProfilerStartStopIsRunning [inlined] () at
/Users/tycho/Development/google-perftools-read-only/src/tests/heap-profiler_unittest.cc:493
#10 0x00000001000016ba in main (argc=<value temporarily unavailable, due to
optimizations>,
argv=0x7fff5fbfde20) at src/tests/heap-profiler_unittest.cc:130
(gdb)
So I modified the source to find out which hook was failing (instead of if
(setnewhook || setdeletehook), I split
it into two if statements and printed clearly different messages). The
MallocHook_SetNewHook is definitely the
one that triggered the fatal error.
I notice this:
AtomicPtr<MallocHook::NewHook> new_hook_ = {
reinterpret_cast<AtomicWord>(InitialMallocHook_New) };
AtomicPtr<MallocHook::DeleteHook> delete_hook_ = { 0 };
I'm not entirely certain why new_hook_ has this InitialMallocHook_New. It seems
odd, since
InitialMallocHook_New uninstalls itself on the first call, so why wouldn't
new_hook_ be NULL to start? I tried
initializing new_hook_ to '0', and I get this:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools-read-only/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Starting tracking the heap
Starting tracking the heap
DONE.
Program exited normally.
(gdb)
Have I simply side-stepped the issue, or was this the actual bug? It seems far
too simple.
Original comment by neunon
on 22 May 2010 at 1:05
Oh, and changing new_hook_ to be initialised to '0' gives me:
===================
All 38 tests passed
===================
Original comment by neunon
on 22 May 2010 at 1:13
} AtomicPtr<MallocHook::NewHook> new_hook_ = {
} reinterpret_cast<AtomicWord>(InitialMallocHook_New) };
Aha! It's a problem with weak symbols. Maybe. That makes sense --
it's the kind of thing one wouldn't be surprised to break from one OS
(or libc) version to another.
InitialMallocHook_New *is* a kind of trivial function in most cases,
but not when linking with heap-checker.cc. This is kind of magical
behavior, which is implemented via this line in malloc_hook.cc:
ATTRIBUTE_WEAK
extern void InitialMallocHook_New(const void* ptr, size_t size);
The comments above this line explain a bit about what's going on.
Now the question is why ATTRIBUTE_WEAK is doing something wonky for
you. First, let's confirm that it is. If you can run in gdb, the
easiest would be to do something like 'b InitialMallocHook_New', and
see what function the breakpoint is put in. That will tell us what
version the linker chose.
If that's not practical, you can put printf statements in
InitialMallocHook_New in both malloc_hook.cc and heap-checker.cc, and
see which printf is actually getting printed.
} The MallocHook_SetNewHook is definitely the one that triggered the
} fatal error.
Actually, your backtrace is showing it's from SetDeleteHook:
} #3 0x00000001000c9a04 in RAW_LOG (lvl=-4, pat=<value temporarily
} unavailable, due to optimizations>)
} at logging.h:217
} #4 0x00000001000c9f7d in MallocHook::SetDeleteHook () at
} /Users/tycho/Development/google-perftools-read-only/src/google/malloc_hook.h:514
Am I misreading this? (I must be, if changing the new_hook_ initialization fixes
your problem.) In any case, best to repeat the above (gdb or printf) with
InitialMallocHook_Delete too. Let me know what shows up.
Original comment by csilv...@gmail.com
on 24 May 2010 at 3:51
Well, this is, uh. Interesting:
(gdb) b InitialMallocHook_New
Breakpoint 1 at 0x20c49ba5e24515: file atomicops.h, line 212.
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb)
It set the breakpoint in atomicops.h above, but then actually tripped the
breakpoint in malloc_hook.cc:218.
It also looks like it never un-sets itself:
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100581000, size=16) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=8) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100582000, size=30) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=8) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100583000, size=75) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Breakpoint 1, InitialMallocHook_New (ptr=0x100584000, size=125) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) cont
Continuing.
Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use
--heap_check="" to avoid
this conflict.
Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb) bt
#0 0x00007fff80b9b886 in __kill ()
#1 0x00007fff80c3beae in abort ()
#2 0x00000001000ca698 in RAW_LOG (lvl=-4, pat=0x6 <Address 0x6 out of bounds>)
at logging.h:198
#3 0x00000001000caaf9 in HeapProfilerStart (prefix=0x100584018
"/var/folders/uk/ukYFQiAKHcWpuUFyY0w3kU+++TI/-Tmp-//start_stop") at
src/heap-profiler.cc:515
#4 0x0000000100001710 in TestHeapProfilerStartStopIsRunning [inlined] () at
/Users/tycho/Development/google-perftools/src/tests/heap-profiler_unittest.cc:88
#5 0x0000000100001710 in main (argc=<value temporarily unavailable, due to
optimizations>,
argv=0x7fff5fbfe040) at src/tests/heap-profiler_unittest.cc:130
(gdb)
Original comment by neunon
on 24 May 2010 at 4:29
} It set the breakpoint in atomicops.h above, but then actually tripped the
} breakpoint in malloc_hook.cc:218.
I wouldn't worry too much about that -- that's just optimizer goodness, I'm guessing.
} It also looks like it never un-sets itself:
Instead of just continuing at the breakpoint, can you hit something like the
following
(gdb) bt
(gdb) s
(gdb) finish
[should print the return value from MallocHook::GetNewHook()]
(gdb) print InitialMallocHook_New
(gdb) n
(gdb) n
This should give us insight into what the code is doing. My guess is that
GetNewHook
is set to some other value rather than InitialMallocHook_New, but is maybe
calling
this InitialMallocHook_New anyway?
Original comment by csilv...@gmail.com
on 24 May 2010 at 7:12
I executed some of your commands rather blindly. I'm somewhat of a gdb flunky
(I was spoiled by Visual C++
for years):
(gdb) bt
#0 InitialMallocHook_New (ptr=0x100580000, size=1) at src/malloc_hook.cc:218
#1 0x00000001000dbfa3 in tc_malloc (size=1) at malloc_hook-inl.h:98
#2 0x00000001000c92ad in TCMallocGuard::TCMallocGuard (this=<value temporarily
unavailable, due to
optimizations>) at src/tcmalloc.cc:766
#3 0x00000001000c9305 in TCMallocGuard::TCMallocGuard (this=<value temporarily
unavailable, due to
optimizations>) at src/tcmalloc.cc:781
#4 0x00000001000c936a in __static_initialization_and_destruction_0
(__initialize_p=<value temporarily
unavailable, due to optimizations>, __priority=<value temporarily unavailable,
due to optimizations>) at
src/tcmalloc.cc:794
#5 0x00000001000c93b6 in global constructors keyed to
_ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE () at src/tcmalloc.cc:1486
#6 0x00007fff5fc0d500 in
__dyld__ZN16ImageLoaderMachO18doModInitFunctionsERKN11ImageLoader11LinkContextE
()
#7 0x00007fff5fc0bcec in
__dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#8 0x00007fff5fc0bc9d in
__dyld__ZN11ImageLoader23recursiveInitializationERKNS_11LinkContextEj ()
#9 0x00007fff5fc0bda6 in
__dyld__ZN11ImageLoader15runInitializersERKNS_11LinkContextE ()
#10 0x00007fff5fc0210e in __dyld__ZN4dyld24initializeMainExecutableEv ()
#11 0x00007fff5fc06981 in __dyld__ZN4dyld5_mainEPK12macho_headermiPPKcS5_S5_ ()
#12 0x00007fff5fc016d2 in
__dyld__ZN13dyldbootstrap5startEPK12macho_headeriPPKcl ()
#13 0x00007fff5fc01052 in __dyld__dyld_start ()
(gdb) s
MallocHook::GetNewHook () at
/Users/tycho/Development/google-perftools/src/malloc_hook-inl.h:212
212 reinterpret_cast<volatile const AtomicWordCastType*>(ptr));
(gdb) finish
Breakpoint 1, InitialMallocHook_New (ptr=0x100580008, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
Value returned is $1 = (void (*)(const void *, size_t)) 0x7fffffe00180
<__memory_barrier>
(gdb) print InitialMallocHook_New
$2 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void
const*, unsigned long)>
(gdb) n
InitialMallocHook_New (ptr=<value temporarily unavailable, due to
optimizations>, size=<value temporarily
unavailable, due to optimizations>) at src/malloc_hook.cc:220
220 }
(gdb) n
tc_malloc (size=1) at src/tcmalloc.cc:1303
1303 }
(gdb)
Original comment by neunon
on 24 May 2010 at 7:20
Thanks, that's helpful. The 'finish' output is a bit suspicious: it looks like it's
returning a pointer to __memory_barrier, not to InitialMallocHook_New. I don't know
if that's an optimization artifact or not, though. I'd guess not, based on the fact
that the == comparison is failing: the code skips from line 218 to 220, never
executing line 219. So the prime theory now is that MallocHook::GetNewHook() is
giving weird (incorrect) results.
When you're in gdb and are at this breakpoint, can you print the value of
MallocHook::new_hook_? Also, try running
(gdb) print MallocHook::GetNewHook()
and if that succeeds, see if the two values match. And might as well print
InitialMallocHook_New, while you're at it.
btw, you're on an Intel system, right? I think all recent OS X releases are
Intel-only (i386 or x86_64). That may make a difference, if you're not.
Here's another thing you might try: put a breakpoint at __dyld__dyld_start. In that
breakpoint, print the value of MallocHook::new_hook_ (if it lets you). Then you can
continue to your next breakpoint (the one you have now) and see if
MallocHook::new_hook_ has the same value.
Original comment by csilv...@gmail.com
on 24 May 2010 at 8:06
I'm running an Intel mac. It's one of the early 2007 ones, though, so it is
running a 32-bit kernel and 64-bit
userspace. Doubt that affects anything though.
And I seem to be spinning my wheels as far as gdb is concerned:
(gdb) inspect InitialMallocHook_New
$3 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void
const*, unsigned long)>
(gdb) inspect new_hook_
No symbol "new_hook_" in current context.
(gdb) inspect MallocHook::new_hook_
There is no field named new_hook_
(gdb) inspect MallocHook::GetNewHook()
Breakpoint 1, InitialMallocHook_New (ptr=0x1000d550c, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
The program being debugged stopped while in a function called from GDB.
When the function (Acquire_Load) is done executing, GDB will silently
stop (instead of continuing to evaluate the expression containing
the function call).
(gdb) cont
Continuing.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fbfbb50
0x00007fff5fbfbb50 in ?? ()
(gdb) bt
#0 0x00007fff5fbfbb50 in ?? ()
#1 0x758d48087d8b48f0 in ?? ()
Original comment by neunon
on 25 May 2010 at 12:03
Sorry this is a bit of work to track down. But I'm confident we'll get to the
bottom
of it.
} inspect MallocHook::new_hook_
Sorry, I misread how this is declared. It's base::internal::new_hook_, not
MallocHook::new_hook_.
} inspect MallocHook::GetNewHook()
Try disabling the breakpoint ((gdb) disable 1) before running this command.
Original comment by csilv...@gmail.com
on 25 May 2010 at 4:07
I think I'm worthless at this.
(gdb) break InitialMallocHook_New
Breakpoint 1 at 0x20c49ba5e24515: file atomicops.h, line 212.
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) inspect base::internal::new_hook_
$1 = {
data_ = 4295841036
}
(gdb) inspect base::internal::new_hook_.data_
$2 = 4295841036
(gdb) inspect InitialMallocHook_New
$3 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void
const*, unsigned long)>
(gdb) disable 1
(gdb) inspect MallocHook::GetNewHook()
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fbfbb50
0x00007fff5fbfbb50 in ?? ()
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function
(base::internal::AtomicPtr<void (*)(void const*, unsigned
long)>::Get() const) will be abandoned.
(gdb) bt
#0 0x00007fff5fbfbb50 in ?? ()
#1 0x758d48087d8b48f0 in ?? ()
(gdb)
Ugh. So, trying a different way:
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) break MallocHook::GetNewHook()
Note: breakpoint 1 (disabled) also set at pc 0x1000d5515.
Breakpoint 2 (inlined MallocHook::GetNewHook()) at 0x1000d5515: file
atomicops.h, line 211.
(gdb) enable 1
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Segmentation fault
Oops. Somehow actually made gdb crash. Neat. Attempt number 3:
(gdb) b InitialMallocHook_New
Breakpoint 1 at 0x20c49ba5e24515: file atomicops.h, line 212.
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
(gdb) finish
Run till exit from #0 InitialMallocHook_New (ptr=0x100580000, size=1) at
src/malloc_hook.cc:218
tc_malloc (size=1) at src/tcmalloc.cc:1303
1303 }
Value returned is $1 = (void (*)(const void *, size_t)) 0x7fffffe00180
<__memory_barrier>
(gdb) inspect MallocHook::GetNewHook()
Breakpoint 1, InitialMallocHook_New (ptr=0x100580000, size=1) at
src/malloc_hook.cc:218
218 if (MallocHook::GetNewHook() == &InitialMallocHook_New)
The program being debugged stopped while in a function called from GDB.
When the function (base::internal::AtomicPtr<void (*)(void const*, unsigned
long)>::Get() const) is done executing,
GDB will silently
stop (instead of continuing to evaluate the expression containing
the function call).
(gdb) cont
Continuing.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fbfbb50
0x00007fff5fbfbb50 in ?? ()
(gdb) WHAT.
A syntax error in expression, near `.'.
(gdb)
*sigh* I don't _mind_ debugging this problem at all. I just am terribly bad at
it. I think I need to pick up a book on GDB
somewhere.
Original comment by neunon
on 25 May 2010 at 6:34
It's not you at all -- this is the way gdb behaves when something fundamental is
wrong with the program being debugged, as is the case here. I don't know what yet,
but we'll figure it out. Debugging tcmalloc is always tough, and debugging optimized
code is always tough, so put the two together...
But your last run did turn up something useful:
} (gdb) inspect InitialMallocHook_New
} $3 = {void (const void *, size_t)} 0x1000d550c <InitialMallocHook_New(void const*,
} unsigned long)>
Great, InitialMallocHook_New is at 0x1000d550c
} (gdb) inspect base::internal::new_hook_.data_
} $2 = 4295841036
Converting this value to hex, we see that new_hook_ is set to 0x1000d550c as well.
So that's all working correctly. This points to GetNewHook as being a suspect --
especially since it doesn't like being run via gdb. Seems very suspicious...
Since we can't run gdb on it, the next step is to put a printf in
MallocHook::GetNewHook() - just have it do something like printf("GetNewHook: %p\n",
new_hook_.get()), or whatever the return value from that function is. Then try
running again -- might as well run it in gdb -- and see what it's doing.
It may be that the AtomicPtr mac code has a bug in it that only shows up in opt mode.
If so, we should see GetNewHook doing something wonky.
Original comment by csilv...@gmail.com
on 25 May 2010 at 7:17
I suspect the problem is that GetNewHook is inlined, and it refuses to run an
inlined function. So perhaps
dropping the inline keyword and then debugging would work. But printf _is_
always easier.
Original comment by neunon
on 25 May 2010 at 7:40
This shows something's definitely wonky:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
MallocHook::GetNewHook returning 0x000d5266
Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use
--heap_check="" to avoid
this conflict.
Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb) inspect InitialMallocHook_New
$1 = {void (const void *, size_t)} 0x1000d5266 <InitialMallocHook_New(void
const*, unsigned long)>
(gdb) inspect base::internal::new_hook_.data_
$2 = 4295834517
(gdb)
Original comment by neunon
on 25 May 2010 at 7:43
Oh, I bet I know what's wrong here. It looks like that pointer is correct,
EXCEPT for the 0x1 being missing. Pointer
truncation?
Original comment by neunon
on 25 May 2010 at 7:46
Yes, that's it! Good catch.
This is definitely a problem with the atomicops code. It's supposed to auto-identify
whether pointers are 32 bits or 64, but it must be getting it wrong in your setup.
The fact you have a 32-bit kernel and a 64-bit userspace possibly *is* relevant,
then.
atomicops-internals-macosx.h has this code:
---
// MacOS uses long for intptr_t, AtomicWord and Atomic32 are always different
// on the Mac, even when they are the same size. Similarly, on __ppc64__,
// AtomicWord and Atomic64 are always different. Thus, we need explicit
// casting.
#ifdef __LP64__
#define AtomicWordCastType base::subtle::Atomic64
#else
#define AtomicWordCastType Atomic32
#endif
---
__LP64__ should be defined for your system, since you have a 64-bit userspace. Can
you confirm this? Try running
cpp -E -dM /dev/null | grep LP64
and see what it says.
Original comment by csilv...@gmail.com
on 25 May 2010 at 8:20
Yes, I started investigating this problem, and I know __LP64__ is definitely
defined:
tycho@alcarin ~/Development/google-perftools $ gcc -E -dM -xc /dev/null | grep
LP64
#define __LP64__ 1
#define _LP64 1
I also looked for any weird casts to Atomic32 or anything. I didn't see any. I
checked to see that AtomicWord
was properly typedef'd, and it's intptr_t which should be correct. I tried
changing it to unsigned long long for
the heck of it, but that had no effect. There's got to be a cast to a 32-bit
integer somewhere in there.
I think one of the very first things to look at is why the -O0 -ggdb version
didn't bomb. It certainly should
have, and fixing this would assist with debugging.
Here's -O0 -ggdb:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
MallocHook::GetNewHook returning 0x000d972a
MallocHook::GetNewHook returning 0x000d972a
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
Starting tracking the heap
MallocHook::GetNewHook returning 0x000d6ffd
MallocHook::GetNewHook returning 0x000d6ffd
MallocHook::GetNewHook returning 0x000d6ffd
[ above lines repeated lots of times. ]
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
Starting tracking the heap
MallocHook::GetNewHook returning 0x000d6ffd
MallocHook::GetNewHook returning 0x000d6ffd
MallocHook::GetNewHook returning 0x000d6ffd
[ above lines repeated lots of times. ]
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
MallocHook::GetNewHook returning 0x00000000
[ above lines repeated lots of times. ]
DONE.
Program exited normally.
(gdb)
Original comment by neunon
on 25 May 2010 at 8:31
One thing to make sure of -- which has happened to me before -- is that there's not
a bug in the printf statement. Maybe it's the one truncating to 32 bits somehow?
Original comment by csilv...@gmail.com
on 25 May 2010 at 9:32
Nope, printf is fine. I tested it and thought, "wait a second". I went back to
check the printf statement I used,
and apparently I used 0x%08x instead of %p. It oddly printed no compiler
warning (which usually happens
when the format doesn't match), but I should have noticed that mistake right
off the bat. Back to square one:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
MallocHook::GetNewHook returning 0x1000d5266
Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use
--heap_check="" to avoid
this conflict.
Program received signal SIGABRT, Aborted.
0x00007fff80b9b886 in __kill ()
(gdb)
More proof I'm made of fail. Anyway, pointer truncation is off the list of
possibilities.
Original comment by neunon
on 25 May 2010 at 9:44
Ah well, it was a good theory while it lasted...
Going back to comment 21, it says:
MallocHook::GetNewHook returning 0x1000d5266
and
$1 = {void (const void *, size_t)} 0x1000d5266 <InitialMallocHook_New(void const*,
unsigned long)>
but
(gdb) inspect base::internal::new_hook_.data_
$2 = 4295834517
which is 0x1000d3b95. Now new_hook_ is different! Blagh.
I think it's time for more printfs, to see what the program is *really* doing.
Let's instrument InitialMallocHook_New, to printf MallocHook::GetNewHook(),
&InitialMallocHook_New, and also to printf inside the if (if we're setting the
new-hook to NULL). That will tell us if this is failing to run for some reason.
Original comment by csilv...@gmail.com
on 25 May 2010 at 11:05
You won't believe this. Adding printfs caused it to just _work_:
tycho@alcarin ~/Development/google-perftools/.libs $ DYLD_LIBRARY_PATH="." gdb
./heap-profiler_unittest
GNU gdb 6.3.50-20050815 (Apple version gdb-1461.2) (Fri Mar 5 04:43:10 UTC
2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared
libraries .... done
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
MallocHook::GetNewHook returning 0x1000d5289
MallocHook::GetNewHook returning 0x1000d5289
InitialMallocHook_New: MallocHook::GetNewHook() 0x1000d5289, IMH_N 0x1000d5289
MallocHook::GetNewHook returning 0x1000d5289
InitialMallocHook_New: Trying to MallocHook::SetNewHook(NULL)
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
Starting tracking the heap
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
[ tons of the above repeated ]
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
Starting tracking the heap
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
MallocHook::GetNewHook returning 0x1000d3aa1
[ tons of the above repeated ]
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
[ repeated ad nauseam ]
DONE.
Program exited normally.
(gdb)
Here are my only changes:
diff --git a/src/malloc_hook-inl.h b/src/malloc_hook-inl.h
index a629691..4ce419c 100644
--- a/src/malloc_hook-inl.h
+++ b/src/malloc_hook-inl.h
@@ -90,6 +90,7 @@ extern AtomicPtr<MallocHook::SbrkHook> sbrk_hook_;
} } // namespace base::internal
inline MallocHook::NewHook MallocHook::GetNewHook() {
+ printf("MallocHook::GetNewHook returning %p\n",
base::internal::new_hook_.Get());
return base::internal::new_hook_.Get();
}
diff --git a/src/malloc_hook.cc b/src/malloc_hook.cc
index 4315b86..463c6b1 100644
--- a/src/malloc_hook.cc
+++ b/src/malloc_hook.cc
@@ -215,8 +215,12 @@ MallocHook_SbrkHook
MallocHook_SetSbrkHook(MallocHook_SbrkHook hook) {
// TODO(csilvers): add support for removing a hook from the middle of a chain.
void InitialMallocHook_New(const void* ptr, size_t size) {
- if (MallocHook::GetNewHook() == &InitialMallocHook_New)
+ printf("InitialMallocHook_New: MallocHook::GetNewHook() %p, IMH_N %p\n",
+ MallocHook::GetNewHook(), &InitialMallocHook_New);
+ if (MallocHook::GetNewHook() == &InitialMallocHook_New) {
+ printf("InitialMallocHook_New: Trying to MallocHook::SetNewHook(NULL)\n");
MallocHook::SetNewHook(NULL);
+ }
}
void InitialMallocHook_PreMMap(const void* start,
I verified that doing 'git stash' on my changes (to temporarily get back to a
clean working tree) causes the bug to resurface.
I think we're looking at a GCC code generation bug, here. How fun.
The interesting thing is that this must be a very long-standing bug, because I
can reproduce it with a much more recent, non-Apple GCC. The one I've been using
for all the tests above is "4.2.1 (Apple Inc. build 5659)", but my
MacPorts-installed GCC 4.5.0 at least partially exhibits the same behavior:
tycho@alcarin ~/Development/google-perftools/.libs $ DYLD_LIBRARY_PATH="." gdb
./heap-profiler_unittest
GNU gdb 6.3.50-20050815 (Apple version gdb-1461.2) (Fri Mar 5 04:43:10 UTC
2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared
libraries .
gdb stack crawl at point of internal error:
0 gdb-i386-apple-darwin 0x0000000100107e51 internal_vproblem +
308
1 gdb-i386-apple-darwin 0x000000010010802b internal_verror + 27
2 gdb-i386-apple-darwin 0x00000001001080c9 align_down + 0
3 gdb-i386-apple-darwin 0x00000001000b260c
find_partial_die_in_comp_unit + 79
4 gdb-i386-apple-darwin 0x00000001000be03b find_partial_die +
628
5 gdb-i386-apple-darwin 0x00000001000be088 fixup_partial_die +
55
6 gdb-i386-apple-darwin 0x00000001000be749 scan_partial_symbols
+ 58
7 gdb-i386-apple-darwin 0x00000001000bf60d
dwarf2_build_psymtabs + 2982
8 gdb-i386-apple-darwin 0x00000001001457a4 macho_symfile_read +
292
9 gdb-i386-apple-darwin 0x000000010004b838 syms_from_objfile +
1403
10 gdb-i386-apple-darwin 0x000000010004c2c0
symbol_file_add_with_addrs_or_offsets_using_objfile +
753
11 gdb-i386-apple-darwin 0x000000010004c277
symbol_file_add_with_addrs_or_offsets_using_objfile +
680
12 gdb-i386-apple-darwin 0x000000010004cc45
symbol_file_add_bfd_helper + 84
13 gdb-i386-apple-darwin 0x000000010006e858 catch_errors + 70
14 gdb-i386-apple-darwin 0x0000000100048f77
symbol_file_add_bfd_safe + 187
/SourceCache/gdb/gdb-1461.2/src/gdb/dwarf2read.c:8277: internal-error: could
not find partial DIE in cache
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) y
Er, whoops. Let's try that without GDB:
tycho@alcarin ~/Development/google-perftools/.libs $ DYLD_LIBRARY_PATH="."
./heap-profiler_unittest
MallocHook::GetNewHook returning 0x1000d3794
MallocHook::GetNewHook returning 0x1000d3794
InitialMallocHook_New: MallocHook::GetNewHook() 0x1000d3794, IMH_N 0x1000d3794
MallocHook::GetNewHook returning 0x1000d3794
InitialMallocHook_New: Trying to MallocHook::SetNewHook(NULL)
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
Starting tracking the heap
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
[snip more of the above]
MallocHook::GetNewHook returning 0x0
MallocHook::GetNewHook returning 0x0
Starting tracking the heap
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
MallocHook::GetNewHook returning 0x1000d22e1
Segmentation fault
And without the printfs added:
tycho@alcarin ~/Development/google-perftools/.libs $ DYLD_LIBRARY_PATH="."
./heap-profiler_unittest
Starting tracking the heap
Had other new/delete MallocHook-s set. Are you using the heap leak checker? Use
--heap_check="" to avoid this
conflict.
Abort trap
I'm intrigued.
Original comment by neunon
on 25 May 2010 at 11:34
Bummer. Having the printfs change the behavior is too bad.
} I think we're looking at a GCC code generation bug, here. How fun.
Possibly, but not necessarily. Adding printfs also changes things like
inlining,
which can affect correctness. This is especially true in the presence of the
AtomicWord code, which uses barriers to ensure correct memory access. I think
(assume) function calls serve as memory barriers, so a different inlining
decision
could totally affect correctness.
One way to test this theory is to put __attribute__((noinline)) in front of
InitialMallocHook_New -- I'm not sure of the exact syntax -- and see if that
makes a
difference in the program crashing or not. You can also (or alternately) try
__attribute__((inline)) with the printf's, and see if that makes a difference.
I'm not sure gcc 4.5.0 is so reliable, btw -- testing with 4.4.2 or something
might
give better results.
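For reference on the syntax question, here is a minimal sketch of how GCC function attributes are written. AddOne and AddTwo are hypothetical stand-ins, not perftools functions. As far as I know GCC has no attribute spelled "inline"; the closest one is always_inline, and attributes of this kind attach to a function declaration rather than to a call statement.

```cpp
// Hypothetical stand-in functions; not part of perftools.

// noinline: asks GCC never to inline this function into its callers.
__attribute__((noinline)) int AddOne(int x) {
  return x + 1;
}

// always_inline: asks GCC to inline this function even when it otherwise
// would not. It is normally paired with the 'inline' keyword.
inline __attribute__((always_inline)) int AddTwo(int x) {
  return x + 2;
}
```

The attribute changes how the marked function itself is treated at its call sites. It does not change whether the functions it calls get inlined into it.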
Original comment by csilv...@gmail.com
on 26 May 2010 at 3:49
Well, I would have tested GCC 4.3.4 or 4.4.4, but both of those give me:
Undefined symbols:
"___emutls_get_address"
Anyway, with regards to the behavior changes. I don't know why you'd want
__attribute__((noinline)) on
InitialMallocHook_New, since noinline would affect the inlining behavior of
InitialMallocHook_New, not the
functions it's calling (which are probably what's being inlined). And throwing
__attribute__((inline)) before the
printfs isn't valid syntax, either:
src/malloc_hook.cc: In function ‘void InitialMallocHook_New(const void*,
size_t)’:
src/malloc_hook.cc:218: error: expected primary-expression before
‘__attribute__’
src/malloc_hook.cc:218: error: expected `;' before ‘__attribute__’
src/malloc_hook.cc:221: error: expected primary-expression before
‘__attribute__’
src/malloc_hook.cc:221: error: expected `;' before ‘__attribute__’
I tried adding __attribute__((noinline)) to InitialMallocHook_New() as well as
MallocHook::GetNewHook(), but
neither had any effect.
Original comment by neunon
on 26 May 2010 at 8:00
The inlining theory was that adding printfs to InitialMallocHook_New made it so
code
that used to inline it (elsewhere in malloc_hook.cc), no longer did. It was
kinda
grasping at straws.
Now we have to get into a bit more black magic. Well, first, let's do some
light-gray magic: keep the printfs in InitialMallocHook_New, but remove the one
in GetNewHook. Does the test still pass? Maybe try it the other way around too.
If the tests still pass, the next thing to do is to keep the version with the
printfs in it, and then use a hex editor (!) on the .so file to replace the
printf calls with noops. You can figure out where to do that using the
'disassemble InitialMallocHook_New' command in gdb. This is somewhat tricky
surgery, so if you want, you can attach the .so to this bug report, and I can
try editing it and sending it back.
Original comment by csilv...@gmail.com
on 26 May 2010 at 5:45
It passes if I keep the printfs in InitialMallocHook_New, but fails if I remove
them. The one in GetNewHook doesn't affect it either way.
I'd nop the printfs myself, but I don't know of a decent disassembler on Mac, so
I wouldn't know where to look for the printf calls. On Windows, I'd rely on IDA
to give me the location. (Speaking of which, how would _you_ do it on Mac?)
Original comment by neunon
on 26 May 2010 at 7:57
Oops, I've run out of time in my day. I'll try this tomorrow. If you want to
give
it a go earlier, feel free.
The disassembler I use is the one built into gdb. You can say
(gdb) disassemble InitialMallocHook_New
That will give you the byte offsets to hack, but in the form of memory
addresses, which you'll have to adjust. On Linux, I'd do this by running ldd to
see where libtcmalloc.so was mapped into memory. I'd have to figure out how to
do it on OS X (for Mach). Maybe ldd would work as well. Or otool or something.
Then I'd use something like khexedit to actually do the editing. Or maybe even
emacs. :-)
Original comment by csilv...@gmail.com
on 26 May 2010 at 10:12
Aha, didn't know about GDB's disassembler, but it makes sense.
Anyway, otool wasn't helpful, but gdb helped on both counts:
(gdb) info sharedlib
The DYLD shared library state has been initialized from the executable's shared
library information. All symbols should be present, but the addresses of some
symbols may move when the program is executed, as DYLD may relocate library
load addresses if necessary.
Requested State Current State
Num Basename Type Address Reason | | Source
| | | | | | | |
1 heap-profiler_unittest - - exec Y Y /Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest (offset 0x0)
2 dyld - - init Y Y /usr/lib/dyld at 0xfe24e7331a000 (offset 0xfe24e7331a000) with prefix "__dyld_"
3 libtcmalloc.0.dylib - - init Y Y /opt/perftools/lib/libtcmalloc.0.dylib at 0x20c49ba5e16000 (offset 0x20c49ba5e16000)
(objfile is) /Users/tycho/Development/google-perftools/.libs/libtcmalloc.0.0.0.dylib
(dSYM file is) /Users/tycho/Development/google-
perftools/.libs/libtcmalloc.0.0.0.dylib.dSYM/Contents/Resources/DWARF/libtcmallo
c.0.0.0.dylib
4 libstdc++.6.0.9.dylib - - init Y Y /usr/lib/libstdc++.6.0.9.dylib at 0x3126e978cd0000 (offset 0x3126e978cd0000)
5 libSystem.B.dylib - - init Y Y /usr/lib/libSystem.B.dylib at 0x4189374baa8000 (offset 0x4189374baa8000)
(commpage objfile is) /usr/lib/libSystem.B.dylib[LC_SEGMENT.__DATA.__commpage]
(gdb) disassemble InitialMallocHook_New
Dump of assembler code for function _Z21InitialMallocHook_NewPKvm:
0x0020c49ba5e24440 <_Z21InitialMallocHook_NewPKvm+0>: push %rbp
0x0020c49ba5e24441 <_Z21InitialMallocHook_NewPKvm+1>: mov %rsp,%rbp
0x0020c49ba5e24444 <_Z21InitialMallocHook_NewPKvm+4>: mov %rbx,-0x10(%rbp)
0x0020c49ba5e24448 <_Z21InitialMallocHook_NewPKvm+8>: mov %r12,-0x8(%rbp)
0x0020c49ba5e2444c <_Z21InitialMallocHook_NewPKvm+12>: sub $0x10,%rsp
0x0020c49ba5e24450 <_ZNK4base8internal9AtomicPtrIPFvPKvmEE3GetEv+0>: mov
0x10461(%rip),%r12 # 0x20c49ba5e348b8
<_ZN4base8internal9new_hook_E>
0x0020c49ba5e24457 <MemoryBarrier+0>: callq 0x20c49ba5e2865c
<dyld_stub_OSMemoryBarrier>
0x0020c49ba5e2445c <_Z21InitialMallocHook_NewPKvm+28>: mov 0xfc0d(%rip),%rbx
# 0x20c49ba5e34070
0x0020c49ba5e24463 <_Z21InitialMallocHook_NewPKvm+35>: mov %rbx,%rdx
0x0020c49ba5e24466 <_Z21InitialMallocHook_NewPKvm+38>: mov %r12,%rsi
0x0020c49ba5e24469 <_Z21InitialMallocHook_NewPKvm+41>: lea 0x5bf0(%rip),%rdi
# 0x20c49ba5e2a060 <_ZTS15MallocExtension+25>
0x0020c49ba5e24470 <_Z21InitialMallocHook_NewPKvm+48>: mov $0x0,%eax
0x0020c49ba5e24475 <_Z21InitialMallocHook_NewPKvm+53>: callq 0x20c49ba5e28902
<dyld_stub_printf>
0x0020c49ba5e2447a <_Z21InitialMallocHook_NewPKvm+58>: cmp %rbx,%r12
0x0020c49ba5e2447d <_Z21InitialMallocHook_NewPKvm+61>: jne 0x20c49ba5e24495
<_Z21InitialMallocHook_NewPKvm+85>
0x0020c49ba5e2447f <_Z21InitialMallocHook_NewPKvm+63>: lea 0x5c1a(%rip),%rdi
# 0x20c49ba5e2a0a0 <_ZTS15MallocExtension+89>
0x0020c49ba5e24486 <_Z21InitialMallocHook_NewPKvm+70>: callq 0x20c49ba5e2892c
<dyld_stub_puts>
0x0020c49ba5e2448b <_ZN10MallocHook10SetNewHookEPFvPKvmE+0>: mov $0x0,%edi
0x0020c49ba5e24490 <_ZN10MallocHook10SetNewHookEPFvPKvmE+5>: callq
0x20c49ba5e1aa5e <MallocHook_SetNewHook>
0x0020c49ba5e24495 <_Z21InitialMallocHook_NewPKvm+85>: mov (%rsp),%rbx
0x0020c49ba5e24499 <_Z21InitialMallocHook_NewPKvm+89>: mov 0x8(%rsp),%r12
0x0020c49ba5e2449e <_Z21InitialMallocHook_NewPKvm+94>: leaveq
0x0020c49ba5e2449f <_Z21InitialMallocHook_NewPKvm+95>: retq
End of assembler dump.
I changed the printf calls slightly because GetNewHook was being called twice
(once in the printf(), once in the 'if'). I also removed the printf in
GetNewHook(). So now
it's this:
void InitialMallocHook_New(const void* ptr, size_t size) {
MallocHook::NewHook hook = MallocHook::GetNewHook();
printf("InitialMallocHook_New: MallocHook::GetNewHook() %p, IMH_N %p\n",
hook, &InitialMallocHook_New);
if (hook == &InitialMallocHook_New) {
printf("InitialMallocHook_New: Trying to MallocHook::SetNewHook(NULL)\n");
MallocHook::SetNewHook(NULL);
}
}
The above is behaviourally identical (it passes the test), but it's easier to
nop to death.
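The address arithmetic behind the nopping can be sketched roughly as follows. ImageOffset and NopOut are hypothetical helper names. The sketch assumes a thin (single-architecture) library whose text segment starts at file offset 0, so that "runtime address minus load address" is also the offset in the file. That assumption is mine and is not confirmed anywhere in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Translate a runtime address printed by gdb into an offset inside the
// library image. ASSUMPTION: the image is mapped contiguously from load_base
// and the file offset equals the image offset (thin binary, text at offset 0).
std::uint64_t ImageOffset(std::uint64_t runtime_addr, std::uint64_t load_base) {
  return runtime_addr - load_base;
}

// Overwrite 'len' bytes starting at 'offset' with the one-byte x86 NOP (0x90).
// Returns false and leaves the buffer untouched if the range is out of bounds.
bool NopOut(std::vector<unsigned char>& image, std::size_t offset,
            std::size_t len) {
  if (offset > image.size() || len > image.size() - offset) return false;
  for (std::size_t i = 0; i < len; ++i) image[offset + i] = 0x90;
  return true;
}
```

With the numbers above, the callq to dyld_stub_printf sits at 0x0020c49ba5e24475 and the library is loaded at 0x20c49ba5e16000, so ImageOffset gives 0xe475. The next instruction starts at +58, so the call occupies 5 bytes. Only nop instructions whose results are not used later: the mov into %rbx at +28 feeds the cmp at +58 and has to stay.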
The end result of my nopping:
Dump of assembler code for function _Z21InitialMallocHook_NewPKvm:
0x0020c49ba5e24440 <_Z21InitialMallocHook_NewPKvm+0>: push %rbp
0x0020c49ba5e24441 <_Z21InitialMallocHook_NewPKvm+1>: mov %rsp,%rbp
0x0020c49ba5e24444 <_Z21InitialMallocHook_NewPKvm+4>: mov %rbx,-0x10(%rbp)
0x0020c49ba5e24448 <_Z21InitialMallocHook_NewPKvm+8>: mov %r12,-0x8(%rbp)
0x0020c49ba5e2444c <_Z21InitialMallocHook_NewPKvm+12>: sub $0x10,%rsp
0x0020c49ba5e24450 <_ZNK4base8internal9AtomicPtrIPFvPKvmEE3GetEv+0>: mov
0x10461(%rip),%r12 # 0x20c49ba5e348b8
<_ZN4base8internal9new_hook_E>
0x0020c49ba5e24457 <MemoryBarrier+0>: callq 0x20c49ba5e2865c
<dyld_stub_OSMemoryBarrier>
0x0020c49ba5e2445c <_Z21InitialMallocHook_NewPKvm+28>: nop
0x0020c49ba5e2445d <_Z21InitialMallocHook_NewPKvm+29>: nop
0x0020c49ba5e2445e <_Z21InitialMallocHook_NewPKvm+30>: nop
0x0020c49ba5e2445f <_Z21InitialMallocHook_NewPKvm+31>: nop
0x0020c49ba5e24460 <_Z21InitialMallocHook_NewPKvm+32>: nop
0x0020c49ba5e24461 <_Z21InitialMallocHook_NewPKvm+33>: nop
0x0020c49ba5e24462 <_Z21InitialMallocHook_NewPKvm+34>: nop
0x0020c49ba5e24463 <_Z21InitialMallocHook_NewPKvm+35>: nop
0x0020c49ba5e24464 <_Z21InitialMallocHook_NewPKvm+36>: nop
0x0020c49ba5e24465 <_Z21InitialMallocHook_NewPKvm+37>: nop
0x0020c49ba5e24466 <_Z21InitialMallocHook_NewPKvm+38>: nop
0x0020c49ba5e24467 <_Z21InitialMallocHook_NewPKvm+39>: nop
0x0020c49ba5e24468 <_Z21InitialMallocHook_NewPKvm+40>: nop
0x0020c49ba5e24469 <_Z21InitialMallocHook_NewPKvm+41>: nop
0x0020c49ba5e2446a <_Z21InitialMallocHook_NewPKvm+42>: nop
0x0020c49ba5e2446b <_Z21InitialMallocHook_NewPKvm+43>: nop
0x0020c49ba5e2446c <_Z21InitialMallocHook_NewPKvm+44>: nop
0x0020c49ba5e2446d <_Z21InitialMallocHook_NewPKvm+45>: nop
0x0020c49ba5e2446e <_Z21InitialMallocHook_NewPKvm+46>: nop
0x0020c49ba5e2446f <_Z21InitialMallocHook_NewPKvm+47>: nop
0x0020c49ba5e24470 <_Z21InitialMallocHook_NewPKvm+48>: nop
0x0020c49ba5e24471 <_Z21InitialMallocHook_NewPKvm+49>: nop
0x0020c49ba5e24472 <_Z21InitialMallocHook_NewPKvm+50>: nop
0x0020c49ba5e24473 <_Z21InitialMallocHook_NewPKvm+51>: nop
0x0020c49ba5e24474 <_Z21InitialMallocHook_NewPKvm+52>: nop
0x0020c49ba5e24475 <_Z21InitialMallocHook_NewPKvm+53>: nop
0x0020c49ba5e24476 <_Z21InitialMallocHook_NewPKvm+54>: nop
0x0020c49ba5e24477 <_Z21InitialMallocHook_NewPKvm+55>: nop
0x0020c49ba5e24478 <_Z21InitialMallocHook_NewPKvm+56>: nop
0x0020c49ba5e24479 <_Z21InitialMallocHook_NewPKvm+57>: nop
0x0020c49ba5e2447a <_Z21InitialMallocHook_NewPKvm+58>: cmp %rbx,%r12
0x0020c49ba5e2447d <_Z21InitialMallocHook_NewPKvm+61>: jne 0x20c49ba5e24495
<_Z21InitialMallocHook_NewPKvm+85>
0x0020c49ba5e2447f <_Z21InitialMallocHook_NewPKvm+63>: lea 0x5c1a(%rip),%rdi
# 0x20c49ba5e2a0a0 <_ZTS15MallocExtension+89>
0x0020c49ba5e24486 <_Z21InitialMallocHook_NewPKvm+70>: nop
0x0020c49ba5e24487 <_Z21InitialMallocHook_NewPKvm+71>: nop
0x0020c49ba5e24488 <_Z21InitialMallocHook_NewPKvm+72>: nop
0x0020c49ba5e24489 <_Z21InitialMallocHook_NewPKvm+73>: nop
0x0020c49ba5e2448a <_Z21InitialMallocHook_NewPKvm+74>: nop
0x0020c49ba5e2448b <_ZN10MallocHook10SetNewHookEPFvPKvmE+0>: mov $0x0,%edi
0x0020c49ba5e24490 <_ZN10MallocHook10SetNewHookEPFvPKvmE+5>: callq
0x20c49ba5e1aa5e <MallocHook_SetNewHook>
0x0020c49ba5e24495 <_Z21InitialMallocHook_NewPKvm+85>: mov (%rsp),%rbx
0x0020c49ba5e24499 <_Z21InitialMallocHook_NewPKvm+89>: mov 0x8(%rsp),%r12
0x0020c49ba5e2449e <_Z21InitialMallocHook_NewPKvm+94>: leaveq
0x0020c49ba5e2449f <_Z21InitialMallocHook_NewPKvm+95>: retq
End of assembler dump.
And the runtime result:
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Starting tracking the heap
Starting tracking the heap
DONE.
Program exited normally.
(gdb)
Woo, now what? (Attached the nopped version)
Original comment by neunon
on 27 May 2010 at 12:36
Oh, and here's the disassembly of a version compiled without any of the printfs:
(gdb) disassemble InitialMallocHook_New
Dump of assembler code for function _Z21InitialMallocHook_NewPKvm:
0x0020c49ba5e2450c <_Z21InitialMallocHook_NewPKvm+0>: push %rbp
0x0020c49ba5e2450d <_Z21InitialMallocHook_NewPKvm+1>: mov %rsp,%rbp
0x0020c49ba5e24510 <_Z21InitialMallocHook_NewPKvm+4>: push %rbx
0x0020c49ba5e24511 <_Z21InitialMallocHook_NewPKvm+5>: sub $0x8,%rsp
0x0020c49ba5e24515 <_ZNK4base8internal9AtomicPtrIPFvPKvmEE3GetEv+0>: mov
0x1037c(%rip),%rbx
# 0x20c49ba5e34898 <_ZN4base8internal9new_hook_E>
0x0020c49ba5e2451c <MemoryBarrier+0>: callq 0x20c49ba5e286f8
<dyld_stub_OSMemoryBarrier>
0x0020c49ba5e24521 <_Z21InitialMallocHook_NewPKvm+21>: cmp -0x1c(%rip),%rbx
#
0x20c49ba5e2450c <_Z21InitialMallocHook_NewPKvm>
0x0020c49ba5e24528 <_Z21InitialMallocHook_NewPKvm+28>: jne 0x20c49ba5e24534
<_Z21InitialMallocHook_NewPKvm+40>
0x0020c49ba5e2452a <_ZN10MallocHook10SetNewHookEPFvPKvmE+0>: mov $0x0,%edi
0x0020c49ba5e2452f <_ZN10MallocHook10SetNewHookEPFvPKvmE+5>: callq
0x20c49ba5e1ab2a
<MallocHook_SetNewHook>
0x0020c49ba5e24534 <_Z21InitialMallocHook_NewPKvm+40>: add $0x8,%rsp
0x0020c49ba5e24538 <_Z21InitialMallocHook_NewPKvm+44>: pop %rbx
0x0020c49ba5e24539 <_Z21InitialMallocHook_NewPKvm+45>: leaveq
0x0020c49ba5e2453a <_Z21InitialMallocHook_NewPKvm+46>: retq
End of assembler dump.
(gdb)
Original comment by neunon
on 27 May 2010 at 12:38
} Woo, now what? (Attached the nopped version)
Well, you figured that part out already. :-) Now we just need to compare the
working and non-working versions. I'm no asm expert, but I have to say, they
both look fine to me. :-( Let me see if I can find someone around here who can
look more deeply.
Of course, it's possible that adding the printfs causes routines outside of
InitialMallocHook_New to be compiled differently. The way to test that would be
to take a working libtcmalloc and replace its InitialMallocHook_New with the
'broken' assembly (along with noops at the end to keep the function the same
size). That's pretty delicate surgery, though, since all the offsets will
change. Also, you'd have to do it in hex. :-} Maybe it's best to have someone
look at this asm first and see if anything looks amiss.
Original comment by csilv...@gmail.com
on 27 May 2010 at 6:32
OK, several people who have looked at the noop code are confused: they claim
that %rbx
is never set to anything. "Are you sure you didn't go too crazy with the
nops?" one
asked.
If they're right, it's surprising the nop-version of the code ever worked. Can
you
double-check that part?
Original comment by csilv...@gmail.com
on 27 May 2010 at 8:50
Yes, I did go too crazy with the nops. I had to rebuild the code so some
addresses may have changed.
This is probably more correct:
(gdb) disassemble InitialMallocHook_New
Dump of assembler code for function _Z21InitialMallocHook_NewPKvm:
0x0020c49ba5e24440 <_Z21InitialMallocHook_NewPKvm+0>: push %rbp
0x0020c49ba5e24441 <_Z21InitialMallocHook_NewPKvm+1>: mov %rsp,%rbp
0x0020c49ba5e24444 <_Z21InitialMallocHook_NewPKvm+4>: mov %rbx,-0x10(%rbp)
0x0020c49ba5e24448 <_Z21InitialMallocHook_NewPKvm+8>: mov %r12,-0x8(%rbp)
0x0020c49ba5e2444c <_Z21InitialMallocHook_NewPKvm+12>: sub $0x10,%rsp
0x0020c49ba5e24450 <_ZNK4base8internal9AtomicPtrIPFvPKvmEE3GetEv+0>: mov
0x10461(%rip),%r12
# 0x20c49ba5e348b8 <_ZN4base8internal9new_hook_E>
0x0020c49ba5e24457 <MemoryBarrier+0>: callq 0x20c49ba5e2865c
<dyld_stub_OSMemoryBarrier>
0x0020c49ba5e2445c <_Z21InitialMallocHook_NewPKvm+28>: mov 0xfc0d(%rip),%rbx
#
0x20c49ba5e34070
0x0020c49ba5e24463 <_Z21InitialMallocHook_NewPKvm+35>: nop
0x0020c49ba5e24464 <_Z21InitialMallocHook_NewPKvm+36>: nop
0x0020c49ba5e24465 <_Z21InitialMallocHook_NewPKvm+37>: nop
0x0020c49ba5e24466 <_Z21InitialMallocHook_NewPKvm+38>: nop
0x0020c49ba5e24467 <_Z21InitialMallocHook_NewPKvm+39>: nop
0x0020c49ba5e24468 <_Z21InitialMallocHook_NewPKvm+40>: nop
0x0020c49ba5e24469 <_Z21InitialMallocHook_NewPKvm+41>: nop
0x0020c49ba5e2446a <_Z21InitialMallocHook_NewPKvm+42>: nop
0x0020c49ba5e2446b <_Z21InitialMallocHook_NewPKvm+43>: nop
0x0020c49ba5e2446c <_Z21InitialMallocHook_NewPKvm+44>: nop
0x0020c49ba5e2446d <_Z21InitialMallocHook_NewPKvm+45>: nop
0x0020c49ba5e2446e <_Z21InitialMallocHook_NewPKvm+46>: nop
0x0020c49ba5e2446f <_Z21InitialMallocHook_NewPKvm+47>: nop
0x0020c49ba5e24470 <_Z21InitialMallocHook_NewPKvm+48>: nop
0x0020c49ba5e24471 <_Z21InitialMallocHook_NewPKvm+49>: nop
0x0020c49ba5e24472 <_Z21InitialMallocHook_NewPKvm+50>: nop
0x0020c49ba5e24473 <_Z21InitialMallocHook_NewPKvm+51>: nop
0x0020c49ba5e24474 <_Z21InitialMallocHook_NewPKvm+52>: nop
0x0020c49ba5e24475 <_Z21InitialMallocHook_NewPKvm+53>: nop
0x0020c49ba5e24476 <_Z21InitialMallocHook_NewPKvm+54>: nop
0x0020c49ba5e24477 <_Z21InitialMallocHook_NewPKvm+55>: nop
0x0020c49ba5e24478 <_Z21InitialMallocHook_NewPKvm+56>: nop
0x0020c49ba5e24479 <_Z21InitialMallocHook_NewPKvm+57>: nop
0x0020c49ba5e2447a <_Z21InitialMallocHook_NewPKvm+58>: cmp %rbx,%r12
0x0020c49ba5e2447d <_Z21InitialMallocHook_NewPKvm+61>: jne 0x20c49ba5e24495
<_Z21InitialMallocHook_NewPKvm+85>
0x0020c49ba5e2447f <_Z21InitialMallocHook_NewPKvm+63>: nop
0x0020c49ba5e24480 <_Z21InitialMallocHook_NewPKvm+64>: nop
0x0020c49ba5e24481 <_Z21InitialMallocHook_NewPKvm+65>: nop
0x0020c49ba5e24482 <_Z21InitialMallocHook_NewPKvm+66>: nop
0x0020c49ba5e24483 <_Z21InitialMallocHook_NewPKvm+67>: nop
0x0020c49ba5e24484 <_Z21InitialMallocHook_NewPKvm+68>: nop
0x0020c49ba5e24485 <_Z21InitialMallocHook_NewPKvm+69>: nop
0x0020c49ba5e24486 <_Z21InitialMallocHook_NewPKvm+70>: nop
0x0020c49ba5e24487 <_Z21InitialMallocHook_NewPKvm+71>: nop
0x0020c49ba5e24488 <_Z21InitialMallocHook_NewPKvm+72>: nop
0x0020c49ba5e24489 <_Z21InitialMallocHook_NewPKvm+73>: nop
0x0020c49ba5e2448a <_Z21InitialMallocHook_NewPKvm+74>: nop
0x0020c49ba5e2448b <_ZN10MallocHook10SetNewHookEPFvPKvmE+0>: mov $0x0,%edi
0x0020c49ba5e24490 <_ZN10MallocHook10SetNewHookEPFvPKvmE+5>: callq
0x20c49ba5e1aa5e
<MallocHook_SetNewHook>
0x0020c49ba5e24495 <_Z21InitialMallocHook_NewPKvm+85>: mov (%rsp),%rbx
0x0020c49ba5e24499 <_Z21InitialMallocHook_NewPKvm+89>: mov 0x8(%rsp),%r12
0x0020c49ba5e2449e <_Z21InitialMallocHook_NewPKvm+94>: leaveq
0x0020c49ba5e2449f <_Z21InitialMallocHook_NewPKvm+95>: retq
End of assembler dump.
(gdb)
Now to test...
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Starting tracking the heap
Starting tracking the heap
DONE.
Program exited normally.
(gdb)
Huh.
Original comment by neunon
on 27 May 2010 at 9:34
Some of the asm experts here are starting to think about workarounds. One is
to put
this instruction at the top of InitialMallocHook_New:
__asm__ __volatile__ ("" : : "r" (&InitialMallocHook_New));
The other is to rewrite the function like this:
void InitialMallocHook_New(const void* ptr, size_t size) {
volatile MallocHook_NewHook self = &InitialMallocHook_New;
if (MallocHook::GetNewHook() == self)
MallocHook::SetNewHook(NULL);
}
Both have the effect of forcing the address of InitialMallocHook_New to be in a
register, which avoids the buggy gcc optimization path.
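A self-contained sketch of the second workaround, for anyone who wants to try the pattern in isolation. Everything here (NewHook, g_new_hook, GetNewHook, SetNewHook, InitialHook, OtherHook) is a simplified, hypothetical stand-in: the real perftools code keeps the hook in an atomic pointer with memory barriers, which this sketch leaves out.

```cpp
#include <cstddef>

// Simplified stand-ins for the hook API; names are illustrative only.
typedef void (*NewHook)(const void* ptr, std::size_t size);

static NewHook g_new_hook = NULL;  // the real code uses an atomic pointer

NewHook GetNewHook() { return g_new_hook; }
void SetNewHook(NewHook hook) { g_new_hook = hook; }

// A second hook, used only to show that other hooks are left alone.
void OtherHook(const void* /*ptr*/, std::size_t /*size*/) {}

// Self-removing hook: uninstalls itself the first time it runs, but only if
// it is still the installed hook.
void InitialHook(const void* /*ptr*/, std::size_t /*size*/) {
  // Going through a volatile local makes the compiler store the function's
  // address as a value and read it back before the comparison.
  volatile NewHook self = &InitialHook;
  if (GetNewHook() == self) {
    SetNewHook(NULL);
  }
}
```

If InitialHook is the installed hook, calling it once clears the hook. If OtherHook has been installed in the meantime, InitialHook leaves it in place.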
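The bug, btw, is that gcc isn't indirecting the address of InitialMallocHook_New
properly. So instead of checking if new_hook_ points to InitialMallocHook_New,
it checks if the 8 bytes of new_hook_ equal the first 8 bytes of the
InitialMallocHook_New function (!). That is the 'cmp -0x1c(%rip),%rbx' in the
no-printf disassembly above: the memory operand reads the bytes stored at the
function's address, rather than using the address itself as the value.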
Let me know how either/both fixes work for you.
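To make the difference between the two comparisons concrete, here is a small simulation. It uses a byte array (kFakeCode, a hypothetical stand-in) in place of a real function's machine code, so it does not reproduce the compiler bug itself. It only shows what "compare against the address" and "compare against the bytes at the address" each compute.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the first 8 bytes of a function's machine code. The values are
// illustrative (a typical x86-64 prologue), not copied from libtcmalloc.
static const unsigned char kFakeCode[8] = {0x55, 0x48, 0x89, 0xe5,
                                           0x53, 0x48, 0x83, 0xec};

// Intended check: does the stored hook value equal the ADDRESS of the code?
bool PointsAtCode(const void* hook) {
  return hook == static_cast<const void*>(kFakeCode);
}

// Miscompiled check: does the stored hook value equal the BYTES found at that
// address, reinterpreted as a pointer-sized integer?
bool EqualsCodeBytes(const void* hook) {
  std::uintptr_t contents = 0;
  std::memcpy(&contents, kFakeCode, sizeof(contents));
  return reinterpret_cast<std::uintptr_t>(hook) == contents;
}
```

A hook value that really points at the code passes the first check and, short of an extreme coincidence, fails the second. In the no-printf build that would mean SetNewHook(NULL) never runs, which would fit the "Had other new/delete MallocHook-s set" abort seen earlier.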
Original comment by csilv...@gmail.com
on 28 May 2010 at 9:07
I won't be able to check it until late this evening.
So it's definitely a GCC bug. Grand. We need to figure out what exactly is
going on
here. If it happens here, it could happen other places, and that means we can't
trust
what GCC is doing. Ultimately, this needs to be reported to the GCC team. I
wonder how
we can get a reduced testcase out of this...
Original comment by neunon
on 28 May 2010 at 11:51
A gcc bug is plausible, but by no means definite. The code could still have
undefined behavior in it somewhere, or something. We'd have to go through what
gcc is doing to be sure. That said, there are definitely (hopefully reliable)
workarounds that can get the test to pass.
We've tried creating a reduced testcase around here, with no luck so far. But
assuming we can get a good test case, we'll look to report it to the gcc folks.
Original comment by csilv...@gmail.com
on 29 May 2010 at 2:53
Awesome.
tycho@alcarin ~/Development/google-perftools/.libs $ git diff
diff --git a/src/malloc_hook.cc b/src/malloc_hook.cc
index 4315b86..915500b 100644
--- a/src/malloc_hook.cc
+++ b/src/malloc_hook.cc
@@ -215,8 +215,10 @@ MallocHook_SbrkHook MallocHook_SetSbrkHook(MallocHook_SbrkH
// TODO(csilvers): add support for removing a hook from the middle of a chain.
void InitialMallocHook_New(const void* ptr, size_t size) {
- if (MallocHook::GetNewHook() == &InitialMallocHook_New)
+ volatile MallocHook::NewHook self = &InitialMallocHook_New;
+ if (MallocHook::GetNewHook() == self) {
MallocHook::SetNewHook(NULL);
+ }
}
void InitialMallocHook_PreMMap(const void* start,
tycho@alcarin ~/Development/google-perftools/.libs $ DYLD_LIBRARY_PATH="." gdb
./heap-profiler_unittest
GNU gdb 6.3.50-20050815 (Apple version gdb-1461.2) (Fri Mar 5 04:43:10 UTC
2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared
libraries .... done
(gdb) run
Starting program:
/Users/tycho/Development/google-perftools/.libs/heap-profiler_unittest
Reading symbols for shared libraries +++. done
Starting tracking the heap
Starting tracking the heap
DONE.
Program exited normally.
(gdb)
Original comment by neunon
on 29 May 2010 at 7:00
Did 'make check' and only have one broken test now on Mac OS X. I'll open a new
issue for this other breakage.
Original comment by neunon
on 29 May 2010 at 7:02
Never mind, the test that failed is just the profiler test randomly failing. No
biggie.
Original comment by neunon
on 29 May 2010 at 8:54
We'll put the second fix (volatile variable) into the next release.
Original comment by csilv...@gmail.com
on 7 Jul 2010 at 9:23
This should be fixed in perftools 1.6, just released. (We ended up fixing it a
slightly different way, using atomic intrinsics.)
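The thread does not show the 1.6 change itself, so the following is only a guess at the general shape of "remove myself if I am still installed" done with an atomic operation. It uses C++11 std::atomic, which perftools 1.6 predates, and every name in it (HookFn, g_hook, SelfRemovingHook, UnrelatedHook) is hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative only: NOT the actual perftools 1.6 code.
typedef void (*HookFn)(const void* ptr, std::size_t size);

static std::atomic<HookFn> g_hook(nullptr);

// A second hook, used only to show that other hooks are left alone.
void UnrelatedHook(const void* /*ptr*/, std::size_t /*size*/) {}

// Clears g_hook only if it still holds this function's address. The compare
// and the store happen as a single atomic step, so there is no separate
// "read the hook, then compare it" sequence in the source.
void SelfRemovingHook(const void* /*ptr*/, std::size_t /*size*/) {
  HookFn expected = &SelfRemovingHook;
  g_hook.compare_exchange_strong(expected, nullptr);
}
```

If some other hook has been installed in the meantime, the exchange fails and that hook is left untouched.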
Original comment by csilv...@gmail.com
on 5 Aug 2010 at 8:51
Original issue reported on code.google.com by neunon on 20 May 2010 at 11:15