linshibo / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

patch for heap profiling on windows #83

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is a patch file that takes steps towards finishing the 
windows port of the heap profiler.  It doesn't address CPU profiling 
or heap checking.  It has been tested against some fairly large profilees 
on windows.  Some of the changes in the patch are fixes to issues that 
would affect tcmalloc even in the absence of profiling.

It's possible I may have lost some changes when creating the patch (we 
have some other modifications I am not sharing at this time). 

Please ask questions.

Changes: 
- Use the HANDLE API to dump the profile instead of file descriptors. 

- Fix a couple bugs where there were dependences on the global 
constructor order, which is undefined 

- Make sure the pad member of the Central_Freelist is not of length 0 

- Do not unpatch the windows functions, ever.  This is dangerous since 
you would be depending on global destructor order. 

- Do not report large allocations using printf (or at all for now) 
since printf isn't so safe to call in some processes 

- If we are asked to free something allocated by the CRT heap, then do 
a nop.  This is necessary because a bunch of stuff gets allocated 
before our hooks are in place (global c++ constructors, parts of 
environ, other things?) 

- Hook _msize and _expand.  Otherwise, very subtle bugs can occur. 
Even in programs that don't use these, since the linker/compiler 
introduces calls to _msize for global c++ constructors occurring in 
dlls. 

- Make realloc work if it is given something that was originally 
allocated by the CRT heap. 

- Make pprof forgiving of carriage returns.  Editing pprof is the sum 
total of my life experience using perl, so I've probably done 
something stupid. 

- Added stack walker using RtlCaptureStackBackTrace, which doesn't 
work so well with FPO (I turned this walker on by default) 

- Wrote ad-hoc stack walker, which sometimes works with FPO, but 
sometimes gets a snapshot from an earlier time due to left over stack junk.  
It is possible that this one is faster than the previous one if you turn 
off the heuristics -- I didn't test that. 

- Leak all spinlocks since you cannot count on the global destructor 
order. 

- Hook all instances of the CRT in the process, not just an arbitrary 
one 

- Hook LoadLibraryExW in order to detect dynamic loads of the CRT 

- WART: Using FreeLibrary to free the CRT is dangerous if you later 
LoadLibrary it again.  So we aren't perfect yet. 

- Fix some compile errors with VC8 

- Wrote nm and addr2line programs that use PDB symbols.  Did not add 
these to any build scripts.  You must download Debugging Tools For 
Windows and copy dbghelp.dll and symsrv.dll into the directory of 
nm.exe and addr2line.exe for them to function.  There exist old 
versions of these dlls that do not function. 

- Modified pprof to fall back on these pdb versions of nm and 
addr2line if they are siblings of pprof.  Processes linking in a 
mixture of gcc compiled and cl compiled dlls seem to get the union of 
the symbols. 

Original issue reported on code.google.com by dvi...@gmail.com on 16 Oct 2008 at 11:27

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by csilv...@gmail.com on 16 Oct 2008 at 11:31

GoogleCodeExporter commented 9 years ago
This probably fixes at least the symptom of Issue 50:
http://code.google.com/p/google-perftools/issues/detail?id=50

but it's hard to know whether it is for the right reason.

Original comment by dvi...@gmail.com on 16 Oct 2008 at 11:32

GoogleCodeExporter commented 9 years ago
This is a big patch, so I'm going to try to deal with it one file at a time.  I
decided to start with pprof. :-)  Most of it is straightforward enough.  I have 
a few
questions though:

1) You added this check for cygwin:
+  # On Windows, the map portion of the profile includes a region
+  # for the executable, so it's already taken care of.

I'd like to do this check more programatically.  My plan is to see what regions 
are
read from the profile, and figure out if any is a whole-program region.  Can 
you send
some output from the cygwin ldd or equivalent, so I can see what this
already-included region looks like?

Alternately, here is the code I want to put in, in the main ParseLibraries() 
loop:

    push(@{$result}, [$lib, $start, $finish, $offset]);

    if ($start le $zero_offset && $finish ge $max_pc &&
    $offset eq $zero_offset) {
      # The entry for the main program will encompass all possible pc's
      $seen_entry_for_main_program = 1;
    }

Will this work?

2) I'm confused about the necessity for the new addr2line_pdb and nm_pdb 
entries in
objtools.  It seems to me that either the system will come with addr2line and 
nm, in
which case we should use the system version, or they won't, in which case we 
should
use the version you supply in your patch (that is, require the user to install 
those
binaries in the same place they install pprof).  In the first case,
$obj_tool_map{"addr2line"} (and "nm") will already be set up correctly.  In the
second case, shouldn't we just change $obj_tool_map{"addr2line"} to point to
$0/addr2line?  Why the need for a separate $obj_tool_map{"addr2line_pdb"} entry?

If the need is just because the commandline interface is different: it seems to 
me we
control the commandline interface, since we're providing the source for this 
binary.
 Can't we just make it take a compatible set of commandline flags?

Original comment by csilv...@gmail.com on 17 Oct 2008 at 1:34

GoogleCodeExporter commented 9 years ago
ack, sorry about repling to the group instead of the issue :).

I thought of a potential issue this morning with naming nm_pdb nm.exe (and same 
with 
addr2line): users might get it confused with the cygwin nm, especially if it 
gets in 
their PATH.  I've attached a patch to pprof that expects the executables to be 
called 
nm_pdb.exe and addr2line_pdb.exe.

Original comment by dvi...@gmail.com on 17 Oct 2008 at 2:16

Attachments:

GoogleCodeExporter commented 9 years ago
Ok, based on your comments in the newsgroup (sorry, I accidentally redirected
discussion there instead of the bug report), I will take out the cygwin check: 
it
looks like the code works fine if that region gets inserted twice.

I will also change the binary-to-look-for to be nm_pdb and addr2line_pdb.  I may
change the logic around there a little, but not in any meaningful way.  One 
question
I have is this:

}   if ($file_type =~ /MS Windows/) {

[file_type is what is returned by the 'file' command.]  I'm guessing "MS 
Windows" ==
PDB format?  Is there another string we should be looking for to make sure it's 
PDB
in particular, or is "MS Windows" enough?

Original comment by csilv...@gmail.com on 17 Oct 2008 at 9:56

GoogleCodeExporter commented 9 years ago
Sorry, one more question:

You added the second condition on this "if":

}      if (exists $symbols->{$address[$i]} &&
}     exists $symbols->{$address[$i-1]}) {

I'm worried about the change in behavior (we leave out the arc rather than, as
before, assuming addresses[$i-1] is 0).  Was there a particular problem you were
seeing, that motivated this change?  If so, do you happen to have an example 
pprof
file that executes the new logic you added, so we can understand better what 
effects
this change will have?

Original comment by csilv...@gmail.com on 17 Oct 2008 at 10:12

GoogleCodeExporter commented 9 years ago
On the first question:
I think `file' returns that for all executables that work on windows.  The file 
output does not in any way reflect the format of the debugging information.

On the second question:
perl was producing a large number of uninitialized variable warnings.  I 
assumed this 
was a bug independent of my changes (I think I've seen the warnings on linux 
too).  I 
assumed using the uninitialized value was unintentional, but if it is 
intentional, 
then perhaps we should initialize the slot to zero if it does not exist (there 
were 
several secondary warnings down the line that were also fixed by this change).  
In 
addition to uninitialized variable warnings, I think this fixed an issue where 
--text 
--lines would report a line with a completely blank source location (not even 
??:0).

I admit I did not take the time to understand the root of this problem, and I 
should 
have.  Thoughts?

Original comment by dvi...@gmail.com on 18 Oct 2008 at 2:01

GoogleCodeExporter commented 9 years ago
1. Is there any way to tell, on windows, what the file format is?  Or maybe it
doesn't matter that much?  I'd just feel more comfortable if we ran a 
PDB-specific
nm/addr2line on stuff we knew was actually PDB (if that even makes sense; I 
don't
know about PDB at all).

2. It would be nice to get to the root cause here.  We don't see uninitialized 
var
warnings using pprof at google, but maybe it has to do with the nature of the
executable?  Any chance you can find an executable + profile that causes this
warning?  Maybe we can figure out what's going on.

I don't know what the "right" thing to do here is, either.  I don't really 
understand
much about what might cause anything here to be uninitialized, in practice, so I
don't know the best way to recover when it happens.

Original comment by csilv...@gmail.com on 18 Oct 2008 at 2:23

GoogleCodeExporter commented 9 years ago
Even if we knew whether the executable used PDB info, we would still need to 
check
the dlls.  As it stands, cygwin's nm fails for PDB files and we fallback on the 
other
one.

I'll see about tracking down the uninitialized variable.  I think what's 
happening
here is the $address[$i-1] entry of $symbols doesn't exist.  So this concats 
uninit
strings:

    my $destination = $symbols->{$address[$i-1]}->[1] . ":" . 
                          $symbols->{$address[$i-1]}->[0];

Original comment by dvi...@gmail.com on 18 Oct 2008 at 1:55

GoogleCodeExporter commented 9 years ago
I agree that $address[$-1] doesn't exist.  But I'm not sure why, or what the 
right
thing to do when it doesn't exist.

Original comment by csilv...@gmail.com on 20 Oct 2008 at 5:46

GoogleCodeExporter commented 9 years ago
OK, while we're figuring out the $address[$i-1] thing, I've been starting on the
other files.  I'll ask the high-level questions I noticed right away first; 
there may
be more detailed questions later.  Sorry these jump around so much:

1) heap-profiler:

Why did you need to change FDWrite to use handles and windows functions instead 
of
file descriptors and libc functions?  I'd like to avoid all these extra ifdefs 
if
possible.  Also, having the windows kernel allocate memory in CreateFile is 
indeed
potentially dangerous -- that's why we had to write FDWrite to begin with 
rather than
using fwrite -- so I'd be happy to avoid the whole thing if I can, and just 
leave
FDWrite as it is now.  What was wrong with the way it was before?

2) tcmalloc.cc:

Why the changes to pad_?  Does pad_[0] cause MSVC to barf?  It's been working 
so far,
so I'm confused by this.  What problem are you trying to sovle here?

}    /* dvitek: You can't do this unless you can assure me that this is
}     * the very last destructior called in the process.
}     */
}    //UnpatchWindowsFunctions();

In fact, we can assure this, because destructors run in the opposite order of
construction, and we go through a lot of effort to make sure this constructor is
called first in the program.  If it's not, lots of trouble happens (which maybe
you've been seeing).

I'm not comfortable taking this out, and with the other, related changes you 
make to
leak memory in tcmalloc, and to silently drop attempts to free memory that was
allocated by libc but freed by tcmalloc.  The last particularly worries me.  It 
could
mask serious bugs and be almost impossible for a user to debug.

So I'd rather address this issue by figuring out how to make sure this *is* the 
last
destructor called in a process.  In unix-land, we do that by requiring that 
tcmalloc
be the last library linked into an application.  (We already add relevant
dependencies -- including some code you've added -- to ensure that within
libtcmalloc, this is the first constructor run.)  Is there something equivalent 
we
can do in windows-land?  I'm happy to document that requirement.

Anyway, it would be great if you could try to do that, and uncomment this 
change and
the other related ones (the !span check near tcmalloc.cc:2775, the SpinLock
destructor in port.h, maybe others?) to see if you can get the code working.

}  /* This is dangerous on windows if printf isn't working for one
}   * reason or another.

Can you elaborate on this?  Is it that printf needs to call malloc?  What 
problems
were you seeing here?  I have no problem turning off large-malloc reporting, 
btw.  I
may want to do it for everyone in any case.  It's kinda distracting.

Thanks for adding _msize and _expand, btw!  I didn't know about these 
functions.  I'm
thinking of moving their definitions to patch_functions.cc, though, since they 
are
windows-specific.  Will that work ok, do you think, or will there be other 
problems?

3) addr2line.c:

Instead of downloading to c:\websymbols, should we download to 
TMPDIR\websymbols? 
Doesn't windows have some tmpdir function?

Also, whenever we create a new file, I worry about security concerns.  I don't 
know
how this works in windows-land, but what I would normally do is have something 
like
   char* dir = mkdtemp("TMPDIR\websymbols_XXXXXX") || fail;
   and then download to dir\websymbols
I think mkstemp() would work as well.  tmpfile() would be best of all, though 
not
usable in this context it looks like.  I don't know if there are windows 
equivalents
for any of these.

This avoids attack vectors where an attacker makes c:\\websymbols a symlink to 
some
location (like command.com), and then when you run this binary, it overwrites 
the
pointed-to location with your permissions.

4) port.cc:

Some of the stacktrace code reminds me of code in stacktrace_x86-inl.h.  Does 
it make
sense to try to merge them somehow?  Maybe it's too much work.

In general, I'm not a windows expert enough to evaluate this code.  I'll try to 
find
some folks around here who are.

5) port.h:

} #if defined(_M_IX86)
} /* Without this, the profile dumper will try to use the 64-bit
} * directive, which causes junk to be written to the profile.
} */
} #ifndef PRIxPTR
} #define PRIxPTR  "lx"
} #endif
} #endif

Can you explain this a bit more?  What do you mean by "try to use the 64-bit
directive?"  (And where do they get that directive from, whatever it is?)  And 
why do
you do this just for _M_IX86?  What do we do in other situations (what other
situations might there be?)

6) patch_functions.cc:

It seems like you've totally rewritten this.  I'm sure it's much better!, but 
again,
I don't know enough to evaluate it.  I see that you spend a lot of effort to 
deal
with having multiple libc's in a single executable.  Does that really happen?  
Gosh.

One thing that occurs to me is that instead of having lots of arrays that look 
like
   foo[MAX_LIBCS]
it would make more sense to have a struct that included all the pointers, and 
have an
array of that struct.  That would get rid of the need for the macros you have, I
think for 0-8, and also let you extract out some common code into functions 
easier. 
Are there any problems you forsee with that rewrite?

As a more general note, the code you write is in its own, consistent style, but 
a
style that's quite different from the rest of perftools.  My plan, as I 
incorporate
this functionality in, is to rewrite most of it to existing perftools style.  
Does
that sound ok to you?

Original comment by csilv...@gmail.com on 22 Oct 2008 at 7:14

GoogleCodeExporter commented 9 years ago
(1)

Microsoft has slightly different symbols for everything, even using libc:
http://msdn.microsoft.com/en-us/library/z0kc8e3z(VS.71).aspx

As you can see, you basically need underscores before every identifier.  When I 
saw 
the old code I thought you were probably using file descriptors (instead of 
something 
like iostream) to avoid acquiring locks and allocating memory.  If you look at 
the MS 
CRT implementation (source comes with the non-free versions of vc++) it 
acquires 
locks inside _open at the very least.  It seemed safer to use something closer 
to a 
direct syscall given that it needed to be changed anyway.

(2)
MS VC does barf on [0].  This may have been happening when I had added a field 
to 
spinlock (which I later removed).  It seems more maintainable in any case if it 
can 
withstand these sorts of changes without compilation errors.

I tried to find a way to get the tcmalloc constructors to go first, but I gave 
up.  
The closest thing I could find to a solution was:
http://blogs.msdn.com/vcblog/archive/2006/10/20/crt-initialization.aspx

However, looking at the MS libc source (which comes with VC++ pro), I can see 
that 
environ (and probably other things too) are malloced before any of those 
initialization functions are called.  See Vc7/crt/src/crt0.c for the important 
bits.  
putenv crashing was the first issue I ran into before I altered do_free and 
realloc 
to be forgiving.  There are other problems -- I suspect DLLMain events can 
happen 
before static constructors and after destructors.

I have an alternate proposal to fix the unsavoriness of ignoring strange 
pointers.  
We can hook delete and delete[] in the same fashion that we hook free, which 
would 
allow us to fall back on the CRT implementation of free/realloc/delete/delete[] 
inside do_free when we see that we don't know anything about the pointer in 
question.  
This is only slightly unsavory because I would be hardcoding the mangled 
operator 
delete identifier for VC into the source.  do_free would take an extra 
parameter 
(similar to realloc) in order to achieve this.

Moving _msize and _expand to port.cc is fine so long as all the classes and 
globals 
they need are in scope there (and not privately declared in tcmalloc.cc).

There are processes that set stdout to NULL, which will cause printf attempts 
to null 
pointer dereference.  It can also lead to the windows equivalent of SIGPIPE 
under 
some circumstances, if your process has been disconnected for some reason.

(3)

You want to make sure you use the same directory every time you run -- 
downloading 
symbols is not always a quick process.  NTFS only supports directory to 
directory 
links (it calls them junctions).  They are probably closer to hardlinks than 
symlinks.  The dbghelp library will fail to download any pdb file if a file 
with the 
same name already exists (it just tries to read the existing file as if it is a 
pdb 
file).  If there is something preventing the dbghelp library from creating one 
of the 
directories it wants to create, then it will silently create a 'sym' directory 
as a 
sibling to symsrv.dll and put things there (surprising, I know).

(4)

Some of it is similar, but the heuristics (and knowledge of stack top/bottom) 
do make 
it different enough that I'm not sure it's worth it.

(5)

basictypes.h has this, which is not correct on 32-bit platforms:

#ifndef PRIxPTR
#define PRIxPTR  PRIx64
#endif

My addition should probably check all arch symbols that indicate 32-bitness, 
_M_IX86 
is the one I know about.  The other two windows platforms are 64-bit (x86_64 
and 
ia64).

(6)

Multiple libcs don't happen with a single executable, but they do happen with a 
single process.  Executables and DLLs are bound to a specific libc at 
build-time.  
Many DLLs shipped by MS are bound to msvcrt.dll -- for instance, if you link 
with 
ADVAPI32.DLL (exports functions for working with the windows registry such as 
RegCreateKey), it links with WINTRUST.DLL, which links with msvcrt.dll.  
msvcrt.dll 
is the VC6 CRT.

Using a struct of the libc functions sounds like a good idea to me.  I'm not 
entirely 
certain that you can completely eliminate the need for 8 invocations of some 
macro 
without using more machine-code-generating code, but maybe I'm not seeing it.  
Change 
the style at will.

Let me know if I neglected to respond to any of your questions.

Original comment by dvi...@gmail.com on 22 Oct 2008 at 4:42

GoogleCodeExporter commented 9 years ago
Thanks for your responses.  I'm afraid some of these may take a bit of back and
forth, but I think we're getting closer to resolving many of them.

} It seemed safer to use something closer to a 
} direct syscall given that it needed to be changed anyway.

Why did it need to be changed anyway?  Were you seeing any problems that 
prompted
this change, or was it just prophylactic?

It's fine for the OS to acquire locks in this code -- what we're worried about 
is
allocating memory (since that might cause a recursive call into tcmalloc).  As 
long
as _open/_read/_etc don't allocate memory, we're fine (and possibly even if 
they do,
depending on how they do it).

You shouldn't need to worry about changing the open to _open and the read to 
_read;
that should all be done for you in port.h.  So ideally this code shouldn't 
require
any changes at all.  If it does, hopefully you can make those changes in port.h
rather than in this file.  Can you give that a try?

} MS VC does barf on [0]

This surprises me greatly, because even before your patch, we were running 
tcmalloc
on windows, and this code compiled fine.  Why would it suddenly start to break? 
 Can
you try changing this code back and seeing if it works now?  If not, can you 
report
the error?

} I tried to find a way to get the tcmalloc constructors to go first, but I 
gave up.

Hmm, windows is *so* annoying here. :-(

libc doesn't use new does it, so we only need to patch free, not delete?  As 
long as
we can guarantee tcmalloc runs before any other user libraries, we should be 
fine.  I
think with the crt.$init stuff we can at least guarantee that.  (Though maybe 
not, if
other libraries are also using the same trick?)

We're already using the crt-init in port.cc to set up locking, I think, so it 
might
not be so bad to hook in there.

Anyway, I like your alternate idea of hacking free/delete to pass back to msvc's
free/delete if they don't know the pointer.  We still have to modify tcmalloc to
not-die if we see free() on a pointer we don't recognize.  I'm worried that 
will slow
down free in the common case that we *do* recognize the pointer.  We're doing an
extra comparison now in every free call, right?  I'm not sure if that will be 
tough
to get buy-in for, or not.  I agree we'll want a separate do_free function that
returns a bool if the free succeeded or not, for windows.

} There are processes that set stdout to NULL, which will cause printf attempts 
to
} null pointer dereference.

Is that the only problem?  If so, can we change the printf statements to be
   if (stdout) printf(...)
?

} You want to make sure you use the same directory every time you run -- 
downloading 
} symbols is not always a quick process.

OK, maybe we don't need to (or can't) do fancy permissions stuff here.  But I 
still
would feel better downloading this to TEMPDIR than to the root directory.  Does 
that
not make sense?

} basictypes.h has this, which is not correct on 32-bit platforms:

Right.  These basictypes definitions should hopefully never be used, because the
system should define all these things for us.  But windows doesn't define any of
these for us.  Hmm, I'm surprised PRIxPTR is all you need.  Don't you have 
trouble
with say, PRId64 as well, which I see is used in heap-profiler.cc?  Or is that
already defined for us somewhere?

I think the right fix is to make basictypes.h do something better.  Is there a
__WORDSIZE macro or something equivalent in MSVC header files, that you know 
of? 
It's safer to use that than to try to switch on architecture versions.

} Executables and DLLs are bound to a specific libc at build-time.
} Many DLLs shipped by MS are bound to msvcrt.dll

In practice, does anyone link to a different libc than msvcrt.dll?  That is, 
how much
do we need to worry about this in real life?

} Let me know if I neglected to respond to any of your questions.

That's certainly enough to start on, thanks!

Original comment by csilv...@gmail.com on 23 Oct 2008 at 12:13

GoogleCodeExporter commented 9 years ago
For one reason or another the $address[$i-1] thing isn't manifesting on the 
small 
tests I've tried today.  I'm having it run some more expensive tests, but I 
won't 
have a definitive answer until next week on that, assuming nothing happens.  
Maybe 
the warnings were ultimately caused by the carriage returns, which are now 
fixed.

) _open and CreateFile
So I had not gone into the _open source to confirm that MS is doing something 
dirty 
before, but sure enough, they are.  Two problems with using _open:
1. _open calls _alloc_osfhnd which calls malloc (while holding a global lock!!).
2. _open calls CreateFile which calls HeapAlloc.

Here is a potential deadlock:
Thread 1 enters _open and acquires an _open lock
Thread 2 enters RecordAlloc and aquires heap_lock
Thread 2 enters _open on behalf of tcmalloc and waits on the _open lock
Thread 1 enters RecordAlloc on behalf of _open and deadlocks

Using CreateFile takes the _open locks out of the picture, but we will still 
see the 
recursive call to RecordAlloc (there is no documented API that will avoid 
this).  If 
this is a problem, then I suggest introducing:
static int recursive_depth = 0;
into the RecordAlloc function, in addition to continued use of CreateFile.

) MS VC does barf on [0]
The result of the modulus changed because I had added a field to spinlock (I 
believe 
I said this in my previous response?).  The modulus result probably did not 
work out 
to zero before, nor does it right now.  Isn't it a good thing that the code 
will 
compile even if someone modifies one of the structures in the future?

) As long as we can guarantee tcmalloc runs before any other user libraries, we 
should be fine.
More problems with unpatching:
1. At least one libc may have been unloaded via FreeLibrary, in which case the 
attempt to unpatch will segv
2. DLLMain exit messages come in after the destructors are done.
3. Other libraries may use these magic CRT sections.

Now, many users can probably live with these issues.  But what is the practical 
benefit of unpatching things?  It seems like a risk without any upside.

) I like your alternate idea of hacking free/delete....
The extra comparison would be the cost, but only on windows.  Beats crashing 
though.  
Even with the magic crt sections, I think we need this to work in the presence 
of 
DLLMain messages and calls to putenv.

) if (stdout) printf(...)
I can think of some other issues:
1. SIGPIPE could happen
2. the buffer could get full and block forever
3. the output could get interleaved with other stderr output

Maybe we could make the fprintf calls optional (I know we would want them off 
by 
default here)?  It isn't really a platform specific problem.

) OK, maybe we don't need to (or can't) do fancy permissions stuff here.

I can  think of a few options:
1.  Drink the MS kool-aid and use C:\Documents and Settings\dvitek\Local 
Settings\Temp\pprof_symbols.  This may be a problem because of path length 
limitations on NTFS and spaces in the path though.  It is entirely trivial to 
query 
the OS for that location (need to read the registry).
2.  Check if $TEMP or $TMP is set.  I don't think a clean windows install sets 
any of 
these.  We would need to fall back on something else.
3.  Use c:\temp or c:\windows\temp
4.  Stick with c:\websymbols, which is that Microsoft's documentation suggests 
using.  
The advantage of this is that many developers already have a symbol cache here, 
and 
we can share it.  A user can already override c:\websymbols by setting up 
_NT_SYMBOL_PATH (or whatever the variable is) properly.

) PRIxPTR
PRId64 probably is broken too.  Although for some reason it isn't actually 
causing 
noticable problems.  Probably just getting lucky with the positioning on the 
stack of 
the sprintfs.

I don't know of any WORDSIZE macro.  We always use configure scripts 
internally.  It 
is possible that one may exist, but I'm not sure how to find it in the 
documentation.

) does anyone link to a different libc than msvcrt.dll
I can think of several:
1. msvcrtd.dll - VC6 debug version of libc
2. msvcr70.dll - VC7
2. msvcr71.dll - VC7.1
2. msvcr80.dll - VC8
2. msvcr80.dll - VC8 with service pack 1 (yes, they have the same name)

I almost always use msvcr80.dll.  VC6/msvcrt.dll is pretty ancient.

Original comment by dvi...@gmail.com on 23 Oct 2008 at 2:37

GoogleCodeExporter commented 9 years ago
Ahem, where I said:
It is entirely trivial to query 
the OS for that location (need to read the registry).

I meant to say that it is not trivial :).

Original comment by dvi...@gmail.com on 23 Oct 2008 at 2:41

GoogleCodeExporter commented 9 years ago
} For one reason or another the $address[$i-1] thing isn't manifesting on the 
small 
} tests I've tried today.

No rush.  It'll be a while before we're all done with this process anyway. :-}  
Just
let us know when you have something to report.

} Using CreateFile takes the _open locks out of the picture

OK, that makes sense.  I'll comment that when applying this part of the patch.

} we will still see the recursive call to RecordAlloc

That's fine -- we already have the 'dumping' variable to prevent recursive 
calls to
DumpProfileLocked.

} The result of the modulus changed because I had added a field to spinlock

Ah right, you did say that before.  I should be able to fix this a different way
(using templates to get rid of pad_ entirely in the case size is a multiple of 
64).

} But what is the practical 
} benefit of unpatching things?  It seems like a risk without any upside.

Interesting question.  If we have code that delegates free() to crt-free() when
appropriate, maybe there's no upside.  Right now, we need to unpatch so 
microsoft
library destructors can call the right free() when they destroy.

} The extra comparison would be the cost, but only on windows. 

Ok, fair enough.  Let's go that route.

} Maybe we could make the fprintf calls optional

I'm looking at the documentation for this, and it looks like we should only be
reporting allocs larger than 1G.  That is, it probably shouldn't trigger much, 
if at
all, in real life.  We could make the default even larger, but I'm tempted to 
just
not worry about this.  I don't think it should be a problem in practice.  Have 
you
been seeing it?

The same printf problem presumably exists when the heap-profiler prints to 
stdout
(stderr?) every time it writes another heap file.  Would we need to solve the 
same
problem there?

} Stick with c:\websymbols, which is that Microsoft's documentation suggests 
using.

Ah, I didn't realize this was the official recommended microsoft location.  That
makes sense to use, then.

btw, for getting a temporary directory, I was thinking you'd call GetTempPathA. 
 Does
that not work?

} PRId64 probably is broken too.  Although for some reason it isn't actually 
causing 
} noticable problems.

Actually, it looks like the definition we give for PRId64 is right for windows. 
 I'll
just change PRIxPTR to be right for windows as well.  Nobody else should be 
using
this part of basictypes.h anyway, since they get these values via inttypes.h.

Original comment by csilv...@gmail.com on 23 Oct 2008 at 7:09

GoogleCodeExporter commented 9 years ago
I believe open issues currently:
1) Still waiting to figure out if $address[$i-1] is an issue
2) Rewriting free (and delete?) to delegate to the CRT if need be
3) Deciding if we really want to quiet the big-alloc calls, given they only 
trigger
when an alloc is >1G

Everything else that has been brought up so far, I'm currently working on and 
plan to
fix.  (1) is just waiting, and (3) I've decided I'll probably just up the 
default
from 1G to like 8G.  Then it should never(!) trigger by default, but you can 
set it
lower with an environment flag.  For (2), I think I'm waiting for a patch from 
you
that tries this?  No rush; just making sure we're on the same page.

I'm now working on addr2line-pdb and nm-pdb.  A few questions:

In addr2line-pdb, you use SymFromAddr() to get the first line of output you 
print. 
The msdn documentation says this gives the "name" of the symbol.  Is that the
function name this address is found in?  (I hope it is!)

In nm-pdb, you have a comment "Everything leaked".  What does that mean, 
exactly?

Overall, these were pretty easy to follow.  I'm making progress...

craig

Original comment by csilv...@gmail.com on 25 Oct 2008 at 6:05

GoogleCodeExporter commented 9 years ago
Ah, one more thing about nm-pdb.  I tried compiling on VC 7.1 (which I'd like 
to be
able to support if possible), and it didn't compile because many fields of
IMAGEHLP_MODULE64 are missing in VC 7.1 (such as SourceIndexed).

I looked at what output you actually use in the pprof patch, and it's only 
image name
and loaded image name, which we *do* have in VC 7.1.  So I've commented the 
rest of
them out.  Is that ok?

If there's some way to distinguish VC 8 and up from VC 7.1 and below, I'm happy 
to
put that in so we only omit those fields on older compilers.

Original comment by csilv...@gmail.com on 25 Oct 2008 at 6:40

GoogleCodeExporter commented 9 years ago
We're on the same page on 1-3.

On SymFromAddr: Yes, I'm pretty sure it gets the symbol containing the code 
address
you hand it.

"Everything leaked" refers to the fact that the utility is not calling free on 
some
things it malloced.  So if one were turning this code into a library or 
something,
one would want to deal with that.

About VC7.1: Feel free to ditch any parts of that function that report basic
information about the module.  It was for debugging.  I am showing the user the 
PDB
we use in case the dbghelp library chooses to use the wrong PDB (or finds none 
at all).

Original comment by dvi...@gmail.com on 25 Oct 2008 at 1:45

GoogleCodeExporter commented 9 years ago
The $address[$i-1] thing appears to be unnecessary -- I'm guessing the carriage 
returns were causing it since that's the only thing that changed since its 
introduction.

Original comment by dvi...@gmail.com on 28 Oct 2008 at 3:28

GoogleCodeExporter commented 9 years ago
Great!  I'll check that one off the list then.

I'm making good progress getting the patch reviewed by local windows experts.  
Of
course, I've left the most challenging bits until last...  One question about
nm-pdb.c: you have some of the printf's #if 0'ed out.  What is the reason for 
that? 
Should I leave it #if 0'ed out (with a comment as to why), re-enable it, or 
take it
out entirely?

Original comment by csilv...@gmail.com on 29 Oct 2008 at 11:25

GoogleCodeExporter commented 9 years ago
Just remove them

Original comment by dvi...@gmail.com on 29 Oct 2008 at 11:27

GoogleCodeExporter commented 9 years ago
OK, I'm puzzling over AdhocStackTrace now.  I know it's commented out, but I'd 
still
like to make sure the code is good.

Can you add comments to GetStackTrace (the one used by adhoc-stack-trace)?  In
particular, I don't really understand what t and b are (are they like top and 
bottom?
but different?), and what the for loop in the !sp case is all about.

You have this code:

}     if (!sp) {
}       uintptr_t diff = (uintptr_t)t - (uintptr_t)sp;

But isn't the whole point sp is 0?  So diff will always be == t?  What am I 
missing?

You have a label looks_good, and code that reads:
}        if (sp) {
}          goto looks_good;

But puzzling through the logic, it looks like it would be equivalent to have a
'break' here.  Is that true?  If not, what am I missing?

It also looks like, in the PROBABLY_CODE case, you're adding to result twice: 
once in
the PROBABLY_CODE case itself, and then once again after you break out of the 
for
loop.  Is that intentional?  Maybe another comment here would be helpful.

Also, in RtlCaptureBackTrace, why are you calling DebugBreak if we can't find 
the
right function to call?  Shouldn't we just fall back to one of the other 
options, or
else return NULL?  I don't think breaking is a good idea in the case of 
tcmalloc. 
Maybe this code is cut and paste from some other context?

craig

Original comment by csilv...@gmail.com on 30 Oct 2008 at 1:38

GoogleCodeExporter commented 9 years ago
I've attached some comments and a fix to the uintptr_t diff thing, which was
introduced during some refactoring I guess.  The diff check can potentially 
speed
things up a bit, but certainly isn't crucial.

break; at the goto would break out of the for, hitting the other "break;" right 
after
the for, which would leave the outer loop.  (We need to break out of the 
if(!sp) block)

The PROBABLY_CODE case only breaks out if we've accumulated enough frames, in 
which
case we break out of the inner loop, and there is the second break out of the 
outer
loop immediately succeeding the inner loop, so it never hits the other addition 
to
result.  See the comments I added.

DebugBreak is just a poor man's abort here.  Feel free to use RAW_CHECK or 
something
else that is more appropriate, but safe to use in this context (i.e., won't 
recurse
into malloc).

Original comment by dvi...@gmail.com on 30 Oct 2008 at 2:14

Attachments:

GoogleCodeExporter commented 9 years ago
OK, I understand the control flow a little better now.  I may try to refactor 
it to
be less confusing.

I still don't know what t and b are.  Is it possible to use more descriptive 
names?

} DebugBreak is just a poor man's abort here.  Feel free to use RAW_CHECK or 
something
} else that is more appropriate, but safe to use in this context (i.e., won't 
recurse
} into malloc).

I don't think we should abort here.  Typically when we can't get the stacktrace 
for
some reason, we just return 0.  I'll make that change unless there's a good 
reason
not to.

Original comment by csilv...@gmail.com on 30 Oct 2008 at 2:32

GoogleCodeExporter commented 9 years ago
Hmm, I take it back: I'm still confused by the control flow.  If I'm reading 
this
right, then in the PROBABLY_CODE case, we'll add ptr to the result array, and 
then
continue on with the for loop.  That means if we have four pointers in a row 
that
look like the point to code, all four will be added to result.  Is that right?  
The
comment you added says:
}      // Start scanning memory for a code pointer until we either find
}      // one or hit the top of the stack.

which leads me to believe we should be breaking out of the for loop after the 
first
PROBABLY_CODE pointer.  But the code seems to act otherwise.

Also, consider the case where traversing the stack looks like this:
   <ptr into code><ptr into code><something that looks like an sp><pointer into code>

The way the code works now, the first two will get put into result, then we'll 
see sp
and follow that like a normal stack pointer.  Is that right?

I guess the rationale is that in the above case, the first <ptr into code> 
comes at
the end of the top stack frame, the next comes at the next stack frame, and 
then the
sp comes at the next stack frame.  The first two functions are using FPO and 
don't
have an sp, but the third function is not and does have an sp.  Is that 
accurate?

Finally, I have concluded that t and b stand for top_of_stack and 
bottom_of_stack
respectively. :-)  But I only see you setting this on the i386 case.  In the 
x86_64
case, you only seem to set sp.  Is that right?

Original comment by csilv...@gmail.com on 30 Oct 2008 at 2:45

GoogleCodeExporter commented 9 years ago
OK, based on my understanding of what is going on, here is my proposed version 
of
GetStackTrace for the adhoc method.  Can you take a look and see if it is 
correct? 
Mostly I just changed the control flow by making us go back to the outer loop 
every
time we see an FPO return value, even though we could, correctly, stay in the 
inner
loop.  I think by leaving the inner loop every time we assign to result, the 
control
flow is a bit clearer.

My comments may be totally off-base; please let me know if they are, and how to 
make
them comprehensible and correct.

I'll just put the code inline rather than attaching:

---
int GetStackTrace(void** result, int max_depth, int skip_count) {
  void **sp, **top_of_stack, **bottom_of_stack;

#if defined(_MSC_VER) && defined(_M_IX86)
  // Stack frame format:
  //    sp[0]   pointer to previous frame
  //    sp[1]   caller address
  //    sp[2]   first argument
  //    ...
  sp = (void **)&result - 2;

  // Get the top and bottom of the stack.
  __asm {
      push eax
      mov eax, FS:[0x04]
      mov [top_of_stack], eax
      mov eax, FS:[0x08]
      mov [bottom_of_stack], eax
      pop eax
  };
#elif defined(_MSC_VER) && defined(_M_X64)
  DWORD c_rbp;
  // Move the value of the register %rbp into the local variable rbp.
  __asm mov [c_rbp], rbp
  // Arguments are passed in registers on x86-64, so we can't just
  // offset from &result
  sp = (void **) c_rbp;
  // TODO: fill top_of_stack and bottom_of_stack as well
#else
#error "Figure out how to implement stack end computation for this architecture"
#endif

  MaybeInitializeCode();

  int n = 0;
  void **prev = sp;

  while (n < max_depth) {
    void** return_address_ptr = NULL;

    // If the current frame has a stack pointer, things are easy: sp+1
    // is the return address, that we want to store, and *sp is the
    // next stack frame.  If the current frame does *not* have a stack
    // pointer, then probably it's for a function that was compiled
    // with FPO.  In that case, we take any old pointer we see that
    // looks like it's pointing into code, and take that as the return
    // address.  Then we assume that ptr+1 is the next stack frame (what
    // else can we do?)
    if (!sp) {
      uintptr_t diff = (uintptr_t)top_of_stack - (uintptr_t)prev;
      if (diff < 0x30 && !*(prev+1)) {
        // Probably really is the oldest frame if we are this close to
        // the top end of the stack.  Stop now to avoid using the code
        // scanning heuristic, below.
        return n;
      }
      // Scan memory starting at the top of the current stack frame
      // until we a) run out of stack, b) find something that looks
      // like a stack pointer, or c) find a pointer-into-code.  In
      // case (c), we assume this is a return address that ends the
      // current frame, which was compiled with FPO.
      for (void** ptr = prev + 1; ptr < top_of_stack; ptr++) {
        sp = NextStackFrame(ptr, t, b);
        if (sp) {
          break;   // great, found something that looks like a stack frame!
        } else if (PROBABLY_CODE(*ptr)) {
          return_address_ptr = ptr;
          break;
        }
      }
    }
    // At this point we should either have an sp or an explicit return_address
    if (sp) {
      if (skip_count > 0) {
        skip_count--;
      } else {
        result[n++] = *(sp+1);
      }
      prev = sp;
      sp = NextStackFrame(sp, t, b);
    } else if (return_address_ptr) {
      if (skip_count > 0) {
        skip_count--;
      } else {
        result[n++] = *return_address_ptr;
      }
      prev = return_address_ptr;
      sp = NULL;
    } else {
      break;           // didn't find anything? time to give up
    }
  }
  return n;
}

Original comment by csilv...@gmail.com on 30 Oct 2008 at 2:57

GoogleCodeExporter commented 9 years ago
) which leads me to believe we should be breaking out of the for loop after the 
first
) PROBABLY_CODE pointer.  But the code seems to act otherwise.
The comment isn't correct -- it just keeps going on the assumption that we just 
have
multiple FPO frames in sequence until it hits something like a non-fpo frame.  
Your
understanding of what's going on sounds solid.

For x86_64, I left a "#error" in about implementing stack end detection (I 
think?).

The modifications look OK, although I would certainly test them :).

Original comment by dvi...@gmail.com on 30 Oct 2008 at 3:12

GoogleCodeExporter commented 9 years ago
} For x86_64, I left a "#error" in about implementing stack end detection (I 
think?).

Maybe I'm not understanding windows preprocessor defines?  You have this:

#elif defined(_MSC_VER) && defined(_M_X64)
  DWORD c_rbp;
  // Move the value of the register %rbp into the local variable rbp.
  __asm mov [c_rbp], rbp
  // Arguments are passed in registers on x86-64, so we can't just
  // offset from &result
  sp = (void **) c_rbp;
#else
#error ...

Don't we need to fill b and t the same time we set sp here?

Original comment by csilv...@gmail.com on 30 Oct 2008 at 5:58

GoogleCodeExporter commented 9 years ago
The original diff at 
http://google-perftools.googlecode.com/issues/attachment?aid=753171857806843247&
name=google-perftools-0.99.2.msvc8_i386_hprof.patch

has:
+#elif defined(_MSC_VER) && defined(_M_X64)
+    // __builtin_frame_address(0) can return the wrong address on gcc-4.1.0-k8
+    DWORD c_rbp;
+    // Move the value of the register %rbp into the local variable rbp.
+    __asm mov [c_rbp], rbp
+    // Arguments are passed in registers on x86-64, so we can't just
+    // offset from &result
+    sp = (void **) c_rbp;
+
+#error "Figure out how to implement stack end computation"
+#endif

Did I post another one later?  at any rate, it needs a #error above the #else.

Original comment by dvi...@gmail.com on 30 Oct 2008 at 6:02

GoogleCodeExporter commented 9 years ago
Ah, I understand.  I think I moved the #error when I was editing, not realizing 
your
intent.  OK, i've changed it back.

Original comment by csilv...@gmail.com on 30 Oct 2008 at 6:25

GoogleCodeExporter commented 9 years ago
Here is a patch (should be composed with my previous patch) that makes free and 
company fall back on the appropriate libc instance.

I added some handling of more kinds of jump instructions that I encountered to 
the 
patcher.  There are also some bug fixes that improve things when using 
statically 
linked libc and a libc dll in the same process.

It may be desirable to move some prototypes for functions from tcmalloc.cc into 
a 
header file that patch_functions.cc can see.

Original comment by dvi...@gmail.com on 31 Oct 2008 at 12:34

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for the new patch!  I will look through it when I have a chance.  I am 
still
trying to work through the stacktrace code, but I may put that aside since I 
think
this part of the system is more useful to fix.

Original comment by csilv...@gmail.com on 31 Oct 2008 at 1:08

GoogleCodeExporter commented 9 years ago
OK, I've finally had some time to look at the windows-patching code.  One 
question I
have is why you are patching new, new[], delete, and delete[].  I didn't have 
to do
that before -- I think because windows lets you override them (but not malloc 
and
friends) just by defining a new version.  So why patch?  Is it because you are
worried about multiple libc's?  Is new defined in mscvrt?

I ask because you don't seem to completely patch new, which is an overloaded
operator.  We need to support overloading the nothrow version of new as well.  
There
may be other versions as well (though I'm happy not to worry about them because
tcmalloc doesn't).  It's easier if we can just avoid the whole deal.  Can we?

Original comment by csilv...@gmail.com on 8 Nov 2008 at 4:14

GoogleCodeExporter commented 9 years ago
Here is an example that demonstrates why just defining the operators won't work:

/* killit_dll.cpp -- compile me into killit.dll */
extern "C" {
__declspec(dllexport) char *killit( char *p )
{
    delete p;
    return new char;
}
}

/* test.cpp -- compile me into test.exe */
#include <stdio.h>
#include <assert.h>
#include <windows.h>

typedef char *(*killit_fn_t)(void *p);

void* operator new(size_t s)
{
    printf( "New %u!\n", s );
    return (void*)0x12345678;
}

void operator delete(void *p)
{
    assert( p == (void*)0x12345678 );
    printf( "Delete %p!\n", p );
}

int main(int argc, char *argv[])
{
    delete new char;
    HMODULE libby = LoadLibrary( "killit.dll" );
    assert( libby );
    killit_fn_t killit = (killit_fn_t)GetProcAddress( libby, "killit" );
    assert( killit );
    delete killit(new char);
    return 0;
}

The invocations of new and delete from inside the DLL will not be intercepted 
by the 
user defined operators.

As for the nothrow versions, no version of libc seems to implement them on 
windows.  
The only exported symbols for new are "operator new" and "operator new[]".  You 
can 
use the depends.exe utility on any libc dll to see what symbols it exports.

Original comment by dvi...@gmail.com on 9 Nov 2008 at 4:20

GoogleCodeExporter commented 9 years ago
Right, I believe that if you overload new/delete in an .exe, it won't be 
exported to
the dll.  But what matters for perftools is the other way around: whe you 
overload
new/delete in a .dll and try to use it in the .exe (or in another dll).  In unix
those two situations are different, so I wouldn't be surprised if that was true 
for
windows too.

The other point that if windows new/new[] just calls malloc(), and delete just 
calls
free(), then we don't need to override new/delete, because we already override 
malloc
and free.

tcmalloc_minimal_unittest tests that tcmalloc correctly intercepts new and 
delete
calls.  Right now that test is passing on windows, even with the current 
version of
tcmalloc that doesn't try to patch new/delete.  So I suspect it is not 
necessary to
patch them.  Have you seen a situation where it's actually necessary -- that is,
where the current tcmalloc isn't intercepting new/delete?

The way we're going to do some of the windows_patch.cc changes, life will be a 
lot
easier if we don't have to worry about new and delete, so I'd like to make sure 
it's
really necessary to patch them.

Original comment by csilv...@gmail.com on 9 Nov 2008 at 6:52

GoogleCodeExporter commented 9 years ago
) ... overload new/delete in a .dll and try to use it in the .exe (or in 
another dll)

Unless the user has control of the link lines for every dll in their process 
that
uses new or delete, then it is not possible to intercept all calls to new/delete
without patching them.  This can be a big problem if you use any third-party
components, especially with inline c++ classes.

Even if we are willing to rely on the fact that libc happens to implement new by
calling malloc, the FPO optimized frames introduced by libc's new frame(s) may 
screw
up call stacks.

Original comment by dvi...@gmail.com on 9 Nov 2008 at 7:34

GoogleCodeExporter commented 9 years ago
But FPO optimized frames are going to screw up call stacks no matter what, 
right?  Or
maybe this kind of screw-up is worse somehow?  I don't know much about how FPO
affects stack frames.

I'm really pushing back on overloading new because right now patch-functions.cc
reimplements a lot of the code in tcmalloc.cc, and I want to move away from 
that. 
The more we patch, the harder the task gets.  Re-implementing the C++ operators 
will
present its own challenges, I suspect, and I really would be much happier if we 
could
get away without doing it.

How bad do you think it will be if we don't patch new and delete?

Original comment by csilv...@gmail.com on 10 Nov 2008 at 11:04

GoogleCodeExporter commented 9 years ago
) FPO optimized frames are going to screw up call stacks no matter what, right?
If we patch new and delete, then they are only screwed up when the callee of 
new or 
delete is FPO optimized.  The other way, the stack will just end somewhere 
inside 
libc even with debug code -- you won't even be able to identify the module 
immediately calling new.

) reimplements a lot of the code in tcmalloc.cc
The bits of code for dealing with new and delete in patch_functions.cc are just 
1 or 
2 line wrapper functions dispatching to tcmalloc.cc.  Example:
static void *Perftools_new(
    size_t s ) {
    void *p = cpp_alloc(s, false);
    MallocHook::InvokeNewHook(p, s);
    return p;
}

It isn't duplicating any code from tcmalloc.cc, it is just calling it.

) How bad do you think it will be if we don't patch new and delete?
I do not think I would adopt the change here.  It might be OK for a restricted 
audience working on smaller applications, but for large applications with lots 
of 
dlls I wouldn't make the compromise.  The thing about windows is, it's easy to 
suck a 
lot of dlls into your process without realizing it.

There is another issue with needing to link all dlls with the perftools dll: It 
means 
that I cannot create a DLL that will function correctly in both processes that 
do and 
do not use tcmalloc.  This can be a usage problem since it means we need to 
link two 
copies of many things and remember which one to use where.  I prefer to avoid 
needing 
to make everyone potentially using tcmalloc here understand this complication.  
As 
is, a lot of processes are transparently getting tcmalloc and it "just works" 
without 
any extra work from our developers.

Original comment by dvi...@gmail.com on 10 Nov 2008 at 11:42

GoogleCodeExporter commented 9 years ago
The code "cpp_alloc" + InvokeMallocHook is a copy of the code for operator new 
in
tcmalloc.cc.  Sure it's simple code, but it's still very possible the code will
evolve over time, and now we'll have to remember to keep such changes in sync 
between
tcmalloc.cc and patch_functions.cc.  This will be especially tricky since the 
two
implementations are used on different platforms, so we can't write a test or 
anything
to warn us if they get out of sync.

That said, it sounds like patching new/delete is important.  I'll see what we 
can
figure out to make it happen.

Original comment by csilv...@gmail.com on 11 Nov 2008 at 12:01

GoogleCodeExporter commented 9 years ago
OK, I'm making progress on patch_functions again.

Can you clarify a bit why we need to define _expand?  Here is what I've written 
now:
---
void* Perftools__expand(void *ptr, size_t size) __THROW {
  // We need to define this because internal windows functions like to
  // call into it(?).  _expand() is like realloc but doesn't move the
  // pointer.  We punt, which I guess the internal callers can handle.
  return NULL;
}
---

Is this correct?  Is there any more info I can add?  Do we have any reason to 
believe
clients can handle a NULL return ok (as opposed to, say, just dying or throwing 
an
exception), or is it just that it seems to work in practice?

Original comment by csilv...@gmail.com on 13 Nov 2008 at 3:26

GoogleCodeExporter commented 9 years ago
Sorry, one more question: I see you patch LoadLibraryExW.  Do we also need to 
patch
at library-unload time?  If not, can you explain briefly why not, so I can 
understand
this stuff better?  Thanks. :-)

Original comment by csilv...@gmail.com on 13 Nov 2008 at 3:31

GoogleCodeExporter commented 9 years ago
_expand is identical to realloc except that it cannot relocate the memory 
block.  It
returns NULL if the size of the block cannot be altered without moving it.  I 
do not
see how any client could assume that this function will succeed unless it knows 
the
new size is exactly identical to the old size, in which case there would be no 
reason
for it to call this function in the first place.  All uses I've encountered so 
far
fall back on realloc or malloc when it fails.

See
http://msdn.microsoft.com/en-us/library/wfzt8b7y.aspx

There is a danger in not hooking FreeLibrary if either of these things happen:
(1) We exhaust MAX_LIBCS because the user repeatedly loads and unloads libc
(2) Libc gets unloaded and then reloaded at the same address it used to be 
located
at.  Because it is at the same spot, we will think it is already patched when 
it is
not, which means the crt heap gets used.

There is an assertion that will detect LoadLibrary calls after FreeLibrary of 
libc
unless (2) is happening:
CHECK_EQ(libcs[i].still_loaded,true);

We will never call into a freed library since our little Perftools_* thunks will
never get called once the corresponding libc has been freed.

Original comment by dvi...@gmail.com on 13 Nov 2008 at 4:29

GoogleCodeExporter commented 9 years ago
Ok, thanks for these clarifications.  It's all making sense now.  One more 
question
before I go to bed:

You have this comment in update_libc():
---
  /* Count down so that the global namespace functions get hooked               
   * last.  We do this because as we count, we may discover that the            
   * global namespace functions are duplicates.  The global namespace           
   * functions are always in slot 0.                                            
   */
---

Can you talk a bit more about the problems you experienced, that prompted this
comment?  I ask because you're not hooking the global namespace functions last;
you're just hooking them last out of all the modules that exist at program
start-time.  When a new LoadLibraryExW event is triggered, it might load in new
modules with new libc's, but by that point the main routines have already been
patched.

It seems to me that's a problem situation the way you have the code now: at
LoadLibraryExW time, we'll call update_libc(), which will go through the new
modules and patch them.  If one of them uses the same addresses as the global
namespace (for malloc, new, etc), then this code will trigger:
                  libcs[0].Windows_ ## f = NULL;                \
                  libcs[0].origstub_ ## f =                     \
                      libcs[i].origstub_ ## f;                  \

But is that safe?  Especially given that libcs[0] has already been patched.

My guess is in practice this doesn't come up, because modules loaded via
LoadLibraryExW aren't going to use the same addresses for their functions that 
the
global namespace does.  But that's just a guess, which is why I'm asking for 
your
experience, and what prompted this code segment.

One practical question that comes out of this: would it make sense/be more 
correct to
only do this check against libcs[0] when orig_num_libcs is 0?

o

Original comment by csilv...@gmail.com on 13 Nov 2008 at 10:46

GoogleCodeExporter commented 9 years ago
Your guess is correct: A module loaded by LoadLibraryEx isn't going to alias 
the 
global namespace functions.  This is because the library that implements the 
global 
namespace functions will always be loaded by the OS before the program starts 
executing, so the very first call to update_libcs() will see them both.  Unless 
there 
is a second patcher at work (some other piece of software similar to tcmalloc), 
I 
don't think you will ever see a jump from one library into another.

) One practical question that comes out of this: would it make sense/be more 
correct 
to only do this check against libcs[0] when orig_num_libcs is 0?

This sounds OK.

Original comment by dvi...@gmail.com on 13 Nov 2008 at 3:31

GoogleCodeExporter commented 9 years ago

Original comment by csilv...@gmail.com on 14 Nov 2008 at 2:05

GoogleCodeExporter commented 9 years ago
OK, one more question about overriding new and delete.  In the file
src/windows/vc7and8.def, there's a comment that says it's possible to override 
them
just by defining our own version dllexport (via the .def file).  I guess that's 
not
accurate?  Given that, and given that we're patching these things in
patch_functions.cc now, does this mean we can get rid of vc7and8.def, as no 
longer
being needed?

Original comment by csilv...@gmail.com on 14 Nov 2008 at 7:02

GoogleCodeExporter commented 9 years ago
Yeah, I think so.

Original comment by dvi...@gmail.com on 14 Nov 2008 at 11:09

GoogleCodeExporter commented 9 years ago
ok, I've got a version of patch_functions.cc working.  It ended up being quite
different from the version that you submitted in your patch, though it's based 
on
that.  (For instance, I use templates rather than macros, and do more in static
initializers.)  I wanted to give you a chance to take a look at it, and see if 
it
looks like it should work for you.

Unfortunately, it's not a drop-in replacement for the current 
patch-functions.cc,
because it depends on some changes to tcmalloc.cc (the new 
do_realloc_with_callback
function, for instance, which lets you pass in the function to call when the 
realloc
fails).  Unfortunately, those changes are built on top of other changes, some of
which aren't quite working right, so I can't include the whole megillah.  
Instead,
I'm just attaching this one file to look at.  What do you think?

This reminds me: it would be great to have a test program, if we can, that 
exercises
the code that patches different libc's from different shared libraries.  Is that
plausible to do?

Original comment by csilv...@gmail.com on 14 Nov 2008 at 11:40

Attachments:

GoogleCodeExporter commented 9 years ago
While doing some further testing in release mode, I found some problems with the
patch_functions.cc I attached in the last comment.  I'm attaching a newer 
version
here; it replaces the last one.

Original comment by csilv...@gmail.com on 15 Nov 2008 at 3:57

Attachments: