mzhaom / gperftools

Fast, multi-threaded malloc() and nifty performance analysis tools
https://code.google.com/p/gperftools/
BSD 3-Clause "New" or "Revised" License

Suggestion: google perftools should also support walltime profiles #105

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Google perftools currently supports the generation of CPU profiles using 
the SIGPROF signal (which is generated by using setitimer() with 
ITIMER_PROF).

My suggestion is to also support walltime profiles - which will basically
be generated by using setitimer() with ITIMER_REAL. This will generate a 
SIGALRM periodically - independent of whether the process is actually 
runnable or not.

The motivation for this is as follows. Very often, one also needs to find 
out why a program is blocked when it should really be running. This 
information cannot be found by using SIGPROF as the latter is only 
delivered when the process is running. Supporting profile generation using 
ITIMER_REAL with setitimer() would, thus, be really useful for this.
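
To make the difference concrete, here is a minimal sketch (not perftools code) that arms either timer and counts how many signals arrive while the process is blocked in nanosleep(). The helper name ticks_while_blocked and the 10 ms period are assumptions chosen for illustration; only setitimer(), sigaction() and nanosleep() are standard calls.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Incremented by the handler; sig_atomic_t is safe to write from a signal. */
static volatile sig_atomic_t tick_count = 0;

static void on_tick(int signo)
{
    (void)signo;
    tick_count++;
}

/*
 * Hypothetical helper: arm `which_timer` (ITIMER_REAL or ITIMER_PROF) with a
 * 10 ms period, block in nanosleep() for `block_ms`, and return how many
 * times `signo` (SIGALRM or SIGPROF) was delivered. Returns -1 on error.
 */
static int ticks_while_blocked(int which_timer, int signo, int block_ms)
{
    assert(block_ms >= 0);

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) != 0)
        return -1;

    tick_count = 0;
    struct itimerval period, disarmed;
    memset(&period, 0, sizeof period);
    memset(&disarmed, 0, sizeof disarmed);
    period.it_interval.tv_usec = 10000;  /* repeat every 10 ms */
    period.it_value.tv_usec = 10000;     /* first expiry after 10 ms */
    if (setitimer(which_timer, &period, NULL) != 0) {
        sigaction(signo, &old_sa, NULL);
        return -1;
    }

    /* Stay blocked; each delivered signal interrupts nanosleep() with EINTR,
     * so resume with the time that is still remaining. */
    struct timespec remaining;
    remaining.tv_sec = block_ms / 1000;
    remaining.tv_nsec = (long)(block_ms % 1000) * 1000000L;
    while (nanosleep(&remaining, &remaining) == -1 && errno == EINTR)
        ;

    setitimer(which_timer, &disarmed, NULL);
    /* Discard any tick still pending before restoring the old disposition,
     * whose default action would terminate the process. */
    sa.sa_handler = SIG_IGN;
    sigaction(signo, &sa, NULL);
    sigaction(signo, &old_sa, NULL);
    return (int)tick_count;
}
```

In a single-threaded process, calling ticks_while_blocked(ITIMER_REAL, SIGALRM, 200) should report a handful of ticks, while ticks_while_blocked(ITIMER_PROF, SIGPROF, 200) should report none, because the sleeping process consumes almost no CPU time.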

Original issue reported on code.google.com by mohit.a...@gmail.com on 5 Feb 2009 at 7:40

GoogleCodeExporter commented 9 years ago
In general, oprofile is better for that kind of analysis, if you're able to run
it on the binary.  That said, ITIMER_REAL may work as well.  We'll keep this
under consideration.

In the meantime, of course, you're welcome to modify the source code yourself
for your own projects -- maybe you can report back here how well it works!

Original comment by csilv...@gmail.com on 6 Feb 2009 at 3:14

GoogleCodeExporter commented 9 years ago
I talked to an expert on profiling here, and he had the following to say:
---
Assuming a multi-threaded app and a modern thread system on Linux (NPTL), no,
that won't work.  It *would* work for single-threaded apps.  It would also work
for LinuxThreads (but "performance tip #1 for people using LinuxThreads" is,
IMO, "STOP!!!" 8-).

The problem is that under POSIX-compliant threading systems (NPTL is,
LinuxThreads isn't), the interval timers are shared by all the threads in
the process.

This works OK for ITIMER_PROF (and ITIMER_VIRTUAL): you get one tick per
"<interval> CPU seconds consumed" which is exactly what you want.  The
thread which causes the timer to run down to 0 gets hit with the signal, and
(over a period of time, unless the threads are doing something Interesting,
e.g., periodic behaviour that syncs up with the profiler's period) the
profiler ticks will be distributed across multiple threads in proportion to
their CPU usage.  (Using a per-thread timer on Linux might produce more
accurate results in some cases, but it has some disadvantages too, e.g.,
only supported in recent kernels and won't necessarily account for
short-lived threads).

For ITIMER_REAL, you're getting one tick per "<interval> real-time seconds
consumed".  If you have multiple threads, you then have to "distribute" that
tick to the rest of the threads, and collect data from them.  This isn't
particularly easy, as there's no standard mechanism to enumerate all threads
in a process.  (One might try to use "<interval>/N" where N is the number of
threads instead of <interval>, hoping for a uniform distribution... but
that's just bogus.  First, you need
to keep track of N, and second the signal distribution won't be uniform. If
any thread in the process is active, signals like SIGALRM delivered by
ITIMER_REAL will almost certainly be delivered to the running thread.)

Note also that use of ITIMER_REAL/SIGALRM will screw up any other uses of
SIGALRM in the process.  (Some library functions use it, some people write
code to use it directly.)  (SIGPROF has the same issue... but no library
functions use SIGPROF, and very few people use it for "other stuff" in my
experience.)
---

This last point is a showstopper, I think.  Too many apps use SIGALRM for me to
be comfortable using it in a basic library like this (even if not by default).
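
For illustration, a library could at least check whether SIGALRM already has a handler before claiming it. The sketch below is not perftools code; signal_has_handler, claim_signal_if_free and profiler_tick are hypothetical names. The check is only a snapshot: it cannot see code that installs a handler later, or library calls that rely on the signal without a handler being installed at that moment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Returns 1 if `signo` currently has a user handler, 0 if its disposition
 * is SIG_DFL or SIG_IGN, and -1 if the query itself fails. */
static int signal_has_handler(int signo)
{
    struct sigaction current;
    assert(signo > 0);
    if (sigaction(signo, NULL, &current) != 0)
        return -1;
    if (current.sa_flags & SA_SIGINFO)
        return 1;  /* a three-argument handler is installed */
    return current.sa_handler != SIG_DFL && current.sa_handler != SIG_IGN;
}

/* Placeholder for whatever the profiler would do on each tick. */
static void profiler_tick(int signo)
{
    (void)signo;
}

/* Install `handler` for `signo` only if nobody else has claimed the signal.
 * Returns 0 on success and -1 if the signal is in use or on error. */
static int claim_signal_if_free(int signo, void (*handler)(int))
{
    struct sigaction sa;
    if (signal_has_handler(signo) != 0)
        return -1;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL) == 0 ? 0 : -1;
}
```

A call such as claim_signal_if_free(SIGALRM, profiler_tick) succeeds once in a process that has not touched SIGALRM and fails on a second attempt, which is the conflict a profiler built on ITIMER_REAL would run into in an application that already uses the signal.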

There are ways around all these problems, and a wall-time profiler is possible,
but it works by spawning a new thread to do the timing, and is basically a
totally separate design from the profiler we have now.  In other words: it
would be a lot of work. :-/  I'll keep this in mind, but am lowering the
priority in light of the fact there's not much synergy in doing this as part of
the existing perftools codebase.
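
A rough sketch of that separate design, under stated assumptions: threads register themselves (there is no standard way to enumerate them), and a dedicated sampler thread wakes up on a wall-clock interval and signals each registered thread with pthread_kill(). All names here (register_current_thread, sampler_main, samples_while_blocked) are hypothetical, and the handler only counts samples where a real profiler would record a stack trace. This is not how perftools is implemented.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 16

static pthread_t registered[MAX_THREADS];
static int registered_count = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int samples_taken;
static atomic_int sampler_running;

/* A real profiler would record a stack trace here; this only counts. */
static void on_sample(int signo)
{
    (void)signo;
    atomic_fetch_add(&samples_taken, 1);
}

/* Threads must opt in. A real version also needs an unregister step,
 * because signalling a thread that has exited is undefined. */
static int register_current_thread(void)
{
    int rc = -1;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < registered_count; i++)
        if (pthread_equal(registered[i], pthread_self()))
            rc = 0;  /* already registered */
    if (rc != 0 && registered_count < MAX_THREADS) {
        registered[registered_count++] = pthread_self();
        rc = 0;
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Sleep for `ms` of wall-clock time, resuming after signal interruptions. */
static void sleep_ms(int ms)
{
    struct timespec remaining;
    remaining.tv_sec = ms / 1000;
    remaining.tv_nsec = (long)(ms % 1000) * 1000000L;
    while (nanosleep(&remaining, &remaining) == -1 && errno == EINTR)
        ;
}

/* Sampler thread: one wall-clock tick is fanned out to every registered
 * thread, whether or not that thread is currently running. */
static void *sampler_main(void *arg)
{
    int interval_ms = *(int *)arg;
    while (atomic_load(&sampler_running)) {
        sleep_ms(interval_ms);
        pthread_mutex_lock(&registry_lock);
        for (int i = 0; i < registered_count; i++)
            pthread_kill(registered[i], SIGPROF);
        pthread_mutex_unlock(&registry_lock);
    }
    return NULL;
}

/* Block the calling thread for `block_ms` while the sampler runs and return
 * the number of samples it received, or -1 on error. */
static int samples_while_blocked(int block_ms, int interval_ms)
{
    assert(block_ms >= 0 && interval_ms > 0);
    struct sigaction sa, old_sa;
    pthread_t sampler;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, &old_sa) != 0)
        return -1;
    atomic_store(&samples_taken, 0);
    atomic_store(&sampler_running, 1);
    if (register_current_thread() != 0 ||
        pthread_create(&sampler, NULL, sampler_main, &interval_ms) != 0) {
        sigaction(SIGPROF, &old_sa, NULL);
        return -1;
    }
    sleep_ms(block_ms);
    atomic_store(&sampler_running, 0);
    pthread_join(sampler, NULL);
    /* Drop any sample still pending, then restore the old disposition. */
    sa.sa_handler = SIG_IGN;
    sigaction(SIGPROF, &sa, NULL);
    sigaction(SIGPROF, &old_sa, NULL);
    return atomic_load(&samples_taken);
}
```

With a 10 ms interval, samples_while_blocked(200, 10) should return a count well above zero even though the calling thread spends the whole time asleep, which is exactly what a CPU-time timer cannot observe. Using pthread_kill() with SIGPROF rather than ITIMER_REAL also sidesteps the SIGALRM conflict, at the cost of the registry bookkeeping.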

Original comment by csilv...@gmail.com on 17 Feb 2009 at 11:05

GoogleCodeExporter commented 9 years ago
there's no oprofile for doze ... sniff, sniff

Original comment by rogerpack2005 on 21 Jan 2010 at 5:14

GoogleCodeExporter commented 9 years ago
Alas, we're a long way from supporting a CPU profiler under Windows in any case
(I don't think it supports Unix-style timer interrupts at all).  But we did add
ITIMER_REAL support in perftools 1.5 (or was it 1.4?), so I guess I can close
this bug actually!

Original comment by csilv...@gmail.com on 3 Feb 2010 at 10:33