theRockLiu / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

openmp functions have incorrect counts #529

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
openmp functions have incorrect counts when using the cpu profiler. This seems 
to have been a long standing issue: 
http://openmp.org/forum/viewtopic.php?f=3&t=1213 . Nearest I can tell, this is 
because they are being improperly identified.

As an example:
$ cat temp.cpp
int main() {
    #pragma omp parallel
    {
        int x = 0;
    }
    return 0;
}
$ g++ -fopenmp -g -lprofiler temp.cpp
$ CPUPROFILE=out.prof ./a.out 
PROFILE: interrupts/evictions/bytes = 27/1/384
$ pprof --text a.out out.prof 
Using local file a.out.
Using local file out.prof.
Removing _L_unlock_16 from all stack traces.
Total: 27 samples
      27 100.0% 100.0%       27 100.0% omp_get_num_procs
       0   0.0% 100.0%       27 100.0% GOMP_taskwait
       0   0.0% 100.0%       27 100.0% __clone
       0   0.0% 100.0%       27 100.0% start_thread

What is the expected output? What do you see instead?
I do not expect omp_get_num_procs to dominate the traces.

What version of the product are you using? On what operating system?
$ md5sum gperftools-2.0.tar.gz 
13f6e8961bc6a26749783137995786b6  gperftools-2.0.tar.gz
$ uname -a 
Linux ghc33.ghc.andrew.cmu.edu 2.6.32-279.22.1.el6.x86_64 #1 SMP Sun Jan 13 
09:21:40 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

Please provide any additional information below.
If I look at GOMP_taskwait (the functions that calls omp_get_num_procs), I see 
the following output in the assembly that leads me to believe that the 
functions are not getting properly identified:

$ pprof --disas=GOMP_taskwait a.out out.prof 
ROUTINE ====================== GOMP_taskwait
     0     27 samples (flat, cumulative) 100.0% of total
... <snip> ...
     .     16        84ef: callq  9ca0 <omp_get_num_procs+0x540>
     .      .        84f4: nopl   0x0(%rax)
     .      .        84f8: mov    %fs:0x10(%rbx),%r13
     .      .        84fd: mov    %r12,%rdi
     .      .        8500: callq  *%rbp
     .      .        8502: lea    0x80(%r13),%rdi
     .     11        8509: callq  9b40 <omp_get_num_procs+0x3e0>
     .      .        850e: mov    %r14,%rdi
     .      .        8511: callq  9ca0 <omp_get_num_procs+0x540>
... <snip> ...

Also, I have to confess that I'm compiling with -static-libstdc++, but I don't 
have an easy way to avoid that at the moment and I don't believe it is related. 

Original issue reported on code.google.com by awre...@gmail.com on 11 May 2013 at 2:38

GoogleCodeExporter commented 9 years ago
It looks like I spoke too soon. It appears that the library doesn't have 
symbols, which would explain why the cpu profiler was unable to find them.

$ ldd a.out | grep gomp
    libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x00007f674c22f000)
$ nm /usr/lib64/libgomp.so.1
nm: /usr/lib64/libgomp.so.1: no symbols
$ objdump -d /usr/lib64/libgomp.so.1 
... <snip> ...
   8270:       e9 fb 16 00 00          jmpq   9970 <omp_get_num_procs+0x210>
    8275:       80 7f 5c 00             cmpb   $0x0,0x5c(%rdi)
    8279:       48 c7 47 08 00 00 00    movq   $0x0,0x8(%rdi)
    8280:       00 
    8281:       0f 84 f0 fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8287:       48 83 c7 60             add    $0x60,%rdi
    828b:       44 89 e8                mov    %r13d,%eax
    828e:       f0 44 0f b1 3f          lock cmpxchg %r15d,(%rdi)
    8293:       0f 84 de fe ff ff       je     8177 <GOMP_taskwait+0x3d7>
    8299:       e8 b2 15 00 00          callq  9850 <omp_get_num_procs+0xf0>
    829e:       e9 d4 fe ff ff          jmpq   8177 <GOMP_taskwait+0x3d7>
... <snip> ...

I'd probably hope that if this were encountered, some sort of 'failure to 
identify function name' or the function address would be emitted, but the lack 
of symbols does not appear to the fault of the profiler.

Original comment by awre...@gmail.com on 11 May 2013 at 3:11

GoogleCodeExporter commented 9 years ago
Your test program appears too short to identify any bottlenecks.

Can you please retry something longer ?

Original comment by alkondratenko on 6 Jul 2013 at 8:20

GoogleCodeExporter commented 9 years ago
The test program is intentionally short to demonstrate the issue and has no
bottlenecks.

Please see my previous comment: the issue is that the library doesn't have
symbols.

Original comment by awre...@gmail.com on 6 Jul 2013 at 9:34

GoogleCodeExporter commented 9 years ago
Not at all. nm is not supposed to work on shared libraries. And your test is 
too short to provide any relevant profile. Please consider reading how sampling 
profiling works.

Running somebody's program at link you provided gives me this:

    2944  97.8%  97.8%     2944  97.8% CALC_VALS._omp_fn.0
      64   2.1% 100.0%       64   2.1% omp_get_num_procs
       1   0.0% 100.0%        1   0.0% INITIALISE

Which clearly shows that majority of time is spent in user program. 
omp_get_num_procs indeed seems a bit off. But it appears that there's not much 
we can do about that in gperftools.

Original comment by alkondratenko on 6 Jul 2013 at 11:10

GoogleCodeExporter commented 9 years ago
omp_get_num_procs isn't a bit off, it *isn't* used by the program. Other 
functions are being misattributed to it:

$ pprof --disas=GOMP_taskwait solver out.prof 
ROUTINE ====================== GOMP_taskwait
     1   1330 samples (flat, cumulative) 96.2% of total
... <snip> ...
     .     16        84ef: callq  9ca0 <omp_get_num_procs+0x540>
     .      .        84f4: nopl   0x0(%rax)
     .      .        84f8: mov    %fs:0x10(%rbx),%r13
     .      .        84fd: mov    %r12,%rdi
     .    695        8500: callq  *%rbp
     .      .        8502: lea    0x80(%r13),%rdi
     .    391        8509: callq  9b40 <omp_get_num_procs+0x3e0>
     .      .        850e: mov    %r14,%rdi
     .    156        8511: callq  9ca0 <omp_get_num_procs+0x540>
... <snip> ...

Also, gdb can't get the function name for omp either: 
https://gist.github.com/awreece/5941661 (it has ?? for the name). I could try 
to come up with a better example, but I think the snippet above clearly 
demonstrates the problem. Sorry about not using the right flag to nm, nm -D 
clearly shows some symbols. Its no longer clear to me the problem is "it 
doesn't export symbols" (I'll have to update the blog post later).

For a longer explanation: 
http://codearcana.com/posts/2013/05/10/why-is-omp_get_num_procs-so-slow.html

Original comment by awre...@gmail.com on 7 Jul 2013 at 12:05