lexliy / gperftools

Automatically exported from code.google.com/p/gperftools
BSD 3-Clause "New" or "Revised" License

How to use gperftools to profile an MPI program? #422

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. mpicc -o t.x t.c -g
2. mpirun_rsh --hostfile=hfile -n 2 LD_PRELOAD=/home/niuq/tools/gperftools-2.0/gcc-build/lib/libtcmalloc.so HEAPPROFILE=./tprofile ./t.x
3. I get tprofile.0004.heap, tprofile.0017.heap, and tprofile.0030.heap, but there is no separate heap file per MPI process, so we cannot tell which process each heap file comes from.
What is the expected output? What do you see instead?
Expected: a separate heap file for each MPI process.

What version of the product are you using? On what operating system?
niuq@node020:~/code$ uname -a
Linux node020.cluster 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 
2011 x86_64 x86_64 x86_64 GNU/Linux
gperftools-2.0

Please provide any additional information below.

niuq@node020:~/code$ cat t.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int myrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Only rank 1 allocates, so only one process should produce an
       interesting heap profile. */
    if (myrank == 1) {
        size_t x = sizeof(int);
        printf("size = %zu\n", x);

        int i;
        for (i = 0; i < 10000000; i++) {
            size_t s = 100;
            int *sum = (int *)malloc(s * sizeof(int));
            size_t k;
            sum[0] = 0;  /* was uninitialized before being read */
            for (k = 1; k < s; k++)
                sum[k] = sum[k - 1] + 5;
            /* intentionally never freed: the growing heap is what
               triggers the heap profiler dumps */
        }
    }

    MPI_Finalize();
    return 0;
}

mpicc -o t.x t.c -g
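
One possible workaround (not part of the original report), assuming gperftools' explicit heap-profiler API (HeapProfilerStart/HeapProfilerStop/HeapProfilerDump from <gperftools/heap-profiler.h>, linked with -ltcmalloc), is to start the profiler inside the program with a rank-specific prefix instead of relying only on the HEAPPROFILE environment variable. A sketch, with hypothetical file names:

/* t_prof.c - per-rank heap profiling sketch.
 * Build (example): mpicc -g -o t_prof.x t_prof.c -ltcmalloc
 * Each rank then writes its own files, e.g. ./tprofile.rank0.0001.heap
 */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <gperftools/heap-profiler.h>  /* older installs may ship <google/heap-profiler.h> */

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    int myrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* Start the profiler with a prefix that embeds the MPI rank. */
    char prefix[64];
    snprintf(prefix, sizeof(prefix), "./tprofile.rank%d", myrank);
    HeapProfilerStart(prefix);

    /* ... allocations to be profiled ... */
    int i;
    for (i = 0; i < 1000; i++) {
        void *p = malloc(100 * sizeof(int));  /* intentionally leaked */
        (void)p;
    }

    HeapProfilerDump("final");  /* force a dump; small runs may not hit the dump interval */
    HeapProfilerStop();

    MPI_Finalize();
    return 0;
}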

Original issue reported on code.google.com by niuqingp...@gmail.com on 2 Apr 2012 at 6:35

GoogleCodeExporter commented 9 years ago
Waiting on further comment. We had a possible solution in an email exchange, but it
turned out that it does not cover all MPI implementations.

Original comment by chapp...@gmail.com on 4 May 2012 at 12:51

GoogleCodeExporter commented 9 years ago
The attached patch includes the "MPI rank" in the output filename, at least for
the Open MPI implementation.

Background: MPI provides a model of parallel computing beyond multithreading, i.e.
parallel processing without shared memory. A single job may run simultaneously on
many processors distributed across many nodes, so getpid() does not provide an ID
that is unique across the job.
In the attached patch, I include the MPI rank (from the Open MPI implementation)
explicitly when generating file names for the profiler.

A method independent of the specific MPI implementation would require explicit 
calls to the MPI library in order to obtain the process rank, which in turn 
requires initializing MPI. This has two disadvantages: for one, libtcmalloc 
would then depend on MPI (which is not that common outside the high-performance 
computing (HPC) context); for another, interacting with the MPI implementation 
from within libtcmalloc would violate assumptions made by the program being
debugged.

My patch evaluates an environment variable defined by the Open MPI implementation,
which according to their FAQ [1] is guaranteed to remain stable in future releases.
The patch is written so that it is easy to add environment variables for other MPI
implementations; Intel MPI, for example, provides $PMI_RANK [2].
I implemented it only for Open MPI, as that's the implementation I can test on our
HPC cluster.

[1] http://www.open-mpi.org/faq/?category=running#mpi-environmental-variables
[2] https://software.intel.com/de-de/forums/topic/284007
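
A minimal sketch of the idea (my wording, not the attached patch), assuming Open MPI exports OMPI_COMM_WORLD_RANK [1] and Intel MPI exports PMI_RANK [2]; the helper name is made up for illustration:

/* rank_suffix.c - derive a per-process suffix for profile file names.
 * Checks well-known MPI rank environment variables and falls back to the PID.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes a suffix such as ".rank-3", or ".pid-1234" when run outside MPI. */
static void profile_name_suffix(char *buf, size_t len) {
    const char *rank = getenv("OMPI_COMM_WORLD_RANK");  /* Open MPI  [1] */
    if (rank == NULL)
        rank = getenv("PMI_RANK");                      /* Intel MPI [2] */
    if (rank != NULL)
        snprintf(buf, len, ".rank-%s", rank);
    else
        snprintf(buf, len, ".pid-%d", (int)getpid());
}

int main(void) {
    char suffix[64];
    profile_name_suffix(suffix, sizeof(suffix));
    printf("heap profile prefix: ./tprofile%s\n", suffix);
    return 0;
}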

--
c u
henning

Original comment by henning....@googlemail.com on 4 Aug 2015 at 6:35

Attachments: