score-p / scorep_binding_python

Allows tracing of python code using Score-P

Tracing a C++ application with a python interface #1

Closed. ocaisa closed this issue 6 years ago.

ocaisa commented 6 years ago

I have a C++ application (that I didn't write) that is intended to be used via its Python interface. I can instrument the C++ code; I was wondering if I should be using this binding for the measurement?

AndreasGocht commented 6 years ago

The question is: in what way do they work together? Does the C++ program invoke Python, does Python invoke C++, or does Python just start the application?

If we are talking about a Python program that invokes a C++ library, you may want to use the Python bindings with a few modifications. I would therefore like to quote a mail I recently sent to the Score-P dev list:

Dear Alan, the python bindings are intended to be used with Python. I have never tried them with a mixture of C/C++ and Python code. However, with a few modifications it might work. First, you probably need to change setup.py, replace "--nocompiler" with "--compiler", and reinstall the bindings. Next, you need to instrument your library using Score-P. You should then be able to find the C++ functions in the related profile or trace. To get rid of the Python modules you'll need to filter them; please have a look at the Score-P manual and "SCOREP_FILTERING_FILE". As I said, I have never tried this, so I can't guarantee that it will work. Best, Andreas
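
Put concretely, the workflow from that mail might look roughly as follows. This is only a sketch: the paths, library name, and script name are placeholders, and launching via python -m scorep assumes a recent version of the bindings.

# 1. Edit setup.py in the python bindings, replacing "--nocompiler" with "--compiler",
#    then reinstall the bindings
python setup.py install --user

# 2. Rebuild the C++ library with the Score-P compiler wrapper so its functions are instrumented
scorep-mpicxx -fPIC -shared -o libmylib.so mylib.cpp

# 3. Run the python script through the bindings, filtering the pure-python modules at run time
export SCOREP_FILTERING_FILE=/path/to/python_modules.filter
python -m scorep my_script.py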

However, if we are talking about a C++ application that embeds Python (i.e. the C++ application has a main function), instrumenting the application will be sufficient. Finally, if your Python application just starts the C++ application, then again instrumenting the C++ application is sufficient.

ocaisa commented 6 years ago

Thanks...that was me spamming you with the mail too :P

It's Python calling a C++ library, like you described. I'm currently working through your recommendation and am most of the way there, I think. Once I have the final filter file I will add it here in case anyone else comes across this issue.

ocaisa commented 6 years ago

For the master branch things are ok and I can get a profile for a single core. With more than 1 core, I was getting

[Score-P] src/measurement/scorep_runtime_management.c:295: Error: File does already exist: POSIX: Can't create experiment directory "..."

so I switched to the MPI2 branch. Here I get some warnings at the beginning, like

ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_opencl_mgmt_static.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_cuda_mgmt.so' from LD_PRELOAD cannot be preloaded: ignored.
...
[Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:246: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED!
...

but for 2, 4, 8 cores I can get a measurement. However, once I try 16 I (eventually) get

[Score-P] src/measurement/profiling/scorep_profile_event_base.c:187: Error: Inconsistent profile. Stop profiling: Exit event for other than current region occurred at location 0: Expected exit for region espressopp::CellGrid::CellGrid. Exited region espressopp.pmi:__call__
[Score-P] src/measurement/profiling/scorep_profile_debug.c:223: Fatal: Cannot continue profiling. Activating core files (export SCOREP_PROFILING_ENABLE_CORE_FILES=1) might provide more insight.
[Score-P] Please report this to support@score-p.org. Thank you.
[Score-P] Try also to preserve any generated core dumps.

AndreasGocht commented 6 years ago

Dear ocaisa,

For the master branch things are ok and I can get a profile for a single core. With more than 1 core, I was getting

[Score-P] src/measurement/scorep_runtime_management.c:295: Error: File does already exist: POSIX: Can't create experiment directory "..."

so I switched to the MPI2 branch.

Well done.

so I switched to the MPI2 branch. Here I get some warnings at the beginning, like

ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_opencl_mgmt_static.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_cuda_mgmt.so' from LD_PRELOAD cannot be preloaded: ignored.

I fixed this. Should not happen with the latest commit.

...
[Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:246: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED!
...

I think you can ignore that. For details, please have a look at https://bitbucket.org/mpi4py/mpi4py/issues/80/mpi_thread_multiple-vs-mpi_thread_single
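
As a quick check of what that warning refers to, one can query the thread support level the MPI library actually granted to mpi4py (illustrative only; the launcher and process count are assumptions):

# prints the provided thread level next to the MPI_THREAD_FUNNELED constant for comparison
srun -n 2 python -c "from mpi4py import MPI; print(MPI.Query_thread(), MPI.THREAD_FUNNELED)"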

but for 2, 4, 8 cores I can get a measurement. However, once I try 16 I (eventually) get

[Score-P] src/measurement/profiling/scorep_profile_event_base.c:187: Error: Inconsistent profile. Stop profiling: Exit event for other than current region occurred at location 0: Expected exit for region espressopp::CellGrid::CellGrid. Exited region espressopp.pmi:__call__
[Score-P] src/measurement/profiling/scorep_profile_debug.c:223: Fatal: Cannot continue profiling. Activating core files (export SCOREP_PROFILING_ENABLE_CORE_FILES=1) might provide more insight.
[Score-P] Please report this to support@score-p.org. Thank you.
[Score-P] Try also to preserve any generated core dumps.

That is a bit tricky. Are you using threads in your code? It looks like the function espressopp::CellGrid::CellGrid is still executing while the call to the Python interface (espressopp.pmi:__call__) has already returned. Without a closer look at your code this is hard to debug.

ocaisa commented 6 years ago

To my knowledge they are not using threads. The code is available at https://github.com/espressopp/espressopp and the example I am currently testing is https://github.com/espressopp/espressopp/blob/master/examples/lennard_jones/lennard_jones.py

I don't know the code, but after getting your response I wanted to see exactly how far I was getting in the example. I found that the problem comes specifically from https://github.com/espressopp/espressopp/blob/master/examples/lennard_jones/lennard_jones.py#L259, and if I comment out this line I get the measurement. I'll ask the developers what the implications of this are. Thanks for your help!

AndreasGocht commented 6 years ago

This sounds like a good idea. It will take a while until I have the time to look at this closely.

Best,

Andreas

ocaisa commented 6 years ago

Just in case someone comes across this issue, I include here the filter file I am using (the application uses Boost.Python):

SCOREP_REGION_NAMES_BEGIN
  EXCLUDE boost*
  EXCLUDE numpy*
  EXCLUDE _sti*
  EXCLUDE std*
  EXCLUDE *std::*
  EXCLUDE *__gnu_cxx*
  EXCLUDE *boost*converter*
  EXCLUDE *boost*python*
  EXCLUDE log4espp*
  EXCLUDE *setupLogging
  EXCLUDE logging*
SCOREP_REGION_NAMES_END
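
For completeness, the filter is applied at run time by pointing Score-P at the file before starting the measurement. A sketch, assuming the bindings are installed and using placeholder paths:

export SCOREP_FILTERING_FILE=/path/to/espressopp.filter
export SCOREP_EXPERIMENT_DIRECTORY=scorep_lennard_jones
python -m scorep lennard_jones.py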

AndreasGocht commented 6 years ago

Could you give me a short outline of how you build your application (to get MPI support as well)? I might have some time this weekend to look into it.

Best,

Andreas

AndreasGocht commented 6 years ago

I just realised that there are various exceptions in the code. Unfortunately, Score-P is not able to handle these.

So if one function throws an exception and another one catches it, the exit events of the unwound functions are never recorded, and that would result in exactly this error message. However, I'll have a look to see whether this is the case.

Best,

Andreas

ocaisa commented 6 years ago

Here are the build steps on JURECA (apologies for the verbosity; we use an automated build tool, and I'm pretty sure you can ignore all the envvars apart from the CMake-specific ones at the end):

module load intel-para/2017b
module load Score-P/3.1
module load FFTW/3.3.6
module load Python/2.7.14
module load Boost/1.65.1-Python-2.7.14
module load mpi4py/2.0.0-Python-2.7.14
module load SciPy-Stack/2017b-Python-2.7.14
module load CMake/3.9.4
module load git/2.14.2
  export BLACS_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export BLACS_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export BLACS_MT_STATIC_LIBS="libmkl_blacs_intelmpi_lp64.a"
  export BLACS_STATIC_LIBS="libmkl_blacs_intelmpi_lp64.a"
  export BLAS_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export BLAS_LAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export BLAS_LAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export BLAS_LAPACK_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
  export BLAS_LAPACK_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export BLAS_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export BLAS_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
  export BLAS_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export CC="mpicc"
  export CC_SEQ="icc"
  export CFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
  export CPPFLAGS="-I/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/include"
  export CXX="mpicxx"
  export CXXFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
  export CXX_SEQ="icpc"
  export F77="mpif77"
  export F77_SEQ="ifort"
  export F90="mpif90"
  export F90FLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
  export F90_SEQ="ifort"
  export FC="mpif90"
  export FCFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
  export FC_SEQ="ifort"
  export FFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
  export FFTW_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export FFTW_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export FFTW_STATIC_LIBS="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export FFTW_STATIC_LIBS_MT="-fftw3xc_intel -fftw3x_cdft_lp64 -mkl_cdft_core -mkl_blacs_intelmpi_lp64 -mkl_intel_lp64 -mkl_sequential -mkl_core"
  export FFT_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export FFT_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export FFT_STATIC_LIBS="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export FFT_STATIC_LIBS_MT="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export LAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export LAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export LAPACK_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
  export LAPACK_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
  export LDFLAGS="-L/usr/local/software/jureca/Stages/Devel-2017b/software/icc/2018.0.128-GCC-5.4.0/lib/intel64 -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64 -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/mpi4py/2.0.0-ipsmpi-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/git/2.14.2-GCCcore-5.4.0/lib"
  export LIBBLACS="-Wl,-Bstatic -Wl,--start-group -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -Wl,-Bdynamic"
  export LIBBLACS_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -Wl,-Bdynamic"
  export LIBBLAS="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBBLAS_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
  export LIBFFT="-Wl,-Bstatic -Wl,--start-group -lfftw3xc_intel -lfftw3x_cdft_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBFFT_MT="-Wl,-Bstatic -Wl,--start-group -lfftw3xc_intel -lfftw3x_cdft_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBLAPACK="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBLAPACK_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
  export LIBLAPACK_MT_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
  export LIBLAPACK_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBS="-liomp5 -lpthread"
  export LIBSCALAPACK="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
  export LIBSCALAPACK_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
  export LIBSCALAPACK_MT_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
  export LIBSCALAPACK_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -Wl,--end-group -Wl,-Bdynamic"
  export MPICC="mpicc"
  export MPICH_CC="icc"
  export MPICH_CXX="icpc"
  export MPICH_F77="ifort"
  export MPICH_F90="ifort"
  export MPICH_FC="ifort"
  export MPICXX="mpicxx"
  export MPIF77="mpif77"
  export MPIF90="mpif90"
  export MPIFC="mpif90"
  export MPI_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/include"
  export MPI_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/lib"
  export MPI_LIB_SHARED="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/lib/libmpich.so"
  export MPI_LIB_STATIC=""
  export OPTFLAGS="-O2 -xHost"
  export PRECFLAGS="-ftz -fp-speculation=safe -fp-model source"
  export SCALAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
  export SCALAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
  export SCALAPACK_MT_STATIC_LIBS="libmkl_scalapack_lp64.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
  export SCALAPACK_STATIC_LIBS="libmkl_scalapack_lp64.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export CMAKE_INCLUDE_PATH="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/include:/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/include:/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/include"
export CMAKE_LIBRARY_PATH="/usr/local/software/jureca/Stages/Devel-2017b/software/icc/2018.0.128-GCC-5.4.0/lib/intel64:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/mpi4py/2.0.0-ipsmpi-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/git/2.14.2-GCCcore-5.4.0/lib"
cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/software/jureca/Stages/Devel-2017b/software/ESPResSo++/1.9.5-intel-para-2017b-Python-2.7.14-instrumented -DCMAKE_C_COMPILER='scorep-mpicc' -DCMAKE_Fortran_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_COMPILER='scorep-mpicxx' -DCMAKE_Fortran_COMPILER='scorep-mpif90' -DCMAKE_C_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_VERBOSE_MAKEFILE=ON -DEXTERNAL_MPI4PY=ON -DEXTERNAL_BOOST=ON
SCOREP_WRAPPER=off cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/software/jureca/Stages/Devel-2017b/software/ESPResSo++/1.9.5-intel-para-2017b-Python-2.7.14-instrumented -DCMAKE_C_COMPILER='scorep-mpicc' -DCMAKE_Fortran_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_COMPILER='scorep-mpicxx' -DCMAKE_Fortran_COMPILER='scorep-mpif90' -DCMAKE_C_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_VERBOSE_MAKEFILE=ON -DEXTERNAL_MPI4PY=ON -DEXTERNAL_BOOST=ON
make -j48
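
For reference, a measurement run of the instrumented build would then be launched roughly like this (the launcher, process count, filter file path, and script path are assumptions and not part of the original build steps):

module load Score-P/3.1 Python/2.7.14 mpi4py/2.0.0-Python-2.7.14
export SCOREP_FILTERING_FILE=/path/to/espressopp.filter
srun -n 16 python -m scorep lennard_jones.py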

AndreasGocht commented 6 years ago

I finally got some time to have a look. Unfortunately, I cannot reproduce the issue with 16 processes.

It might help to filter the region espressopp::storage::DomainDecomposition::cellAdjust()

Does the program work with 16 processes but without Score-P?

Best,

Andreas

ocaisa commented 6 years ago

I just retried things and I'm having a hard time reproducing the crash myself. I do have some more information though.

With a much more aggressive filter file (I filter all USR regions):

SCOREP_REGION_NAMES_BEGIN
  EXCLUDE *boost*
  EXCLUDE numpy*
  EXCLUDE _sti*
  EXCLUDE std*
  EXCLUDE *std::*
  EXCLUDE *__gnu_cxx*
  EXCLUDE *boost*converter*
  EXCLUDE *boost*python*
  EXCLUDE log4espp*
  EXCLUDE *setupLogging
  EXCLUDE logging*
  EXCLUDE *espressopp::*
  EXCLUDE *espressopp.*
  EXCLUDE operator*
SCOREP_REGION_NAMES_END

This produces a scorep-score report (excerpt) like:

Estimated aggregate size of event trace:                   18GB
Estimated requirements for largest trace buffer (max_buf): 787MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       789MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=789MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)

flt     type  max_buf[B]      visits  time[s] time[%] time/visit[us]  region
         ALL 824,293,278 494,228,965 35320.72   100.0          71.47  ALL
         MPI 564,658,396 254,757,369  7135.79    20.2          28.01  MPI
         USR 259,791,870 239,471,476    47.28     0.1           0.20  USR
         COM         130         120 28137.65    79.7   234480418.57  COM

         MPI 346,085,330  64,861,552   292.83     0.8           4.51  MPI_Recv
         MPI 282,122,194  64,861,552    86.58     0.2           1.33  MPI_Send
         USR 258,728,002 238,508,725    42.90     0.1           0.18  
         MPI  89,634,200  31,635,600  4256.20    12.1         134.54  MPI_Bcast
         MPI  31,624,788  29,192,112    15.48     0.0           0.53  MPI_Comm_rank
         MPI  30,767,438  28,400,712    21.01     0.1           0.74  MPI_Comm_test_inter
         MPI  13,631,592  12,582,985    10.25     0.0           0.81  MPI_Comm_size
         MPI  13,631,566  12,582,984     4.23     0.0           0.34  MPI_Comm_get_attr
         MPI  10,605,348   9,789,552  2008.67     5.7         205.19  MPI_Probe
         MPI   2,394,212     845,016   437.28     1.2         517.48  MPI_Allreduce

The uninstrumented and instrumented builds both work. The instrumented one is surprisingly slow, though (24 minutes as compared to 7 minutes on 24 cores), despite all the filtering.

Removing the last 3 excludes from the filter file makes the measurement hugely slower. To get it to run in a reasonable time I changed the Python script to a smaller problem size (without instrumentation it runs in ~1 min); with instrumentation it takes about 15 minutes.
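
As an aside, an excerpt like the one above is obtained by running scorep-score on the generated profile, and the hint it prints maps directly onto Score-P's environment variables for a subsequent tracing run (a sketch; the experiment directory name is assumed):

# inspect the profile per region and estimate trace requirements
scorep-score -r scorep_lennard_jones/profile.cubex

# follow the hint before switching from profiling to tracing
export SCOREP_TOTAL_MEMORY=789MB
export SCOREP_ENABLE_TRACING=true
export SCOREP_ENABLE_PROFILING=false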

AndreasGocht commented 6 years ago

The instrumented one is surprisingly slow, though (24 minutes as compared to 7 minutes on 24 cores), despite all the filtering.

That is normal. I assume you didn't do any compile-time filtering? If not, then Score-P is still called for every enter and every exit of a function. It then has to check whether the function is in the filter list, and finally it returns if the function is filtered. You won't be able to avoid that for Python code, but for the C/C++ part you can do compile-time filtering.

Compile-time filtering instruments only the functions that are not filtered. If you are using GCC you can just specify --instrument-filter=<file>:

  --instrument-filter=<file>
                  Specifies the filter file for filtering functions during
                  compile-time. Not supported by all instrumentation methods.
                  It applies the same syntax, as the one used by Score-P during
                  run-time.
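
For a CMake build that goes through the Score-P compiler wrappers, as in the steps above, one way to hand such a compile-time filter to the instrumenter is via the wrapper's environment variable (a sketch; the filter file path and its contents are placeholders):

# pass the compile-time filter to every scorep-mpicc/scorep-mpicxx invocation during the build
export SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--instrument-filter=/path/to/compile.filter"
make -j48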

For Intel, I would recommend having a look at https://software.intel.com/en-us/node/522948.

Best, Andreas

ocaisa commented 6 years ago

OK, I switched to GCC to make life simpler for the filtering, and I do indeed see that things are much more reasonable with compile-time filtering. There is probably still some work to be done to get the overhead down further; I'll take a proper look after Christmas.

Thanks for all the advice, have a good Christmas!

AndreasGocht commented 6 years ago

Thank you, you too 😄

AndreasGocht commented 6 years ago

As the last activity was before Christmas, I'll close the ticket now 😉. Feel free to open a new one if needed.

Best,

Andreas

AndreasGocht commented 6 years ago

Btw: Tracing C and C++ libs from python should now be supported by default.