The question is: in which way are they working together? Does the C++ program call Python, or does Python call C++? Or does Python just start the application?
If we are talking about a Python program that calls a C++ library, you may want to use the Python bindings with a few modifications. I would therefore like to quote a mail I recently sent to the Score-P dev list:
Dear Alan, the Python bindings are intended to be used with pure Python code. I have never tried them with a mixture of C/C++ and Python code. However, with a few modifications it might work. First, you probably need to change the setup.py, replace "--nocompiler" with "--compiler", and reinstall the bindings. Next, you need to instrument your library using Score-P. Now you should be able to find the C++ functions in the resulting profile or trace. To get rid of the Python modules you'll need to filter them; please have a look at the Score-P manual and "SCOREP_FILTERING_FILE". As I said, I have never tried this, so I can't guarantee that it will work. Best, Andreas
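For the Python-calls-C++ case, a minimal sketch of those steps might look like the following; the paths, the script name and the exact invocation of the bindings are illustrative and may differ between versions:

# 1. Rebuild/reinstall the Python bindings with compiler instrumentation enabled
#    (the setup.py change described in the mail above: "--nocompiler" -> "--compiler").
# 2. Build the C++ library with Score-P instrumentation, e.g. via the compiler wrappers:
#      CC=scorep-mpicc CXX=scorep-mpicxx cmake .. && make
# 3. Run the script through the bindings, filtering out the Python modules at run time:
export SCOREP_FILTERING_FILE=$PWD/scorep.filter   # run-time filter file, see the Score-P manual
mpirun -np 4 python -m scorep my_script.py        # my_script.py is a placeholder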
However, if we are talking about a C++ application that embeds Python (i.e. the C++ application has the main function), instrumenting the application will be sufficient. Finally, if your Python application just starts the C++ application as a separate program, then again instrumenting the C++ application is sufficient.
Thanks...that was me spamming you with the mail too :P
It's python calling a C++ library like you described. I'm currently working on your recommendation and am most of the way there I think. Once I have the final filter file I will add it here just in case anyone else happens to come across this issue.
For the master branch things are ok and I can get a profile for a single core. With more than 1 core, I was getting
[Score-P] src/measurement/scorep_runtime_management.c:295: Error: File does already exist: POSIX: Can't create experiment directory "..."
so I switched to the MPI2 branch. Here I get some warnings at the beginning, like
ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_opencl_mgmt_static.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_cuda_mgmt.so' from LD_PRELOAD cannot be preloaded: ignored.
...
[Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:246: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED!
...
but for 2, 4, 8 cores I can get a measurement. However, once I try 16 I (eventually) get
[Score-P] src/measurement/profiling/scorep_profile_event_base.c:187: Error: Inconsistent profile. Stop profiling: Exit event for other than current region occurred at location 0: Expected exit for region espressopp::CellGrid::CellGrid. Exited region espressopp.pmi:__call__
[Score-P] src/measurement/profiling/scorep_profile_debug.c:223: Fatal: Cannot continue profiling. Activating core files (export SCOREP_PROFILING_ENABLE_CORE_FILES=1) might provide more insight.
[Score-P] Please report this to support@score-p.org. Thank you.
[Score-P] Try also to preserve any generated core dumps.
Dear ocaisa,
> For the master branch things are ok and I can get a profile for a single core. With more than 1 core, I was getting
> [Score-P] src/measurement/scorep_runtime_management.c:295: Error: File does already exist: POSIX: Can't create experiment directory "..."
> so I switched to the MPI2 branch.
Well done.
> so I switched to the MPI2 branch. Here I get some warnings at the beginning, like
> ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_opencl_mgmt_static.so' from LD_PRELOAD cannot be preloaded: ignored. ERROR: ld.so: object '/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib/libscorep_adapter_cuda_mgmt.so' from LD_PRELOAD cannot be preloaded: ignored.
I fixed this. Should not happen with the latest commit.
> ... [Score-P] src/adapters/mpi/SCOREP_Mpi_Env.c:246: Warning: MPI environment initialization request and provided level exceed MPI_THREAD_FUNNELED! ...
I think you can ignore that. For details please have a look at https://bitbucket.org/mpi4py/mpi4py/issues/80/mpi_thread_multiple-vs-mpi_thread_single
> but for 2, 4, 8 cores I can get a measurement. However, once I try 16 I (eventually) get
> [Score-P] src/measurement/profiling/scorep_profile_event_base.c:187: Error: Inconsistent profile. Stop profiling: Exit event for other than current region occurred at location 0: Expected exit for region espressopp::CellGrid::CellGrid. Exited region espressopp.pmi:__call__ [Score-P] src/measurement/profiling/scorep_profile_debug.c:223: Fatal: Cannot continue profiling. Activating core files (export SCOREP_PROFILING_ENABLE_CORE_FILES=1) might provide more insight. [Score-P] Please report this to support@score-p.org. Thank you. [Score-P] Try also to preserve any generated core dumps.
That is a bit tricky. Are you using threads in your code? It looks like the function espressopp::CellGrid::CellGrid is still executing while the call into the Python interface (espressopp.pmi:__call__) has already returned. Without a closer look at your code this is hard to debug.
To my knowledge they are not using threads. The code is available at https://github.com/espressopp/espressopp and the example I am currently testing is https://github.com/espressopp/espressopp/blob/master/examples/lennard_jones/lennard_jones.py
I don't know the code but after getting your response I wanted to see exactly how far I was getting in the example. I found that the problem specifically comes from https://github.com/espressopp/espressopp/blob/master/examples/lennard_jones/lennard_jones.py#L259 and if I comment out this line, I get the measurement. I'll ask the developers what the implication of this is. Thanks for your help!
This sounds like a good idea. It will take a while until I have the time to look at this closely.
Best,
Andreas
Just in case someone comes across this issue, I include here the filter file I am using (the application uses Boost.Python):
SCOREP_REGION_NAMES_BEGIN
EXCLUDE boost*
EXCLUDE numpy*
EXCLUDE _sti*
EXCLUDE std*
EXCLUDE *std::*
EXCLUDE *__gnu_cxx*
EXCLUDE *boost*converter*
EXCLUDE *boost*python*
EXCLUDE log4espp*
EXCLUDE *setupLogging
EXCLUDE logging*
SCOREP_REGION_NAMES_END
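For anyone reusing this: the filter file above is applied at run time through the SCOREP_FILTERING_FILE variable mentioned earlier. A minimal usage sketch, with the file name, process count and exact bindings invocation as placeholders that may differ on your system:

export SCOREP_FILTERING_FILE=$PWD/scorep.filter
mpirun -np 24 python -m scorep lennard_jones.py   # invocation of the Python bindings may differ between versions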
Could you give me a short outline of how you build your application (to get MPI support as well)? I might have some time this weekend to look into it.
Best,
Andreas
I just realised that exceptions are thrown in various places in the code. Unfortunately, Score-P is not able to handle these: if one function throws an exception and another one catches it, the enter/exit events no longer match and you get exactly this error message. However, I'll have a look to see whether this is the case here.
Best,
Andreas
Here are the build steps on JURECA (apologies for the verbosity; we use an automated build tool, and I'm pretty sure you can ignore all the environment variables apart from the CMake-specific ones at the end):
module load intel-para/2017b
module load Score-P/3.1
module load FFTW/3.3.6
module load Python/2.7.14
module load Boost/1.65.1-Python-2.7.14
module load mpi4py/2.0.0-Python-2.7.14
module load SciPy-Stack/2017b-Python-2.7.14
module load CMake/3.9.4
module load git/2.14.2
export BLACS_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export BLACS_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export BLACS_MT_STATIC_LIBS="libmkl_blacs_intelmpi_lp64.a"
export BLACS_STATIC_LIBS="libmkl_blacs_intelmpi_lp64.a"
export BLAS_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export BLAS_LAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export BLAS_LAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export BLAS_LAPACK_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
export BLAS_LAPACK_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export BLAS_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export BLAS_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
export BLAS_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export CC="mpicc"
export CC_SEQ="icc"
export CFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
export CPPFLAGS="-I/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/include -I/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/include"
export CXX="mpicxx"
export CXXFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
export CXX_SEQ="icpc"
export F77="mpif77"
export F77_SEQ="ifort"
export F90="mpif90"
export F90FLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
export F90_SEQ="ifort"
export FC="mpif90"
export FCFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
export FC_SEQ="ifort"
export FFLAGS="-O2 -xHost -ftz -fp-speculation=safe -fp-model source"
export FFTW_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export FFTW_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export FFTW_STATIC_LIBS="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export FFTW_STATIC_LIBS_MT="-fftw3xc_intel -fftw3x_cdft_lp64 -mkl_cdft_core -mkl_blacs_intelmpi_lp64 -mkl_intel_lp64 -mkl_sequential -mkl_core"
export FFT_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export FFT_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export FFT_STATIC_LIBS="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export FFT_STATIC_LIBS_MT="libfftw3xc_intel.a,libfftw3x_cdft_lp64.a,libmkl_cdft_core.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export LAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export LAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export LAPACK_MT_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
export LAPACK_STATIC_LIBS="libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export LDFLAGS="-L/usr/local/software/jureca/Stages/Devel-2017b/software/icc/2018.0.128-GCC-5.4.0/lib/intel64 -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64 -L/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/mpi4py/2.0.0-ipsmpi-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/lib -L/usr/local/software/jureca/Stages/Devel-2017b/software/git/2.14.2-GCCcore-5.4.0/lib"
export LIBBLACS="-Wl,-Bstatic -Wl,--start-group -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -Wl,-Bdynamic"
export LIBBLACS_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -Wl,-Bdynamic"
export LIBBLAS="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBBLAS_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
export LIBFFT="-Wl,-Bstatic -Wl,--start-group -lfftw3xc_intel -lfftw3x_cdft_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBFFT_MT="-Wl,-Bstatic -Wl,--start-group -lfftw3xc_intel -lfftw3x_cdft_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBLAPACK="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBLAPACK_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
export LIBLAPACK_MT_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
export LIBLAPACK_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBS="-liomp5 -lpthread"
export LIBSCALAPACK="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic"
export LIBSCALAPACK_MT="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
export LIBSCALAPACK_MT_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -Wl,--end-group -Wl,-Bdynamic -liomp5 -lpthread"
export LIBSCALAPACK_ONLY="-Wl,-Bstatic -Wl,--start-group -lmkl_scalapack_lp64 -Wl,--end-group -Wl,-Bdynamic"
export MPICC="mpicc"
export MPICH_CC="icc"
export MPICH_CXX="icpc"
export MPICH_F77="ifort"
export MPICH_F90="ifort"
export MPICH_FC="ifort"
export MPICXX="mpicxx"
export MPIF77="mpif77"
export MPIF90="mpif90"
export MPIFC="mpif90"
export MPI_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/include"
export MPI_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/lib"
export MPI_LIB_SHARED="/usr/local/software/jureca/Stages/Devel-2017b/software/psmpi/5.2.0-1-iccifort-2018.0.128-GCC-5.4.0/lib/libmpich.so"
export MPI_LIB_STATIC=""
export OPTFLAGS="-O2 -xHost"
export PRECFLAGS="-ftz -fp-speculation=safe -fp-model source"
export SCALAPACK_INC_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include"
export SCALAPACK_LIB_DIR="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64"
export SCALAPACK_MT_STATIC_LIBS="libmkl_scalapack_lp64.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_intel_thread.a,libmkl_core.a,libiomp5.a,libpthread.a"
export SCALAPACK_STATIC_LIBS="libmkl_scalapack_lp64.a,libmkl_blacs_intelmpi_lp64.a,libmkl_intel_lp64.a,libmkl_sequential.a,libmkl_core.a"
export CMAKE_INCLUDE_PATH="/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/include:/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/include:/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/include:/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/include"
export CMAKE_LIBRARY_PATH="/usr/local/software/jureca/Stages/Devel-2017b/software/icc/2018.0.128-GCC-5.4.0/lib/intel64:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/mkl/lib/intel64:/usr/local/software/jureca/Stages/Devel-2017b/software/imkl/2018.0.128-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Score-P/3.1-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/FFTW/3.3.6-ipsmpi-2017b/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Python/2.7.14-GCCcore-5.4.0/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/Boost/1.65.1-ipsmpi-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/mpi4py/2.0.0-ipsmpi-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/SciPy-Stack/2017b-intel-para-2017b-Python-2.7.14/lib:/usr/local/software/jureca/Stages/Devel-2017b/software/git/2.14.2-GCCcore-5.4.0/lib"
cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/software/jureca/Stages/Devel-2017b/software/ESPResSo++/1.9.5-intel-para-2017b-Python-2.7.14-instrumented -DCMAKE_C_COMPILER='scorep-mpicc' -DCMAKE_Fortran_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_COMPILER='scorep-mpicxx' -DCMAKE_Fortran_COMPILER='scorep-mpif90' -DCMAKE_C_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_VERBOSE_MAKEFILE=ON -DEXTERNAL_MPI4PY=ON -DEXTERNAL_BOOST=ON
SCOREP_WRAPPER=off cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/software/jureca/Stages/Devel-2017b/software/ESPResSo++/1.9.5-intel-para-2017b-Python-2.7.14-instrumented -DCMAKE_C_COMPILER='scorep-mpicc' -DCMAKE_Fortran_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_CXX_COMPILER='scorep-mpicxx' -DCMAKE_Fortran_COMPILER='scorep-mpif90' -DCMAKE_C_FLAGS='-O2 -xHost -ftz -fp-speculation=safe -fp-model source' -DCMAKE_VERBOSE_MAKEFILE=ON -DEXTERNAL_MPI4PY=ON -DEXTERNAL_BOOST=ON
make -j48
I finally got some time to have a look. Unfortunately, I cannot reproduce the issue with 16 processes.
It might help to filter the region espressopp::storage::DomainDecomposition::cellAdjust()
Does the program work with 16 processes without Score-P?
Best,
Andreas
I just retried things and I'm having a hard time reproducing the crash myself. I do have some more information though.
With a much more aggressive filter file (I filter all USR regions):
SCOREP_REGION_NAMES_BEGIN
EXCLUDE *boost*
EXCLUDE numpy*
EXCLUDE _sti*
EXCLUDE std*
EXCLUDE *std::*
EXCLUDE *__gnu_cxx*
EXCLUDE *boost*converter*
EXCLUDE *boost*python*
EXCLUDE log4espp*
EXCLUDE *setupLogging
EXCLUDE logging*
EXCLUDE *espressopp::*
EXCLUDE *espressopp.*
EXCLUDE operator*
SCOREP_REGION_NAMES_END
This produces a scorep-score output (excerpt) like:
Estimated aggregate size of event trace: 18GB
Estimated requirements for largest trace buffer (max_buf): 787MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY): 789MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=789MB to avoid intermediate flushes
or reduce requirements using USR regions filters.)
flt type max_buf[B] visits time[s] time[%] time/visit[us] region
ALL 824,293,278 494,228,965 35320.72 100.0 71.47 ALL
MPI 564,658,396 254,757,369 7135.79 20.2 28.01 MPI
USR 259,791,870 239,471,476 47.28 0.1 0.20 USR
COM 130 120 28137.65 79.7 234480418.57 COM
MPI 346,085,330 64,861,552 292.83 0.8 4.51 MPI_Recv
MPI 282,122,194 64,861,552 86.58 0.2 1.33 MPI_Send
USR 258,728,002 238,508,725 42.90 0.1 0.18
MPI 89,634,200 31,635,600 4256.20 12.1 134.54 MPI_Bcast
MPI 31,624,788 29,192,112 15.48 0.0 0.53 MPI_Comm_rank
MPI 30,767,438 28,400,712 21.01 0.1 0.74 MPI_Comm_test_inter
MPI 13,631,592 12,582,985 10.25 0.0 0.81 MPI_Comm_size
MPI 13,631,566 12,582,984 4.23 0.0 0.34 MPI_Comm_get_attr
MPI 10,605,348 9,789,552 2008.67 5.7 205.19 MPI_Probe
MPI 2,394,212 845,016 437.28 1.2 517.48 MPI_Allreduce
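For reference, an estimate like the one above can be produced from an existing measurement with the scorep-score tool, and the effect of a candidate filter file can be previewed before re-running. A short sketch, with the experiment directory name as a placeholder:

scorep-score -r scorep-experiment-dir/profile.cubex                   # per-region breakdown
scorep-score -f scorep.filter scorep-experiment-dir/profile.cubex     # estimate the effect of the filter file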
The uninstrumented and instrumented builds both work. The instrumented one is surprisingly slower (24 minutes as compared to 7 minutes on 24 cores) despite all the filtering.
If I remove the last 3 excludes from the filter file, the measurement is hugely slower. To get it to run in a reasonable time I changed the Python script to use a smaller problem size (without instrumentation it runs in ~1 min); with instrumentation it takes about 15 minutes.
> The instrumented one is surprisingly slower (24 minutes as compared to 7 minutes on 24 cores) despite all the filtering.
That is normal. I assume you didn't do any compile-time filtering? If not, then Score-P is still called for every enter and every exit of a function; it then has to check whether the function is in the filter list, and only after that does it return if the function is filtered. You won't be able to avoid that for Python code, but for the C/C++ part you can do compile-time filtering.
Compile-time filtering instruments only the non-filtered functions. If you are using GCC you can just specify --instrument-filter=<file>:
--instrument-filter=<file>
Specifies the filter file for filtering functions during
compile-time. Not supported by all instrumentation methods.
It applies the same syntax, as the one used by Score-P during
run-time.
For the Intel compilers I would recommend having a look at https://software.intel.com/en-us/node/522948 .
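For the CMake build you posted, one way to pass such a filter through the Score-P compiler wrappers is the wrapper's SCOREP_WRAPPER_INSTRUMENTER_FLAGS variable. A sketch, assuming your compiler and instrumentation method support --instrument-filter (the compile-time filter file name is a placeholder):

SCOREP_WRAPPER=off cmake . -DCMAKE_C_COMPILER=scorep-mpicc -DCMAKE_CXX_COMPILER=scorep-mpicxx ...
SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--instrument-filter=$PWD/compile.filter" make -j48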
Best, Andreas
OK, I switched to GCC to make life simpler for the filtering, and things are indeed much more reasonable with compile-time filtering. There is probably still some work to be done to get the overhead down further; I'll take a proper look after Christmas.
Thanks for all the advice, have a good Christmas!
Thank you, you too 😄
As the last activity was before Christmas, I'll close the ticket now 😉. Feel free to open a new one if needed.
Best,
Andreas
Btw: Tracing C and C++ libs from python should now be supported by default.
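For anyone finding this later, a mixed Python/C++ measurement can then typically be started along these lines; the exact options depend on the version of the bindings:

mpirun -np 24 python -m scorep lennard_jones.py   # MPI measurement may need an extra option (e.g. --mpp=mpi) depending on the bindings version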
I have a C++ application (that I didn't write) that is intended to be used via its Python interface. I can instrument the C++ code; I was wondering if I should be using this binding for the measurement?