votca / xtp

GW-BSE for excited state Quantum Chemistry in a Gaussian Orbital basis, electronic spectroscopy with QM/MM, charge and energy dynamics in complex molecular systems

MKL backend not used in Eigen #463

Closed baumeier closed 4 years ago

baumeier commented 4 years ago

It appears that diagonalizing the two-particle Hamiltonian in the exact FAA approach for the self-energy currently runs only in serial. I see this both with gcc and the native Eigen backend, and with icc and MKL.

JensWehner commented 4 years ago

Is the rest parallel? It sounds a bit like a problem with the linking to MKL. With native Eigen, matrix diagonalisation is not parallelized.

baumeier commented 4 years ago

Rest is running in parallel

JensWehner commented 4 years ago

hmmm.. what kind of threading is mkl using?

baumeier commented 4 years ago

cat CMakeCache.txt | grep MKL

//The thread layer to choose for MKL
MKL_THREAD_LAYER:STRING=Intel OpenMP
MKL_ThreadLayer_LINK_LIBRARY:FILEPATH=/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
MKL_ThreadLayer_STATIC_LINK_LIBRARY:FILEPATH=/opt/intel/mkl/lib/intel64/libmkl_intel_thread.a
MKL_ThreadingLibrary_LINK_LIBRARY:FILEPATH=/usr/lib/x86_64-linux-gnu/libiomp5.so
MKL_ThreadingLibrary_STATIC_LINK_LIBRARY:FILEPATH=/opt/intel/lib/intel64/libiomp5.a

Is it a problem that it uses the system libiomp5.so?

JensWehner commented 4 years ago

yes that could be, I do not know.

JensWehner commented 4 years ago

where is it coming from that is not the openmp library?

baumeier commented 4 years ago

I manually set

MKL_ThreadingLibrary_LINK_LIBRARY:FILEPATH=/opt/intel/mkl/lib/intel64/libiomp5.so

and checked with ldd

    libvotca_xtp.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_xtp.so.7 (0x00007f730f3f1000)
    libvotca_csg.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_csg.so.7 (0x00007f730f24b000)
    libvotca_tools.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_tools.so.7 (0x00007f730f108000)
    libboost_program_options.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.71.0 (0x00007f730f04d000)
    libmkl_core.so => /opt/intel//mkl/lib/intel64/libmkl_core.so (0x00007f730ad2d000)
    libmkl_intel_lp64.so => /opt/intel//mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f730a1bf000)
    libmkl_intel_thread.so => /opt/intel//mkl/lib/intel64/libmkl_intel_thread.so (0x00007f7307c53000)
    libiomp5.so => /opt/intel//lib/intel64/libiomp5.so (0x00007f7307863000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f730785d000)
    libfftw3.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007f7307657000)
    libhdf5_cpp.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_cpp.so.103 (0x00007f73075da000)
    libhdf5_serial.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007f730725b000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7307238000)
    libsz.so.2 => /usr/lib/x86_64-linux-gnu/libsz.so.2 (0x00007f7307233000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f7307217000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f73070c8000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7306ee7000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7306eca000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7306cd8000)
    libxc.so.5 => /usr/lib/x86_64-linux-gnu/libxc.so.5 (0x00007f730668d000)
    libboost_filesystem.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.71.0 (0x00007f730666f000)
    libboost_system.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.71.0 (0x00007f730666a000)
    libboost_timer.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_timer.so.1.71.0 (0x00007f7306660000)
    libboost_chrono.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.71.0 (0x00007f7306650000)
    libimf.so => /opt/intel//lib/intel64/libimf.so (0x00007f73060b2000)
    libsvml.so => /opt/intel//lib/intel64/libsvml.so (0x00007f73046b7000)
    libirng.so => /opt/intel//lib/intel64/libirng.so (0x00007f730434d000)
    libintlc.so.5 => /opt/intel//lib/intel64/libintlc.so.5 (0x00007f73040d6000)
    libgromacs.so.4 => /opt/gromacs-2019/lib/libgromacs.so.4 (0x00007f7302be3000)
    libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f7302bb5000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f730fc1d000)
    libaec.so.0 => /usr/lib/x86_64-linux-gnu/libaec.so.0 (0x00007f7302bac000)
    libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f730296f000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7302964000)
    libfftw3f.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3f.so.3 (0x00007f7302754000)
    libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f730238e000)
    liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f7301cc6000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f7301c84000)
    libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f7301c77000)
    libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f7301c6c000)
    libgfortran.so.5 => /usr/lib/x86_64-linux-gnu/libgfortran.so.5 (0x00007f73019a8000)
    libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f730195e000)

Funny thing is that it also seems to link to a system blas/lapack?
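A mixed set of BLAS/LAPACK providers like the one in this ldd listing can be spotted mechanically. The following is a minimal sketch, not part of VOTCA: the function names blas_like_libraries and has_mixed_blas are hypothetical, and the substrings it matches ("mkl", "blas", "lapack") are a heuristic chosen for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the library names (the token left of "=>") from ldd-style output
// whose name suggests a BLAS/LAPACK provider. The hint list is a heuristic
// assumed for this sketch, not an exhaustive catalogue.
std::vector<std::string> blas_like_libraries(const std::string& ldd_output) {
  static const char* const kHints[] = {"mkl", "blas", "lapack"};
  std::vector<std::string> hits;
  std::istringstream lines(ldd_output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name;
    if (!(fields >> name)) continue;  // skip blank lines
    for (const char* hint : kHints) {
      if (name.find(hint) != std::string::npos) {
        hits.push_back(name);
        break;
      }
    }
  }
  return hits;
}

// True when an MKL library and a non-MKL BLAS/LAPACK library appear together.
bool has_mixed_blas(const std::vector<std::string>& libs) {
  bool mkl = false;
  bool other = false;
  for (const auto& lib : libs) {
    if (lib.find("mkl") != std::string::npos) {
      mkl = true;
    } else {
      other = true;
    }
  }
  return mkl && other;
}
```

Feeding it the listing above would flag libblas.so.3 and liblapack.so.3 alongside the libmkl_* libraries, which is the situation described in this comment.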

JensWehner commented 4 years ago

Can you upload the whole cmake log? I think our FindOpenmp does something wrong.

baumeier commented 4 years ago

What is the flag to build without GMX?

JensWehner commented 4 years ago

I do not know @junghans

JensWehner commented 4 years ago

yes most likely the blas calls are overloaded with the wrong library.

baumeier commented 4 years ago

That's why I thought it is coming from the linked GMX.

junghans commented 4 years ago

-DCMAKE_DISABLE_FIND_PACKAGE_GROMACS=ON.

baumeier commented 4 years ago

Even without the link to GMX, FAA is still serial.

JensWehner commented 4 years ago

hmm what does ldd say?

junghans commented 4 years ago

I honestly think we should enable MKL by injecting -mkl in the flags instead of trying to detect the right library.

JensWehner commented 4 years ago

Yes I agree, but I somehow feel, we should have the openmp detection before mkl.

JensWehner commented 4 years ago

> I honestly think we should enable MKL by injecting -mkl in the flags instead of trying to detect the right library.

Which flags?

junghans commented 4 years ago

Just -mkl see https://www.its.hku.hk/services/research/hpc/software/mkl

baumeier commented 4 years ago

@JensWehner

ldd ~/work/votc
    linux-vdso.so.1 (0x00007ffeedbcc000)
    libvotca_xtp.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_xtp.so.7 (0x00007f913d87a000)
    libvotca_csg.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_csg.so.7 (0x00007f913d6db000)
    libvotca_tools.so.7 => /home/bbaumeie/work/votca_intel/lib/libvotca_tools.so.7 (0x00007f913d598000)
    libboost_program_options.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.71.0 (0x00007f913d4dd000)
    libmkl_core.so => /opt/intel//mkl/lib/intel64/libmkl_core.so (0x00007f91391bd000)
    libmkl_intel_lp64.so => /opt/intel//mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f913864f000)
    libmkl_intel_thread.so => /opt/intel//mkl/lib/intel64/libmkl_intel_thread.so (0x00007f91360e3000)
    libiomp5.so => /opt/intel//lib/intel64/libiomp5.so (0x00007f9135cf3000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9135ced000)
    libfftw3.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007f9135ae7000)
    libhdf5_cpp.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_cpp.so.103 (0x00007f9135a6a000)
    libhdf5_serial.so.103 => /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103 (0x00007f91356eb000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f91356c8000)
    libsz.so.2 => /usr/lib/x86_64-linux-gnu/libsz.so.2 (0x00007f91356c3000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f91356a7000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9135558000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9135377000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f913535a000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9135168000)
    libxc.so.5 => /usr/lib/x86_64-linux-gnu/libxc.so.5 (0x00007f9134b1d000)
    libboost_filesystem.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.71.0 (0x00007f9134aff000)
    libboost_system.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.71.0 (0x00007f9134afa000)
    libboost_timer.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_timer.so.1.71.0 (0x00007f9134af0000)
    libboost_chrono.so.1.71.0 => /usr/lib/x86_64-linux-gnu/libboost_chrono.so.1.71.0 (0x00007f9134ae0000)
    libimf.so => /opt/intel//lib/intel64/libimf.so (0x00007f9134542000)
    libsvml.so => /opt/intel//lib/intel64/libsvml.so (0x00007f9132b47000)
    libirng.so => /opt/intel//lib/intel64/libirng.so (0x00007f91327dd000)
    libintlc.so.5 => /opt/intel//lib/intel64/libintlc.so.5 (0x00007f9132566000)
    libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f9132538000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f913e0a6000)
    libaec.so.0 => /usr/lib/x86_64-linux-gnu/libaec.so.0 (0x00007f913252f000)

JensWehner commented 4 years ago

Concerning injecting the flag, that seems only to work with icc and not the other compilers then.

baumeier commented 4 years ago

It is really just in this line:

  XTP_LOG(Log::error, _log)
      << TimeStamp() << " Diagonalizing two-particle Hamiltonian "
      << std::flush;
  Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(C);  // Uses lower triangle
  XTP_LOG(Log::error, _log)
      << TimeStamp() << " Diagonalization done " << std::flush;

Any ideas for how to fix this? Maybe @felipeZ has seen this before?

JensWehner commented 4 years ago

As I said, MKL should overload this function call, and the MKL version should be parallelized.

So first to check is if it links to the proper mkl library. The eigen version is not parallelized.

baumeier commented 4 years ago

That much is clear. Care to elaborate HOW to check this?

JensWehner commented 4 years ago

it is relatively easy with intel vtune. With perf you should also get an idea. So basically a profiler will tell you all function calls

baumeier commented 4 years ago

perf report

Overhead  Command      Shared Object                       Symbol
  46,15%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::Sigma_Exact::CalcCorrelationDiagElement
  11,87%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::gebp_kernel<double, double, long, Eigen::internal::blas_data_mapper<double, lo
  11,59%  xtp_tools    libc-2.31.so                        [.] _int_malloc
   4,72%  xtp_tools    libc-2.31.so                        [.] _int_free
   4,61%  xtp_tools    libvotca_xtp.so.7                   [.] _INTERNALc0808f19::Eigen::internal::tridiagonal_qr_step<0, double, double, long>
   2,70%  xtp_tools    libc-2.31.so                        [.] malloc
   2,18%  xtp_tools    libiomp5.so                         [.] _INTERNAL1f496181::__kmp_wait_template<kmp_flag_64<false, true>, true, false, true>
   1,81%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::MatrixBase<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, -1, -1, false> >::appl
   1,62%  xtp_tools    libc-2.31.so                        [.] unlink_chunk.isra.0
   1,41%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::general_matrix_vector_product<long, double, Eigen::internal::const_blas_data_m
   1,27%  xtp_tools    libc-2.31.so                        [.] cfree@GLIBC_2.2.5
   0,97%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::redux_impl<Eigen::internal::scalar_sum_op<double, double>, Eigen::internal::re
   0,92%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::SelfAdjointView<Eigen::Block<Eigen::Matrix<double, -1, -1, 0, -1, -1>, -1, -1, false>, 1
   0,82%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::Sigma_Exact::CalcCorrelationOffDiagElement
   0,79%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::TCMatrix::FillThreeCenterRepBlock
   0,65%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::selfadjoint_matrix_vector_product<double, long, 0, 1, false, false, 0>::run
   0,44%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::Sigma_Exact::CalcResidues
   0,43%  xtp_tools    libiomp5.so                         [.] _INTERNAL1f496181::__kmp_hyper_barrier_gather
   0,34%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::gemm_pack_rhs<double, long, Eigen::internal::const_blas_data_mapper<double, lo
   0,32%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::Tensor<double, 3, 0, long>::operator()<long>
   0,29%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::PlainObjectBase<Eigen::Array<double, -1, 1, 0, -1, 1> >::PlainObjectBase<Eigen::CwiseBin
   0,28%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::Sigma_Exact::CalcCorrelationDiagElementDerivative
   0,26%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::PlainObjectBase<Eigen::Array<double, -1, 1, 0, -1, 1> >::PlainObjectBase<Eigen::CwiseBin
   0,25%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::redux_impl<Eigen::internal::scalar_sum_op<double, double>, Eigen::internal::re
   0,22%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::outer_product_selector_run<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Cw
   0,17%  xtp_tools    libvotca_xtp.so.7                   [.] free@plt
   0,17%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::internal::generic_product_impl<Eigen::Transpose<Eigen::Matrix<double, -1, -1, 0, -1, -1>
   0,14%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::PlainObjectBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::PlainObjectBase<Eigen::CwiseBi
   0,14%  xtp_tools    libvotca_xtp.so.7                   [.] Eigen::PlainObjectBase<Eigen::Array<double, -1, 1, 0, -1, 1> >::PlainObjectBase<Eigen::CwiseBin
   0,14%  xtp_tools    libvotca_xtp.so.7                   [.] malloc@plt
   0,11%  xtp_tools    libvotca_xtp.so.7                   [.] votca::xtp::AOShell::EvalAOspace

felipeZ commented 4 years ago

@baumeier It seems from the second line that Eigen is indeed using its own internal implementation.
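Reading the profile this way amounts to sorting symbols by name. A rough sketch of that idea follows; classify_symbol and the Backend enum are hypothetical names, and the rule that MKL routines appear with an "mkl_" prefix is an assumption made for illustration rather than something shown in this thread.

```cpp
#include <string>

enum class Backend { EigenNative, Mkl, Other };

// Classifies a demangled profiler symbol by name only.
// Assumptions for this sketch: Eigen's built-in kernels contain
// "Eigen::internal::", and MKL routines start with "mkl_".
Backend classify_symbol(const std::string& symbol) {
  if (symbol.find("Eigen::internal::") != std::string::npos) {
    return Backend::EigenNative;
  }
  if (symbol.rfind("mkl_", 0) == 0) {  // rfind at position 0 is a prefix test
    return Backend::Mkl;
  }
  return Backend::Other;
}
```

Applied to the report above, gebp_kernel and tridiagonal_qr_step fall under EigenNative and nothing falls under Mkl, which matches the observation that the native Eigen code path is in use.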

baumeier commented 4 years ago

@felipeZ Yeah, the question is why though?

JensWehner commented 4 years ago

@junghans do you have any idea why the intel does not link properly anymore? I ran out of ideas.

baumeier commented 4 years ago

Renamed issue as it appears to be a more general problem with using the MKL backend in Eigen.

JensWehner commented 4 years ago

Somehow the #define EIGEN_USE_MKL_ALL macro is undefined in xtp although it is defined in tools and the headers are included, do you have an idea how this works? @junghans @JoshuaSBrown

junghans commented 4 years ago

Can you check if MKL_FOUND, __VOTCA_TOOLS_VOTCA_CONFIG_H and VOTCA_TOOLS_EIGEN_H are defined?
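One way to run such a check is to test the macros with the preprocessor inside the translation unit in question. A minimal sketch, assuming the relevant xtp and tools headers would be included above the function in a real check; the function name defined_backend_macros is hypothetical, and with nothing included, as here, it reports no macros.

```cpp
#include <string>

// Lists which of the macros discussed in this thread are visible at this
// point in the translation unit. In a real check, the project's headers
// would be included above this function; none are included here.
std::string defined_backend_macros() {
  std::string found;
#ifdef EIGEN_USE_MKL_ALL
  found += "EIGEN_USE_MKL_ALL ";
#endif
#ifdef MKL_FOUND
  found += "MKL_FOUND ";
#endif
#ifdef __VOTCA_TOOLS_VOTCA_CONFIG_H
  found += "__VOTCA_TOOLS_VOTCA_CONFIG_H ";
#endif
#ifdef VOTCA_TOOLS_EIGEN_H
  found += "VOTCA_TOOLS_EIGEN_H ";
#endif
  return found;
}
```

Printing the returned string from a source file that sees the same includes as the diagonalization code would show whether EIGEN_USE_MKL_ALL actually reaches it.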

JensWehner commented 4 years ago

VOTCA_TOOLS_EIGEN_H is defined, the rest are not.

junghans commented 4 years ago

That would mean the votca_config.h from tools didn't get included.

Can you check the order of include directories when compiling? My guess is that build/include/votca/tools (which contains tools's votca_config.h) comes after build/include/votca/xtp, which contains xtp's votca_config.h, and hence xtp's votca_config.h gets included instead of the one from tools.

If that is the case, try to rename xtp's config to votca_xtp_config.h.
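The shadowing suspected here can be illustrated with a toy model of header lookup in which the first search directory that contains the file wins. This is only a sketch of the idea: resolve_include is a hypothetical helper, and real compilers apply further rules (for example, quoted includes also search the including file's directory) that it ignores.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of include-path lookup: directories are tried in order and the
// first one that contains the requested header wins. Returns an empty string
// when no directory has the header.
std::string resolve_include(const std::vector<std::string>& search_dirs,
                            const std::set<std::string>& existing_files,
                            const std::string& header) {
  for (const auto& dir : search_dirs) {
    const std::string candidate = dir + "/" + header;
    if (existing_files.count(candidate) != 0) {
      return candidate;
    }
  }
  return "";
}
```

With the xtp directory listed first and both directories holding a votca_config.h, the lookup returns the xtp copy. Once the xtp file is renamed to votca_xtp_config.h, the same request falls through to the tools copy.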

JensWehner commented 4 years ago

I did that. Weird because they should have include guards.

JensWehner commented 4 years ago

did not work.

JensWehner commented 4 years ago

PR coming up.