shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

<< compilation OK, but tests fail >> #5

Closed: icamps closed this issue 7 years ago

icamps commented 7 years ago

Hello,

I successfully compiled JDFTx.

Running make test, I got:

Running tests...
Test project /root/Downloads/JDFTx/build
      Start  1: openShell
 1/10 Test  #1: openShell ........................***Failed    0.18 sec
      Start  2: vibrations
 2/10 Test  #2: vibrations .......................***Failed    0.17 sec
      Start  3: moleculeSolvation
 3/10 Test  #3: moleculeSolvation ................***Failed    0.66 sec
      Start  4: ionSolvation
 4/10 Test  #4: ionSolvation .....................***Failed    0.32 sec
      Start  5: latticeOpt
 5/10 Test  #5: latticeOpt .......................***Failed    1.16 sec
      Start  6: metalBulk
 6/10 Test  #6: metalBulk ........................***Failed    1.46 sec
      Start  7: plusU
 7/10 Test  #7: plusU ............................***Failed    1.36 sec
      Start  8: spinOrbit
 8/10 Test  #8: spinOrbit ........................***Failed    1.34 sec
      Start  9: graphene
 9/10 Test  #9: graphene .........................***Failed    0.48 sec
      Start 10: metalSurface
10/10 Test #10: metalSurface .....................***Failed    1.28 sec
0% tests passed, 10 tests failed out of 10
Total Test time (real) =   8.42 sec
The following tests FAILED:
          1 - openShell (Failed)
          2 - vibrations (Failed)
          3 - moleculeSolvation (Failed)
          4 - ionSolvation (Failed)
          5 - latticeOpt (Failed)
          6 - metalBulk (Failed)
          7 - plusU (Failed)
          8 - spinOrbit (Failed)
          9 - graphene (Failed)
         10 - metalSurface (Failed)
Errors while running CTest
Makefile:61: recipe for target 'test' failed
make: *** [test] Error 8

Running ./jdftx to check for specific error messages returns:

*************** JDFTx 1.3.1  The playground for joint density functional theory ****************

Start date and time: Mon Aug 14 17:36:59 2017
Running on hosts (process indices):  lamodel ( 0 )
Executable ./jdftx with empty command-line (run with -h or --help for command-line options).
Maximum cpu threads by process: 4
Run totals: 1 processes, 4 threads, 0 GPUs
Waiting for commands from stdin (end input with EOF (Ctrl+D)):

This looks OK to me.

Running ldd ./jdftx, I got:

linux-vdso.so.1 (0x00007ffc8eff9000)
        libjdftx.so => /root/Downloads/JDFTx/build/libjdftx.so (0x00002b70b95f2000)
        libmpicxx.so.12 => /software/intel/MPI/compilers_and_libraries_2017.1.132/linux/mpi/intel64/lib/libmpicxx.so.12 (0x00002b70ba15b000)
        libmpifort.so.12 => /software/intel/MPI/compilers_and_libraries_2017.1.132/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002b70ba37c000)
        libmpi.so.12 => /software/intel/MPI/compilers_and_libraries_2017.1.132/linux/mpi/intel64/lib/libmpi.so.12 (0x00002b70ba725000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00002b70bb465000)
        librt.so.1 => /lib64/librt.so.1 (0x00002b70bb66a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b70bb872000)
        libgsl.so.0 => /usr/lib64/libgsl.so.0 (0x00002b70bba8f000)
        libfftw3_threads.so.3 => /usr/lib64/libfftw3_threads.so.3 (0x00002b70bbec3000)
        libfftw3.so.3 => /usr/lib64/libfftw3.so.3 (0x00002b70bc0ca000)
        libmkl_intel_lp64.so => /software/intel/compiladores/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b70bc4c7000)
        libmkl_intel_thread.so => /software/intel/compiladores/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b70bcddb000)
        libmkl_core.so => /software/intel/compiladores/composer_xe_2015.1.133/mkl/lib/intel64/libmkl_core.so (0x00002b70be176000)
        libiomp5.so => /software/intel/compiladores/composer_xe_2015.1.133/compiler/lib/intel64/libiomp5.so (0x00002b70bfccf000)
        libm.so.6 => /lib64/libm.so.6 (0x00002b70c0005000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b70c0302000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b70c068b000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b70c08a3000)
        libimf.so => /software/intel/compiladores/composer_xe_2015.1.133/compiler/lib/intel64/libimf.so (0x00002b70c0c46000)
        libsvml.so => /software/intel/compiladores/composer_xe_2015.1.133/compiler/lib/intel64/libsvml.so (0x00002b70c1101000)
        libirng.so => /software/intel/compiladores/composer_xe_2015.1.133/compiler/lib/intel64/libirng.so (0x00002b70c1fe0000)
        libintlc.so.5 => /software/intel/compiladores/composer_xe_2015.1.133/compiler/lib/intel64/libintlc.so.5 (0x00002b70c21e7000)
        /lib64/ld-linux-x86-64.so.2 (0x000055f9ebb14000)

That also looks OK to me.

What am I missing?

shankar1729 commented 7 years ago

"make test" requires write permissions in the build directory. Looking at your paths, did you perhaps compile as root and then try to run "make test" as normal user?

icamps commented 7 years ago

Hello @shankar1729, I compiled and ran everything as root.

icamps commented 7 years ago

@shankar1729, looking at the LastTest.log file created in the Testing/Temporary/ folder, I see the following:

Running with MPI (testing in parallel, following the instructions on the site):

 Start testing: Aug 15 09:53 -03
----------------------------------------------------------
1/10 Testing: openShell
1/10 Test: openShell
Command: "/root/Downloads/JDFTx/jdftx-1.3.1/jdftx/test/runTest.sh" "openShell" $
Directory: /root/Downloads/JDFTx/build/test
"openShell" start time: Aug 15 09:53 -03
Output:
----------------------------------------------------------
launch="mpirun -n 2"
Segmentation Fault.
Segmentation Fault.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
<end of output>
Test time =   0.21 sec

For the other tests, I got similar messages.

For serial testing:

Start testing: Aug 15 09:57 -03
----------------------------------------------------------
1/10 Testing: openShell
1/10 Test: openShell
Command: "/root/Downloads/JDFTx/jdftx-1.3.1/jdftx/test/runTest.sh" "openShell" $
Directory: /root/Downloads/JDFTx/build/test
"openShell" start time: Aug 15 09:57 -03
Output:
----------------------------------------------------------
launch=""
Segmentation Fault.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
<end of output>
Test time =   0.15 sec
----------------------------------------------------------
Test Failed.
"openShell" end time: Aug 15 09:57 -03
"openShell" time elapsed: 00:00:00
----------------------------------------------------------

In principle, my MPI environment is fine, since I can run other programs (SIESTA, for example) without any issue.
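
(As an extra sanity check of the MPI launcher itself, something along these lines should be enough; both commands are only illustrative:

mpirun -n 2 hostname
mpirun -n 2 ./jdftx -h

The first confirms mpirun can spawn processes, and the second confirms the jdftx binary at least starts under MPI.)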

icamps commented 7 years ago

Running the water example, I got:

/software/JDFTx_BIN/libjdftx.so(_Z14stackTraceExiti+0x20) [0x2b6d8ebdbe10]
/lib64/libc.so.6(+0x34950) [0x2b6d9574c950]
jdftx(__intel_ssse3_memcpy+0x292c) [0x439b2c]
shankar1729 commented 7 years ago

Hmm, this is strange: could you please try using the GNU compiler instead? We've had strange issues with Intel compilers in the past. (You can still link to MKL and will get essentially the same performance.)
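
A minimal sketch of such a build (the directory name is arbitrary, and the MKL path should point at your own install, here taken from the ldd output above) would be:

mkdir build-gcc && cd build-gcc
CC=gcc CXX=g++ cmake -D EnableMKL=yes -D MKL_PATH=/software/intel/compiladores/composer_xe_2015.1.133/mkl ../jdftx-1.3.1/jdftx
make -j4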

icamps commented 7 years ago

I compiled using the GNU compiler BUT with Intel MPI, and I got the same errors. I now suspect the problem may be with Intel MPI. I have to find a way to have two MPI implementations installed at the same time (do you know if this is possible?).

shankar1729 commented 7 years ago

I just tested a JDFTx build with GCC (4.8.5) and Intel MPI (2017.2.174): all tests passed and with the same timings as OpenMPI on the same machine. So I doubt the MPI library is the issue. (Although not relevant now, it is straightforward to have two MPIs in parallel; you just need to have the corresponding mpirun selected in PATH during runtime based on the mpicc/mpicxx selected during compilation. This can be managed with the modules system or done by setting environment variables manually.)
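
A quick illustrative check that the runtime launcher matches the compile-time wrappers:

which mpicxx mpirun
mpirun --version

Both should point into the same MPI installation (here, the Intel MPI tree shown in the ldd output above).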

When you said same error with GNU compiler, did you still get the __intel_ssse3_memcpy in the stacktrace? If so, that indicates that the build still used the intel compiler. This happens often because cmake is awfully sticky about the compiler. You'll have to clear the CMakeCache.txt (and preferably clean out the build directory completely) to switch compilers.
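
For instance (a rough sketch; removing and recreating the build directory entirely works just as well):

cd /root/Downloads/JDFTx/build
rm -rf CMakeCache.txt CMakeFiles/
CC=gcc CXX=g++ cmake ../jdftx-1.3.1/jdftx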

If you still get the errors after a clean build with GNU, could you send me the updated stack trace that you get from the water example? Preferably on a serial run, and with the trace processed using the printStacktrace script if possible.

shankar1729 commented 7 years ago

(Also btw, I don't have access to an Intel compiler at the moment to test this.)

icamps commented 7 years ago

Each time I compile, I delete and recreate the build folder, and I delete the LibXC folder too.

The error for the GNU + Intel MPI combination is below. There is no reference to Intel there.

Start testing: Aug 15 17:58 -03
----------------------------------------------------------
1/10 Testing: openShell
1/10 Test: openShell
Command: "/root/Downloads/JDFTx/jdftx-1.3.1/jdftx/test/runTest.sh" "openShell" "/root/Downloads/JDFTx/jdftx-1.3.1/jdftx/test" "/root/Downloads/JDFTx/build/test" "/root/Downloads/JDFT$
Directory: /root/Downloads/JDFTx/build/test
"openShell" start time: Aug 15 17:58 -03
Output:
----------------------------------------------------------
launch=""
Segmentation Fault.
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
:
system msg for write_line failure : Bad file descriptor
<end of output>
Test time =   0.14 sec
----------------------------------------------------------
Test Failed.
"openShell" end time: Aug 15 17:58 -03
"openShell" time elapsed: 00:00:00
----------------------------------------------------------

I am compiling using the following:

LibXC:
CC="mpicc -fPIC" FC=mpif90 CXX="mpicxx -fPIC" ./configure --prefix=/software/LIBS/LibXC/

JDFTx:
CC="mpicc -fPIC" FC=mpif90 CXX="mpicxx -fPIC" cmake -D EnableMKL=yes -D LinkTimeOptimization=yes -D ForceFFTW=yes -D EnableLibXC=yes -D MKL_PATH=/software/intel/compiladores/composer_xe_2015.1.133/mkl -D LIBXC_PATH=/software/LIBS/LibXC ../jdftx-1.3.1/jdftx

Output from mpicc -v:

mpigcc for the Intel(R) MPI Library 2017 Update 1 for Linux*
Copyright(C) 2003-2016, Intel Corporation.  All rights reserved.
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.8/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8 --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 4.8.5 (SUSE Linux)

and from mpicxx -v:

mpigxx for the Intel(R) MPI Library 2017 Update 1 for Linux*
Copyright(C) 2003-2016, Intel Corporation.  All rights reserved.
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.8/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8 --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux --host=x86_64-suse-linux
Thread model: posix
gcc version 4.8.5 (SUSE Linux)

icamps commented 7 years ago

Update! Even though the tests fail, the water example runs OK, creating only the files water.Ecomponents, water.n, and water.out.

shankar1729 commented 7 years ago

That's really strange: I can't think of why the tests should fail in seconds if you are able to run the code normally otherwise. Is the code running fine with MPI as well? Perhaps check the Silicon example (the first Solids tutorial).
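
(As an illustrative check of the MPI code path, something like the following would do, where Si.in and Si.out are placeholder names standing in for whatever input and output files that tutorial uses:

mpirun -n 2 jdftx -i Si.in | tee Si.out
)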

icamps commented 7 years ago

Yes, it is weird! The Silicon example (up to the electronic band-structure generation) works both in serial and in parallel.