sanshar / Block

Block implements the density matrix renormalization group (DMRG) algorithm for quantum chemistry.
GNU General Public License v3.0

Code Crash #29

Closed cuanto closed 8 years ago

cuanto commented 8 years ago

Dear developers,

I am trying to build a working Block binary because I cannot use the precompiled version on my system. I am using the source code of version 1.0.1 and compiling it with gcc/g++ 6.1.0, boost 1.55.0, openmpi 1.10.2, and openblas, all built with gcc/g++ 6.1.0. Many of the tests run and produce correct results, but the tests involving 2-RDM or NEVPT2 crash. For example, when I run the c2_d2h_small example, the program crashes with

[malta:04396] * Process received signal *
[malta:04396] Signal: Segmentation fault (11)
[malta:04396] Signal code: Address not mapped (1)
[malta:04396] Failing at address: (nil)
[malta:04396] [ 0] /lib64/libpthread.so.0[0x365f60e4c0]
[malta:04396] [ 1] ../../block.spin_adapted-mpi[0x973936]
[malta:04396] [ 2] ../../block.spin_adapted-mpi[0x966aa3]
[malta:04396] [ 3] ../../block.spin_adapted-mpi[0x967c1a]
[malta:04396] [ 4] ../../block.spin_adapted-mpi[0x955a13]
[malta:04396] [ 5] ../../block.spin_adapted-mpi[0x956887]
[malta:04396] [ 6] ../../block.spin_adapted-mpi[0x994681]
[malta:04396] [ 7] ../../block.spin_adapted-mpi[0x645fe1]
[malta:04396] [ 8] ../../block.spin_adapted-mpi[0x6da93d]
[malta:04396] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)[0x365ea1d974]
[malta:04396] [10] ../../block.spin_adapted-mpi[0x60ac29]

[malta:04396] * End of error message *

mpirun noticed that process rank 4 with PID 4396 on node malta exited on signal 11 (Segmentation fault).

The serial version also produces a segmentation fault and stops. I compiled the code on another machine with gcc-4.9 and got the same problem. I also tried different optimization levels, but nothing seems to work.

If I remove the twopdm line from the example, the code runs and stops normally.

Best regards, Jose Luis

sunqm commented 8 years ago

Hi Jose, The problem may be solved in the latest version 1.1 https://github.com/sanshar/Block/releases/latest. Best, Qiming

cuanto commented 8 years ago

Hi Qiming,

I tried to compile the version 1.1 but the same problem occurs.

Best, Jose Luis

sunqm commented 8 years ago

We cannot reproduce the error. Can you recompile version 1.1 with the option "-g", then rerun the serial version under gdb with gdb --args ../../block.spin_adapted? When the program crashes, type bt to dump the stack frames. The stack frame info can help us identify the problem.

Qiming

cuanto commented 8 years ago

Hi Qiming,

I recompiled the code with -g only and it runs nicely. If I instead use a modest optimization level, for example -g with -O2, the code crashes. Running the h2o_nevpt2 example, the debugger dumps the following stack info:

Program received signal SIGSEGV, Segmentation fault.
SpinAdapted::spinExpectation (wave1=..., wave2=..., leftOp=..., dotOp=..., rightOp=..., big=..., expectations=..., doTranspose=false) at modules/twopdm/twopdm_2.C:31
31 dotindices = &dotOp ? dotOp.get_orbs().size() : 0;
(gdb) bt
#0 SpinAdapted::spinExpectation (wave1=..., wave2=..., leftOp=..., dotOp=..., rightOp=..., big=..., expectations=..., doTranspose=false) at modules/twopdm/twopdm_2.C:31
#1 0x0000000000a0c7be in SpinAdapted::compute_two_pdm_4_0_0 (wave1=..., wave2=..., big=..., twopdm=...) at modules/twopdm/twopdm.C:181
#2 0x0000000000a0f5d0 in SpinAdapted::compute_twopdm_initial (wavefunctions=..., system=..., systemDot=..., newSystem=..., newEnvironment=..., big=..., numprocs=1, state=0) at modules/twopdm/twopdm.C:109
#3 0x00000000009fd4cf in SpinAdapted::SweepTwopdm::BlockAndDecimate (sweepParams=..., system=..., newSystem=..., useSlater=@0x7fffffffd91f: false, dot_with_sys=@0x7fffffffd1fb: true, state=<optimized out>) at modules/twopdm/sweep.C:150
#4 0x00000000009fee3b in SpinAdapted::SweepTwopdm::do_one (sweepParams=..., warmUp=@0x7fffffffd91f: false, forward=@0x7fffffffd91c: false, restart=, restartSize=<optimized out>, state=<optimized out>) at modules/twopdm/sweep.C:228
#5 0x0000000000c10b9b in SpinAdapted::nevpt2::nevpt2 () at modules/nevpt2/sweep_nevpt2.C:517
#6 0x000000000068744b in calldmrg (input=, output=) at dmrg.C:359
#7 0x000000365ea1d974 in __libc_start_main () from /lib64/libc.so.6
#8 0x0000000000647d89 in _start ()

The OS in which I am trying to compile is RedHat 5.3.

Best, Jose Luis
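
A note on the crashing line, twopdm_2.C:31: it tests the address of dotOp, which the backtrace shows is an argument of spinExpectation. If dotOp is a C++ reference (an assumption, since the signature is not shown in this thread), the compiler may treat &dotOp as never null. An optimizing build can then drop the check and call get_orbs() on a missing operator, which would fit the crash appearing with -O2 but not with -g alone. The sketch below only illustrates one common way to express an optional operand safely, by passing a pointer. DotOp and count_dot_indices are hypothetical names for illustration, not Block's actual classes or functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an operator block; NOT Block's real class.
struct DotOp {
    std::vector<int> orbs;
    const std::vector<int>& get_orbs() const { return orbs; }
};

// Take the optional operand as a pointer, so "absent" is a legal value
// (nullptr) that the optimizer must respect. With a reference parameter,
// a test such as `&dotOp ? ... : 0` may be folded to "always true".
std::size_t count_dot_indices(const DotOp* dotOp) {
    return dotOp ? dotOp->get_orbs().size() : 0;
}
```

A caller with no dot operator would pass nullptr, and a caller with one would pass its address, for example count_dot_indices(&op).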

cuanto commented 8 years ago

Fixed in the last commit. It now compiles correctly with gcc-4.9 and gcc-6.1!