nwchemgit / nwchem

NWChem: Open Source High-Performance Computational Chemistry
http://nwchemgit.github.io

Calculation stuck. Is it normal? #144

Closed hernan3009 closed 5 years ago

hernan3009 commented 5 years ago

My CCSDT(2)_Q calculation has been stuck for about 2 days. I get no error messages. I do not know whether it is performing a very costly step or has simply failed. The processors are busy and the SSD sometimes seems to be working, but the files in the working directory have not been touched for about 30 hours. Is this normal behavior, or is there a problem? Thank you.

tce
   CCSDT(2)_Q
   io eaf
   idiskx 1
   freeze atomic
end
  The SCF is already converged 

         Total SCF energy =   -757.677539420937

                   NWChem Extensible Many-Electron Theory Module
                   ---------------------------------------------

              ======================================================
                   This portion of the program was automatically
                  generated by a Tensor Contraction Engine (TCE).
                  The development of this portion of the program
                 and TCE was supported by US Department of Energy,
                Office of Science, Office of Basic Energy Science.
                      TCE is a product of Battelle and PNNL.
              Please cite: S.Hirata, J.Phys.Chem.A 107, 9887 (2003).
              ======================================================

            General Information
            -------------------
      Number of processors :     2
         Wavefunction type : Restricted Hartree-Fock
          No. of electrons :    44
           Alpha electrons :    22
            Beta electrons :    22
           No. of orbitals :   300
            Alpha orbitals :   150
             Beta orbitals :   150
        Alpha frozen cores :     8
         Beta frozen cores :     8
     Alpha frozen virtuals :     0
      Beta frozen virtuals :     0
         Spin multiplicity : singlet 
    Number of AO functions :   150
       Number of AO shells :    46
        Use of symmetry is : on 
      Symmetry adaption is : on 
         Schwarz screening : 0.10D-10

          Correlation Information
          -----------------------
          Calculation type : Coupled-cluster through triples w/ perturbation             
   Perturbative correction : (2) quadruples (nonfactorized)                              
            Max iterations :      100
        Residual threshold : 0.10D-06
     T(0) DIIS level shift : 0.00D+00
     L(0) DIIS level shift : 0.00D+00
     T(1) DIIS level shift : 0.00D+00
     L(1) DIIS level shift : 0.00D+00
     T(R) DIIS level shift : 0.00D+00
     T(I) DIIS level shift : 0.00D+00
   CC-T/L Amplitude update :  5-th order DIIS
                I/O scheme : Exclusive Access File Library
        L-threshold :  0.10D-06
        EOM-threshold :  0.10D-06
 no EOMCCSD initial starts read in
 TCE RESTART OPTIONS
 READ_INT:   F
 WRITE_INT:  F
 READ_TA:    F
 WRITE_TA:   F
 READ_XA:    F
 WRITE_XA:   F
 READ_IN3:   F
 WRITE_IN3:  F
 SLICE:      F
 D4D5:       F

            Memory Information
            ------------------
          Available GA space size is    5767145500 doubles
          Available MA space size is    2883575924 doubles

 Maximum block size        20 doubles

 tile_dim =     19

 Block   Spin    Irrep     Size     Offset   Alpha
 -------------------------------------------------
   1    alpha     a1     6 doubles       0       1
   2    alpha     a2     1 doubles       6       2
   3    alpha     b1     3 doubles       7       3
   4    alpha     b2     4 doubles      10       4
   5    beta      a1     6 doubles      14       1
   6    beta      a2     1 doubles      20       2
   7    beta      b1     3 doubles      21       3
   8    beta      b2     4 doubles      24       4
   9    alpha     a1    16 doubles      28       9
  10    alpha     a1    17 doubles      44      10
  11    alpha     a1    17 doubles      61      11
  12    alpha     a2    16 doubles      78      12
  13    alpha     b1    12 doubles      94      13
  14    alpha     b1    13 doubles     106      14
  15    alpha     b2    18 doubles     119      15
  16    alpha     b2    19 doubles     137      16
  17    beta      a1    16 doubles     156       9
  18    beta      a1    17 doubles     172      10
  19    beta      a1    17 doubles     189      11
  20    beta      a2    16 doubles     206      12
  21    beta      b1    12 doubles     222      13
  22    beta      b1    13 doubles     234      14
  23    beta      b2    18 doubles     247      15
  24    beta      b2    19 doubles     265      16

 Global files accessible by all nodes assumed

 Parallel file system coherency ......... OK

 SCF dipole moments / hartree & Debye
 ------------------------------------
   X        -0.0000000     -0.0000000
   Y         0.0000000      0.0000000
   Z         0.2928675      0.7444006
 Total       0.2928675      0.7444006
 ------------------------------------

 Cpu & wall time / sec            0.0            0.0

 X   axis ( b1  symmetry)
 dipole file size   =             4530
 dipole file name   = ./input.d1x

 Y   axis ( b2  symmetry)
 dipole file size   =             5544
 dipole file name   = ./input.d1y

 Z   axis ( a1  symmetry)
 dipole file size   =             5890
 dipole file name   = ./input.d1z

 Integral file          = ./input.aoints.0
 Record size in doubles =  65536        No. of integs per rec  =  43688
 Max. records in memory =    324        Max. records in file   = 422264
 No. of bits per label  =      8        No. of bits per value  =     64

 #quartets = 4.805D+05 #integrals = 2.446D+07 #direct =  0.0% #cached =100.0%

File balance: exchanges=     1  moved=     6  time=   0.0

 Fock matrix recomputed
 1-e file size   =             5890
 1-e file name   = ./input.f1
 Cpu & wall time / sec            0.8            0.8

 tce_ao2e: fast2e=1
 half-transformed integrals in memory

 2-e (intermediate) file size =       955755000
 2-e (intermediate) file name = ./input.v2i
 Cpu & wall time / sec           75.2           54.4

 tce_mo2e: fast2e=1
 2-e integrals stored in memory

 2-e file size   =        136973545
 2-e file name   = ./input.v2
 Cpu & wall time / sec          133.6          113.3
 T1-number-of-tasks                    8

 t1 file size   =              539
 t1 file name   = ./input.t1
 t1 file handle =         14
 T2-number-of-boxes                  354

 t2 file size   =          1168098
 t2 file name   = ./input.t2
 t2 file handle =         17

 t3 file size   =       1213542760
 t3 file name   = ./input.t3

 CCSDT iterations
 --------------------------------------------------------
 Iter          Residuum       Correlation     Cpu    Wall
 --------------------------------------------------------
    1   1.6113122505693  -1.0120161483415  2667.9  1877.3
    2   0.3355554836886  -0.9817782161370  2678.9  1880.8
    3   0.3531609020755  -1.0576149004216  2677.1  1882.0
    4   0.2272447847264  -1.0373453738669  2682.1  1885.1
    5   0.2495654560741  -1.0597043889264  2681.6  1883.3
 MICROCYCLE DIIS UPDATE:                    5                    5
    6   0.0326177086130  -1.0558791145258  2695.0  1906.6
    7   0.0216896352336  -1.0541321693123  2682.1  1885.3
    8   0.0212477557063  -1.0543152596805  2678.3  1883.8
    9   0.0173693290891  -1.0545830832083  2683.5  1885.2
   10   0.0212383735835  -1.0546834785933  2685.6  1884.3
 MICROCYCLE DIIS UPDATE:                   10                    5
   11   0.0028469020938  -1.0551843968247  2696.3  1912.5
   12   0.0019668502895  -1.0549686699351  2690.1  1895.9
   13   0.0013165770634  -1.0550560547359  2683.2  1892.0
   14   0.0011105715000  -1.0550047232922  2686.0  1893.1
   15   0.0009312960878  -1.0550377241323  2683.0  1891.6
 MICROCYCLE DIIS UPDATE:                   15                    5
   16   0.0003196581116  -1.0550271625730  2695.8  1909.8
   17   0.0002241894978  -1.0550225908696  2685.9  1895.1
   18   0.0002068798303  -1.0550252984837  2693.2  1896.7
   19   0.0001844203125  -1.0550288294543  2681.5  1892.1
   20   0.0002122398932  -1.0550291564733  2685.1  1892.5
 MICROCYCLE DIIS UPDATE:                   20                    5
   21   0.0000355664112  -1.0550352720260  2695.3  1918.2
   22   0.0000245280346  -1.0550321385843  2688.9  1903.1
   23   0.0000164023302  -1.0550329553036  2689.4  1903.0
   24   0.0000134533848  -1.0550323036593  2693.1  1902.0
   25   0.0000116103012  -1.0550326456734  2689.4  1901.7
 MICROCYCLE DIIS UPDATE:                   25                    5
   26   0.0000046459010  -1.0550324035119  2699.7  1917.8
   27   0.0000031676119  -1.0550323973653  2691.4  1902.1
   28   0.0000030730259  -1.0550324454316  2690.5  1901.4
   29   0.0000026352274  -1.0550324944129  2691.6  1902.8
   30   0.0000031463247  -1.0550325034728  2685.9  1898.7
 MICROCYCLE DIIS UPDATE:                   30                    5
   31   0.0000005330332  -1.0550325871387  2697.6  1923.9
   32   0.0000003554549  -1.0550325401236  2684.5  1903.1
   33   0.0000002305357  -1.0550325494100  2691.5  1903.6
   34   0.0000001923162  -1.0550325408729  2691.0  1903.5
   35   0.0000001558846  -1.0550325446112  2685.1  1900.9
 MICROCYCLE DIIS UPDATE:                   35                    5
   36   0.0000000696708  -1.0550325410267  2699.5  1924.0
 --------------------------------------------------------
 Iterations converged
 CCSDT correlation energy / hartree =        -1.055032541026745
 CCSDT total energy / hartree       =      -758.732571961963686

 Singles contributions

 Doubles contributions
    24b2  (alpha)    24b2  (beta ) ---    21a1  (alpha)    21a1  (beta )       -0.1050861035

 CCSDT Lambda iterations
 ---------------------------------------------
 Iter          Residuum            Cpu    Wall
 ---------------------------------------------
    1          6.5467260119140  6640.7  5341.4
    2          0.5493612982854  6655.2  5354.4
    3          0.1206489560793  6656.3  5360.5
    4          0.0526271398165  6670.9  5368.3
    5          0.0476611702865  6678.5  5374.2
 MICROCYCLE DIIS UPDATE:                    5                    5
    6          0.0022408363281  6702.5  5418.9
    7          0.0010889836580  6679.7  5371.7
    8          0.0005591367317  6673.9  5370.6
    9          0.0003384221339  6670.1  5359.3
   10          0.0003515017380  6665.2  5368.0
 MICROCYCLE DIIS UPDATE:                   10                    5
   11          0.0000188852435  6676.3  5384.0
   12          0.0000102299751  6666.9  5356.7
   13          0.0000043776733  6657.9  5348.5
   14          0.0000031058628  6669.0  5372.7
   15          0.0000014286886  6651.4  5359.9
 MICROCYCLE DIIS UPDATE:                   15                    5
   16          0.0000001488562  6688.5  5411.7
   17          0.0000000991480  6665.5  5372.4
 ---------------------------------------------
 Iterations converged

 CCSDT dipole moments / hartree & Debye
 ------------------------------------
   X        -0.0000000     -0.0000000
   Y         0.0000000      0.0000000
   Z         0.2646165      0.6725931
 Total       0.2646165      0.6725931
 ------------------------------------

And the files present in the directory are:

-rw-r--r-- 1 hernan hernan 154435408 2019-08-26 14:09:27.210613602 -0300 input.ccsdt2_q_left_4_1_i1
-rw-r--r-- 1 hernan hernan 1946552 2019-08-26 14:09:26.754615776 -0300 input.ccsdt2_q_left_3_1_i1
-rw-r--r-- 1 hernan hernan 4312 2019-08-26 14:09:26.710615986 -0300 input.ccsdt2_q_left_2_1_i1
-rw-r--r-- 1 hernan hernan 64450253616 2019-08-26 14:09:26.706616005 -0300 input.ccsdt2_q_right_6_1_i1
-rw-r--r-- 1 hernan hernan 340420224 2019-08-26 13:59:30.121421802 -0300 input.ccsdt2_q_right_5_1_i1
-rw-r--r-- 1 hernan hernan 215069513856 2019-08-26 13:58:54.741584521 -0300 input.ccsdt2_q_right_4_1_i1
-rw-r--r-- 1 hernan hernan 2892055216 2019-08-26 12:26:05.565007570 -0300 input.ccsdt2_q_right_3_1_i1
-rw-r--r-- 1 hernan hernan 154435408 2019-08-26 12:20:18.279533476 -0300 input.ccsdt2_q_right_2_1_i1
-rw-r--r-- 1 hernan hernan 1946552 2019-08-26 12:19:36.199837795 -0300 input.ccsdt2_q_right_1_1_i1
-rw-r--r-- 1 hernan hernan 8 2019-08-26 12:19:21.971940581 -0300 input.e
-rw-rw-r-- 1 hernan hernan 64747 2019-08-26 12:19:21.971940581 -0300 salida.out
-rw-r--r-- 1 hernan hernan 9708342080 2019-08-26 10:45:16.369865154 -0300 input.lambda3
-rw-r--r-- 1 hernan hernan 9344784 2019-08-26 10:45:03.481909838 -0300 input.lambda2
-rw-r--r-- 1 hernan hernan 4312 2019-08-26 10:45:03.473909865 -0300 input.lambda1
-rw-r--r-- 1 hernan hernan 9708342080 2019-08-25 10:02:27.431955038 -0300 input.t3
-rw-r--r-- 1 hernan hernan 9344784 2019-08-25 10:01:34.228311196 -0300 input.t2
-rw-r--r-- 1 hernan hernan 4312 2019-08-25 10:01:34.188311464 -0300 input.t1
-rw-r--r-- 1 hernan hernan 1095788360 2019-08-24 14:54:53.383509792 -0300 input.v2
-rw-r--r-- 1 hernan hernan 47120 2019-08-24 14:52:09.508643260 -0300 input.f1
-rw-r--r-- 1 hernan hernan 47120 2019-08-24 14:52:08.684648428 -0300 input.d1z
-rw-r--r-- 1 hernan hernan 44352 2019-08-24 14:52:08.680648452 -0300 input.d1y
-rw-r--r-- 1 hernan hernan 36240 2019-08-24 14:52:08.676648477 -0300 input.d1x
-rw-rw-r-- 1 hernan hernan 2247984 2019-08-24 14:52:08.628648778 -0300 input.db
-rw-rw-r-- 1 hernan hernan 183920 2019-08-24 14:52:08.620648828 -0300 large.mos
-rw-r--r-- 1 hernan hernan 180035 2019-08-24 14:52:08.548649280 -0300 input.cfock
-rw-rw-r-- 1 hernan hernan 34624 2019-08-24 14:52:05.524668244 -0300 small.mos
-rw-rw-r-- 1 hernan hernan 696 2019-08-24 14:52:05.288669725 -0300 input.b
-rw-rw-r-- 1 hernan hernan 696 2019-08-24 14:52:05.288669725 -0300 input.b^-1
-rw-rw-r-- 1 hernan hernan 416 2019-08-24 14:52:05.288669725 -0300 input.c
-rw-rw-r-- 1 hernan hernan 416 2019-08-24 14:52:05.288669725 -0300 input.p
-rw-rw-r-- 1 hernan hernan 80 2019-08-24 14:52:05.288669725 -0300 input.zmat
-rw-rw-r-- 1 hernan hernan 1023 2019-08-24 14:51:13.528994692 -0300 input.inp
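A listing like this can be summarized with a short script that reports how long ago each file in the working directory was last modified. This is only a minimal sketch: the function name `file_ages` is hypothetical, and an old modification time does not by itself prove that the job has hung, since a compute-heavy step may run for a long time without writing anything.

```python
import os
import time


def file_ages(directory, now=None):
    """Return (age_seconds, size_bytes, name) tuples, most recently modified first.

    `now` defaults to the current time; it is a parameter only so that the
    function can be checked against fixed timestamps.
    """
    now = time.time() if now is None else now
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():
                info = entry.stat()
                entries.append((now - info.st_mtime, info.st_size, entry.name))
    # Sorting by age in ascending order puts the newest files at the top.
    return sorted(entries)


if __name__ == "__main__":
    for age, size, name in file_ages("."):
        print(f"{age / 3600:8.1f} h  {size:>15d}  {name}")
```

Run from the scratch directory, it prints one line per file with the age in hours, the size in bytes, and the name.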

jeffhammond commented 5 years ago

Run gstack or an equivalent to get a backtrace. This is probably just a really expensive calculation taking a long time. Please also provide the input file so that we can reproduce it.

hernan3009 commented 5 years ago

Thank you for your answer. The input file is https://www.pastefs.com/pid/159041

I tried strace (is it equivalent to gstack?). I do not understand its continuous output. I ran it and stopped it with CTRL + C. The output is https://www.pastefs.com/pid/159042 .
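For reference, the two tools answer different questions: strace logs the system calls a process makes, while gstack prints the call stack of each thread, which is what a backtrace request is after. Where gstack is not installed, gdb can produce the same kind of output. The sketch below assumes gdb is available and that you are allowed to attach to the process; the helper names are hypothetical.

```python
import subprocess


def backtrace_command(pid):
    """Build a gdb command line that prints every thread's stack, then exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def backtrace(pid, timeout=60):
    """Attach gdb to a running process and return the backtrace text.

    Assumes gdb is installed and that attaching to `pid` is permitted.
    """
    result = subprocess.run(
        backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling `backtrace(pid)` with the process ID of one of the nwchem processes (as shown by htop, for example) returns the stacks as a string that can be pasted into the issue.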

jeffhammond commented 5 years ago

IO EAF puts everything on disk. This is super slow. Any chance you can fit this calculation into memory?

jeffhammond commented 5 years ago

Try this...

memory stack 22000 mb heap 100 mb global 22000 mb noverify
geometry units angstrom
symmetry C2V
CL 0.0000000000 0.0000000000 -0.3666912916
F 0.0000000000 -1.7010030671 -0.2815117930
F 0.0000000000 1.7010030671 -0.2815117930
F 0.0000000000 0.0000000000 1.2379631400
end
basis small

hernan3009 commented 5 years ago

Thank you. I bet I cannot fit everything in memory. I tried before, but I ran out of memory. However, I did not specify stack, heap and global previously. Can that make a difference? (I am running the calculation on my home PC.)

jeffhammond commented 5 years ago

GA and stack+heap are separate. The latter is a static, up-front allocation. GA is dynamic, à la malloc: you can set it to a big number and it will only segfault if it really needs all of it.
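The "Memory Information" section of the output reports the GA and MA space in doubles rather than bytes. A small conversion makes those figures easier to compare with the installed RAM. This is only a sketch: it assumes 8-byte doubles, and the function name is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: standard 64-bit double precision


def doubles_to_gib(n_doubles):
    """Convert a count of doubles into GiB (2**30 bytes)."""
    return n_doubles * BYTES_PER_DOUBLE / 2**30


if __name__ == "__main__":
    # Figures copied from the "Memory Information" section of the output above.
    print(f"GA space: {doubles_to_gib(5767145500):.2f} GiB")
    print(f"MA space: {doubles_to_gib(2883575924):.2f} GiB")
```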

I’ll see if I can run this on one of my machines with 1.5 TB.

hernan3009 commented 5 years ago

I really appreciate your help. Do you think that I should stop this calculation?

jeffhammond commented 5 years ago

I will look at the source later and see if I can determine how far along it is. Don’t kill it now if you don’t need your computer to do anything else.

edoapra commented 5 years ago

@hernan3009 To second @jeffhammond's suggestion to fit the calculation in memory, you might want to decrease the tilesize to 16 to reduce memory usage.

hernan3009 commented 5 years ago

@edoapra, thank you. Before writing data to disk I tried the default behavior, and I reduced the tilesize. I do not remember the smallest value I tried; I am pretty sure it was not as small as 2 or 3, but I certainly tried 8 and 10. I did not set the attilesize parameter. I did not add

memory stack 22000 mb heap 100 mb global 22000 mb noverify

but only specified the total memory.

Does it make sense to use a very small tilesize (say 1, 2 or 4)?

In any case, there is something that I do not understand. I am running this calculation on two cores. I understand that a hard disk is slower than RAM. Intuitively, I would expect time to be wasted when writing to or reading from disk: when disk reads are slow, the processors could process more data per unit time than they receive. But in the current state of my calculation I do not see data being written to disk, and the two processors are at 100% usage (according to htop). So I suspect that, at this stage, io EAF should not be slowing the calculation down. Does that make sense?

P.S.: It is still running.

jeffhammond commented 5 years ago

Tilesize less than 8 doesn’t really make sense.

edoapra commented 5 years ago

@hernan3009 Your input seems to have tilesize 19. I would go for 16 or 12
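To see why a modest change in tilesize matters, consider a rough model in which a dense tensor block with n indices holds tilesize^n doubles. That scaling is an assumption made here for illustration, not a statement about NWChem's actual buffer sizes, and the function names are hypothetical.

```python
def block_doubles(tilesize, n_indices):
    """Doubles in one dense block with `n_indices` indices, each of length `tilesize`."""
    return tilesize ** n_indices


def relative_block_size(new_tile, old_tile, n_indices):
    """Block size at `new_tile` as a fraction of the block size at `old_tile`."""
    return block_doubles(new_tile, n_indices) / block_doubles(old_tile, n_indices)


if __name__ == "__main__":
    # Compare the suggested tile sizes with the tile_dim of 19 reported in the output.
    for new_tile in (16, 12):
        for n in (4, 6, 8):
            frac = relative_block_size(new_tile, 19, n)
            print(f"tilesize {new_tile}, {n}-index block: {frac:.3f} of the size at 19")
```

Under this model the saving grows quickly with the number of indices, which is why the higher-order steps benefit most from a smaller tile.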

hernan3009 commented 5 years ago

@edoapra Yes. That is the tilesize chosen by NWChem when I did not set it explicitly. For previous runs, when I used io ga, I set it to smaller values. I thought that a large tilesize improves performance. As I had no RAM issues with IO EAF, I just let NWChem choose it. Is that bad practice?

hernan3009 commented 5 years ago

@jeffhammond Could you reproduce the behavior that I am experiencing?

jeffhammond commented 5 years ago

I am working on it. My big machine is offline but I'm trying another now.

edoapra commented 5 years ago

I got the calculation to complete using 32 nodes with four processes per node and 4 threads, with tilesize 16. It took about 30000 seconds.

    9          0.0003265546817   617.3   582.3
   10          0.0003432806979   613.9   578.3
 MICROCYCLE DIIS UPDATE:                    10                     5
   11          0.0000173152244   613.1   577.6
   12          0.0000093283269   613.9   578.7
   13          0.0000040227335   610.0   574.8
   14          0.0000028508313   610.0   575.0
   15          0.0000013125364   624.5   589.9
 MICROCYCLE DIIS UPDATE:                    15                     5
   16          0.0000001424974   627.0   592.5
   17          0.0000000930854   615.2   580.2
 ---------------------------------------------
 Iterations converged

 CCSDT dipole moments / hartree & Debye
 ------------------------------------
   X        -0.0000000     -0.0000000
   Y        -0.0000000     -0.0000000
   Z         0.2683693      0.6821318
 Total       0.2683693      0.6821318
 ------------------------------------

 CCSDT(2)_Q correlation energy / hartree =        -1.040101618865013
 CCSDT(2)_Q total energy / hartree       =      -758.715217414449967
 Cpu & wall time / sec        12947.4        12889.8

 Task  times  cpu:    30441.2s     wall:    29312.4s
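
The iteration tables in the two outputs can be compared directly. The Lambda iterations took roughly 5400 s of wall time each in the two-process run and roughly 580 s each here, a factor of about nine. If the final (2)_Q step scaled the same way, which is only an assumption and ignores disk I/O, its 12889.8 s would become more than a day on the two-process run. The sketch below extracts the wall-time column from pasted iteration lines; the function names are hypothetical.

```python
def wall_times(log_text):
    """Return the wall-time column from TCE iteration lines.

    Accepts Lambda lines (iter, residuum, cpu, wall) and CCSDT lines
    (iter, residuum, correlation, cpu, wall); all other lines are skipped.
    """
    times = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) not in (4, 5) or not fields[0].isdigit():
            continue
        try:
            values = [float(f) for f in fields[1:]]
        except ValueError:
            continue
        times.append(values[-1])  # wall time is the last column
    return times


def mean_wall(log_text):
    """Average wall time per iteration; raises ValueError if no lines match."""
    times = wall_times(log_text)
    if not times:
        raise ValueError("no iteration lines found")
    return sum(times) / len(times)
```

Dividing `mean_wall` of one table by `mean_wall` of the other gives the per-iteration slowdown between the two runs.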

hernan3009 commented 5 years ago

Thank you so much. It seems that NWChem works perfectly and it was just a lack of computational power.

@edoapra I noticed that the dipole moments from your calculation differ from mine. Did you use the same input file? Could you share the complete output for comparison?

edoapra commented 5 years ago

I used a spherical basis set (at least in the first part of the input). Here are the full input and output files: hern2.nw.txt hern2.out.txt

hernan3009 commented 5 years ago

@edoapra thanks. I did not notice the existence of the spherical keyword.

One question: do the lines before the TCE block in the input file revert to Cartesian? I mean:

basis
   F  library def2-tzvp
   Cl library def2-tzvp
end

That is, the SCF part was computed with spherical functions and the TCE part with Cartesian ones. Am I right?

edoapra commented 5 years ago

Yes to both of your questions. See https://github.com/nwchemgit/nwchem/wiki/Basis

hernan3009 commented 5 years ago

@edoapra and @jeffhammond: Thank you very much.