sanshar / Block

Block implements the density matrix renormalization group (DMRG) algorithm for quantum chemistry.
GNU General Public License v3.0

Openmpi-1.8.5 doesn't seem to function well with Block-1.0.1 on my servers. #20

Closed dingld closed 9 years ago

dingld commented 9 years ago

The program didn't report any errors, but I am afraid something went wrong. I ran the same calculation, DMRG-CASCI-[12e,28o], with different numbers of processes (8 vs. 16), but the total time is almost the same (7424.351 s vs. 7384.303 s). I compiled OpenMPI with gcc-4.8.4, and boost-1.5.5 with the same compiler. By the way, the same calculation with Block-0.9.6 took about 2000 seconds using 16 processes. How should I deal with this?
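To put a number on "almost the same", the two wall times quoted above can be turned into a speedup and a relative parallel efficiency. This is only a sketch: the function name `scaling_report` is hypothetical (it is not part of Block), and it assumes both timings are wall times in seconds for otherwise identical runs.

```python
def scaling_report(t_small, t_large, n_small, n_large):
    """Return (speedup, efficiency) when going from n_small to n_large processes.

    t_small / t_large are the wall times measured with n_small / n_large processes.
    Efficiency is the measured speedup divided by the ideal one (n_large / n_small).
    """
    speedup = t_small / t_large
    ideal = n_large / n_small
    return speedup, speedup / ideal


# Wall times reported in this issue for 8 and 16 processes.
speedup, efficiency = scaling_report(7424.351, 7384.303, 8, 16)
print(f"speedup 8 -> 16 processes: {speedup:.4f}x (ideal would be 2x)")
print(f"relative parallel efficiency: {efficiency:.1%}")
```

A speedup this close to 1 means the extra 8 processes contributed essentially nothing to the run time, which is why the full outputs of both runs are needed to see where the time goes.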

gkc1000 commented 9 years ago

Please attach your input/output.


gkc1000 commented 9 years ago

Might be easier to personal message me with the input/output and/or a description of the system that you are trying to run.

dingld commented 9 years ago

SYSTEM--

LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release: 6.3
Codename: Santiago

INPUT--

nelec 12
spin 0
irrep 1
point_group d2h
sweep_tol 1.0e-9
schedule default
outputlevel 0
maxiter 24
maxm 2000
twodot
orbitals FCIDUMP1.68
reorder order1.68
scratch ./tmp
hf_occ integral

OUTPUT--last few lines

Block Iteration :: 24

                     System  Block                   Sites ::  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27     # states: 110    # states: 110
                     Environment Block                       Sites ::  0 1     # states: 10    # states: 10
                     Total discarded weight 0.0000000000

                     Total block energy for State [ 0 ] with 2000 States :: -2099.3167257663

                     Finished Sweep with 2000 states and sweep energy for State [ 0 ] with Spin [ 0 ] :: -2099.3167599819

                     Largest Error for Sweep with 2000 states is 0.0000161238
                     M = 2000    state = 0     Largest Discarded Weight = 1.612e-05  Sweep Energy = -2099.3167599819
                     ============================================================================
                     Elapsed Sweep CPU  Time (seconds): 17167.590
                     Elapsed Sweep Wall Time (seconds): 1089.231
                     Finished Sweep Iteration 30

                     BLOCK CPU  Time (seconds): 116102.920
                     BLOCK Wall Time (seconds): 7384.303
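
One thing that can be read off these timing lines alone is the ratio of CPU time to wall time, which roughly indicates how many processes were busy during the run. The sketch below is an illustration only: the helper `cpu_wall_ratio` is hypothetical, and the regular expression assumes the timing lines have exactly the format shown above.

```python
import re

# Timing lines copied from the 16-process output above.
LOG_TAIL = """\
Elapsed Sweep CPU  Time (seconds): 17167.590
Elapsed Sweep Wall Time (seconds): 1089.231
BLOCK CPU  Time (seconds): 116102.920
BLOCK Wall Time (seconds): 7384.303
"""


def cpu_wall_ratio(text, prefix):
    """Return CPU time / wall time for the timing lines that start with `prefix`."""
    pattern = re.compile(
        rf"{re.escape(prefix)} (CPU|Wall)\s+Time \(seconds\):\s*([\d.]+)"
    )
    times = {kind: float(value) for kind, value in pattern.findall(text)}
    return times["CPU"] / times["Wall"]


for prefix in ("Elapsed Sweep", "BLOCK"):
    print(f"{prefix}: CPU/wall = {cpu_wall_ratio(LOG_TAIL, prefix):.2f}")
```

A ratio close to 16 is consistent with all 16 processes being busy for the whole run. By itself it does not show whether the work was actually divided among them or whether each process repeated the same work.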

sanshar commented 9 years ago

Can you please send the entire output files of the two runs (with 8 and 16 cores)? It's hard to diagnose anything with just the last 2 lines of the output.

Sandeep.


gkc1000 commented 9 years ago

Are you able to send us the input file + integrals so we can try the calculation? Also, when you ran v0.9.6, did you observe the same number of sweeps etc. in the faster run? It seems impossible that it could run 4 times faster with the same sweep schedule.

It is quite possible that for such a small number of orbitals, on a single node, the speedup from 8 to 16 cores will be poor. For comparison, a recent timing for a 10e/41-orbital H2O/ANO-DZ/M=1000 sweep is ~1200 s with 8 cores and about 800 s with 16 cores.
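
As a rough illustration of what those two reference timings imply, they can be fitted to a simple Amdahl-type cost model, T(n) = serial + parallel / n. This model is an assumption made here for illustration, not something Block reports, and the function name `fit_amdahl` is hypothetical. Since the timings are approximate, the fitted numbers are only indicative.

```python
def fit_amdahl(n1, t1, n2, t2):
    """Fit T(n) = serial + parallel / n through two (cores, seconds) points.

    Returns (serial, parallel): the part of the run time that does not shrink
    with more cores, and the part that is divided evenly among them.
    """
    parallel = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    serial = t1 - parallel / n1
    return serial, parallel


# Approximate reference timings: ~1200 s on 8 cores, ~800 s on 16 cores.
serial, parallel = fit_amdahl(8, 1200.0, 16, 800.0)
speedup = 1200.0 / 800.0

print(f"speedup 8 -> 16 cores: {speedup:.2f}x (ideal would be 2x)")
print(f"fitted non-scaling part: {serial:.0f} s, scaling part: {parallel:.0f} core-seconds")
print(f"model estimate for 32 cores: {serial + parallel / 32:.0f} s")
```

Under this model the non-scaling part already accounts for half of the 16-core time, so a speedup well below 2x between 8 and 16 cores is expected even when MPI is working correctly. A speedup of essentially 1x, as in the runs reported in this issue, is a different situation.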

We also observe significant speedups using the Intel compiler. If you are using icpc 15.0.3, a bug in the compiler means you will need to use the latest Block snapshot rather than the v1.0.1 release.

gkc1000 commented 9 years ago

Marking as closed due to no response from the user.