Hi Ke,
The number of walkers specified in the input is the number of walkers per process. When you use more processes, each process runs its own set of walkers, providing additional sampling. The calculation therefore takes the same amount of time, but the extra samples give a smaller stochastic error. It is almost embarrassingly parallel, apart from a small communication overhead due to population control.
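As a rough illustration of this weak-scaling behavior, here is a minimal sketch (the walker counts are hypothetical, not DQMC defaults): at fixed wall time, the stochastic error of the energy estimate shrinks as one over the square root of the total number of walkers across all processes.

```python
# Illustrative only: weak-scaling model for AFQMC stochastic error,
# assuming statistically independent walkers. The numbers below are
# hypothetical and not taken from any DQMC run.
import math

walkers_per_process = 50  # value that would be set in the input file

for n_procs in (1, 2, 4, 8):
    total_walkers = n_procs * walkers_per_process
    # Error of a Monte Carlo mean scales as 1/sqrt(N_samples), so
    # doubling the processes at fixed wall time cuts it by ~sqrt(2).
    rel_error = 1.0 / math.sqrt(total_walkers)
    print(f"{n_procs} procs -> {total_walkers} walkers, "
          f"relative error ~ {rel_error:.4f}")
```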
Hi Ankit,
Thanks for clarifying.
Hi Ke, the calculations run in parallel, which means the run uses a larger total number of walkers. The total time will be about the same, but the variance should be roughly half that of the single-process run.
Sandeep.
Hi Ankit,
I am benchmarking the AFQMC implementation in DQMC with different numbers of cores, but using 2 cores does not seem to shorten the time it takes to run the same calculation. I modified your example script examples/DQMC/h2o_afqmc_uihf/h2o.py for N2 with the cc-pVDZ basis set (28 orbitals) and an RHF trial wavefunction. Here are the timings for 1 and 2 cores respectively:
Results from mpirun -np 1 DQMC input.json:
Total propagation time: 38.260809 s
VHS Time: 7.4455688 s
Matmul Time: 16.580656 s
Force bias Time: 12.007124 s
Energy evaluation time: 1.5541492 s
Number of large deviations: 1
Total calculation time: 40.130182 s
Results from mpirun -np 2 DQMC input.json:
Total propagation time: 38.120556 s
VHS Time: 7.2959769 s
Matmul Time: 16.628925 s
Force bias Time: 11.969916 s
Energy evaluation time: 1.5391603 s
Number of large deviations: 3
Total calculation time: 40.270441 s
I compiled Boost with MPI support. What is the expected scaling with the number of cores for AFQMC, both in theory and in practice for your DQMC implementation (judging from your experience)?
Best, Ke
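For readers reproducing this comparison, a minimal benchmark driver might look like the sketch below. It assumes the "Total calculation time: ... s" line quoted above appears on stdout; the helper name and the regex are hypothetical, while the mpirun invocation, DQMC, and input.json are as used in this thread.

```python
# Sketch of a weak-scaling benchmark driver for the DQMC binary,
# assuming the timing line format shown in this thread.
import re
import subprocess

def run_dqmc(n_procs: int) -> float:
    """Run DQMC under mpirun and return the total calculation time in seconds."""
    out = subprocess.run(
        ["mpirun", "-np", str(n_procs), "DQMC", "input.json"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"Total calculation time:\s*([\d.]+)\s*s", out)
    if match is None:
        raise RuntimeError("timing line not found in DQMC output")
    return float(match.group(1))

for n in (1, 2, 4):
    t = run_dqmc(n)
    # Under weak scaling, t should stay roughly constant while the
    # stochastic error shrinks as 1/sqrt(n).
    print(f"np={n}: total calculation time {t:.2f} s")
```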