Closed GiovanniBussi closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 83.25%. Comparing base (267c68f) to head (8701a7d). Report is 1 commits behind head on master.
The speedup is smaller but still measurable when using the Intel compiler with the system BLAS on my workstation (the slowdown with respect to v2.9 goes from 32% to 29%), so I think I can merge this.
BENCH: Kernel: /scratch/bussi/plumed2/tmp/intel-v2.9/lib/libplumedKernel.so
BENCH: Input: plumed.dat
BENCH: Comparative: 1.000 +- 0.000
BENCH: Cycles Total Average Minimum Maximum
BENCH: A Initialization 1 0.808588 0.808588 0.808588 0.808588
BENCH: B0 First step 1 0.016654 0.016654 0.016654 0.016654
BENCH: B1 Warm-up 399 3.560118 0.008923 0.008333 0.026649
BENCH: B2 Calculation part 1 800 7.011192 0.008764 0.008338 0.012098
BENCH: B3 Calculation part 2 800 6.990790 0.008738 0.008322 0.011580
PLUMED: Cycles Total Average Minimum Maximum
PLUMED: 1 18.354650 18.354650 18.354650 18.354650
PLUMED: 1 Prepare dependencies 2000 0.008286 0.000004 0.000002 0.000018
PLUMED: 2 Sharing data 2000 1.602436 0.000801 0.000499 0.005897
PLUMED: 3 Waiting for data 2000 0.003293 0.000002 0.000001 0.000014
PLUMED: 4 Calculating (forward loop) 2000 12.499940 0.006250 0.006114 0.018466
PLUMED: 5 Applying (backward loop) 2000 3.349057 0.001675 0.001641 0.004945
PLUMED: 6 Update 2000 0.004910 0.000002 0.000002 0.000013
BENCH:
BENCH: Kernel: /scratch/bussi/plumed2/tmp/intel-reference/lib/libplumedKernel.so
BENCH: Input: plumed.dat
BENCH: Comparative: 1.325 +- 0.001
BENCH: Cycles Total Average Minimum Maximum
BENCH: A Initialization 1 0.989982 0.989982 0.989982 0.989982
BENCH: B0 First step 1 0.021248 0.021248 0.021248 0.021248
BENCH: B1 Warm-up 399 4.729897 0.011854 0.011131 0.029778
BENCH: B2 Calculation part 1 800 9.287099 0.011609 0.011132 0.020115
BENCH: B3 Calculation part 2 800 9.269063 0.011586 0.011125 0.017037
PLUMED: Cycles Total Average Minimum Maximum
PLUMED: 1 24.268297 24.268297 24.268297 24.268297
PLUMED: 1 Prepare dependencies 2000 0.011475 0.000006 0.000003 0.000019
PLUMED: 2 Sharing data 2000 1.379957 0.000690 0.000537 0.003343
PLUMED: 3 Waiting for data 2000 0.010423 0.000005 0.000004 0.000018
PLUMED: 4 Calculating (forward loop) 2000 16.268256 0.008134 0.007786 0.018418
PLUMED: 5 Applying (backward loop) 2000 5.528992 0.002764 0.002709 0.019713
PLUMED: 6 Update 2000 0.021584 0.000011 0.000008 0.000031
BENCH:
BENCH: Kernel: this
BENCH: Input: plumed.dat
BENCH: Comparative: 1.287 +- 0.001
BENCH: Cycles Total Average Minimum Maximum
BENCH: A Initialization 1 0.494030 0.494030 0.494030 0.494030
BENCH: B0 First step 1 0.030694 0.030694 0.030694 0.030694
BENCH: B1 Warm-up 399 4.585869 0.011493 0.010771 0.037033
BENCH: B2 Calculation part 1 800 9.031793 0.011290 0.010769 0.017739
BENCH: B3 Calculation part 2 800 8.991378 0.011239 0.010752 0.017647
PLUMED: Cycles Total Average Minimum Maximum
PLUMED: 1 23.114919 23.114919 23.114919 23.114919
PLUMED: 1 Prepare dependencies 2000 0.010794 0.000005 0.000003 0.000023
PLUMED: 2 Sharing data 2000 1.394624 0.000697 0.000533 0.003837
PLUMED: 3 Waiting for data 2000 0.009667 0.000005 0.000004 0.000035
PLUMED: 4 Calculating (forward loop) 2000 16.324953 0.008162 0.007791 0.029984
PLUMED: 5 Applying (backward loop) 2000 4.799277 0.002400 0.002368 0.010953
PLUMED: 6 Update 2000 0.021809 0.000011 0.000009 0.000024
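As a cross-check on the log above, the `Comparative` values appear consistent with the ratio of the summed `B2` and `B3` totals against the v2.9 kernel. The sketch below reproduces that arithmetic. It is an observation about the numbers in this log, not a statement of how the benchmark tool computes the figure, and the function names `calculation_time` and `comparative_ratio` are hypothetical.

```cpp
#include <cassert>

// Sum of the two timed calculation blocks (the "Total" column of
// "B2 Calculation part 1" and "B3 Calculation part 2"), in seconds.
double calculation_time(double part1_total, double part2_total) {
  return part1_total + part2_total;
}

// Ratio of a candidate kernel's time to the baseline kernel's time.
// 1.0 means equal speed; 1.325 means a 32.5% slowdown.
double comparative_ratio(double baseline_seconds, double candidate_seconds) {
  assert(baseline_seconds > 0.0);
  return candidate_seconds / baseline_seconds;
}
```

With the totals from the log, `comparative_ratio(calculation_time(7.011192, 6.990790), calculation_time(9.287099, 9.269063))` gives roughly 1.325 for the reference kernel, and the same call with `(9.031793, 8.991378)` gives roughly 1.287 for this branch.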
I am working with this input file:
By just replacing a loop with a BLAS call I can see a significant gain. I only tried this on my laptop, but I guess the gain could be quite general (whenever an optimized BLAS is available).
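To illustrate the kind of change described, here is a minimal sketch of swapping a hand-written loop for a BLAS level-1 call. The description does not say which loop or which BLAS routine is involved, so `daxpy` is chosen purely as an example. The helper name `axpy` and the `USE_BLAS` macro are hypothetical, and the trailing-underscore symbol `daxpy_` assumes the common Fortran naming convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_BLAS
// Reference BLAS routine (Fortran interface): y := alpha * x + y.
// Assumes the usual trailing-underscore symbol name; link with e.g. -lblas.
extern "C" void daxpy_(const int* n, const double* alpha, const double* x,
                       const int* incx, double* y, const int* incy);
#endif

// Hypothetical helper: y += alpha * x, element by element.
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
#ifdef USE_BLAS
  // Single library call; an optimized BLAS may vectorize this.
  const int n = static_cast<int>(x.size());
  const int stride = 1;
  daxpy_(&n, &alpha, x.data(), &stride, y.data(), &stride);
#else
  // Plain loop with the same semantics, used when no BLAS is linked.
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] += alpha * x[i];
  }
#endif
}
```

Compiled without `-DUSE_BLAS`, the helper falls back to the plain loop, so both paths can be timed against each other on the same input. Whether the BLAS path is faster depends on the vector length and on the BLAS implementation in use.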
Below are the full results, comparing v2.9, master before this commit, and master with this commit. The slowdown with respect to v2.9 is reduced from 26% to 18%.