dipietrantonio opened this issue 3 years ago
@dipietrantonio it is very difficult to judge why this is happening without knowing more about the simulation.
Giovanni
@GiovanniBussi you are right, I have been overly vague. I was waiting for the user to authorize me to share the test case. You can find it in the attached archive, which should answer the domain-specific questions you have (I am not familiar with the domain, unfortunately).
The computing system is a GPU cluster (named Topaz) running CentOS 7.6, with 2 V100 GPUs and 2 Intel "Cascade Lake" Silver 4215 2.5 GHz 8-core processors per node. I ran the experiment using 1 GPU and 15 CPU cores (as you will see in the submission script). Let me know if you need more information.
Best regards, Cristian
test_case.tar.gz
I checked the input and log file. The system is small (~2600 atoms), and it probably also suffers from the fact that GMX 2021 can run everything on the GPU when PLUMED is not used.
Anyway, for such a simple input file the cost of the PLUMED part seems a bit too large to me.
Could you share a log file obtained by running the simulation after adding the following line at the beginning of the metad.dat file?
DEBUG DETAILED_TIMERS
The final part of the log file should show the cost associated with each of the PLUMED input lines.
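Equivalently, the directive can be prepended from the shell; a minimal sketch, assuming GNU sed and that metad.dat sits in the run directory:

```bash
# prepend the PLUMED timing directive to the existing input file (GNU sed, in-place edit)
sed -i '1i DEBUG DETAILED_TIMERS' metad.dat
```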
Hi Giovanni, here is the log file, md_carb.log
Thanks, I paste the relevant part below for reference.
I don't see anything weird (like a very high time spent in METAD, which could perhaps have been optimized with some setting). Even the cost of a single torsion (phi1) is a significant fraction of the total time. I conclude that the simulated system is just too small to be simulated efficiently with GROMACS + PLUMED. So I cannot suggest any solution, besides noting that this should not happen for systems with a larger number of atoms (for which the calculation of the GROMACS forces is slower).
PLUMED: Cycles Total Average Minimum Maximum
PLUMED: 1 80845.967654 80845.967654 80845.967654 80845.967654
PLUMED: 1 Prepare dependencies 585442 0.615870 0.000001 0.000000 0.000590
PLUMED: 2 Sharing data 585442 8145.403068 0.013913 0.000032 0.127919
PLUMED: 3 Waiting for data 585442 1.476346 0.000003 0.000001 0.005705
PLUMED: 4 Calculating (forward loop) 585442 62067.920605 0.106019 0.003150 0.205252
PLUMED: 4A 0 @0 585442 0.349757 0.000001 0.000000 0.000794
PLUMED: 4A 1 @1 585442 1.981848 0.000003 0.000002 0.000986
PLUMED: 4A 2 phi1 585442 8522.777175 0.014558 0.000029 0.079167
PLUMED: 4A 3 psi1 585442 10786.286090 0.018424 0.000006 0.111998
PLUMED: 4A 4 phi2 585442 10641.521903 0.018177 0.000006 0.065557
PLUMED: 4A 5 psi2 585442 10731.279967 0.018330 0.000006 0.065554
PLUMED: 4A 6 metad1 585442 10756.873912 0.018374 0.000008 0.078734
PLUMED: 4A 7 metad2 585442 10606.596536 0.018117 0.000007 0.060864
PLUMED: 4A 8 @8 2342 0.028054 0.000012 0.000000 0.009357
PLUMED: 5 Applying (backward loop) 585442 10564.210273 0.018045 0.000018 0.059477
PLUMED: 5A 0 @8 2342 0.009771 0.000004 0.000000 0.004683
PLUMED: 5A 1 metad2 585442 1.877526 0.000003 0.000000 0.014044
PLUMED: 5A 2 metad1 585442 1.218530 0.000002 0.000000 0.014046
PLUMED: 5A 3 psi2 585442 12.586233 0.000021 0.000002 0.018732
PLUMED: 5A 4 phi2 585442 2.990273 0.000005 0.000001 0.014047
PLUMED: 5A 5 psi1 585442 1.749104 0.000003 0.000001 0.014044
PLUMED: 5A 6 phi1 585442 1.127604 0.000002 0.000001 0.009375
PLUMED: 5A 7 @1 585442 0.157751 0.000000 0.000000 0.008696
PLUMED: 5A 8 @0 585442 0.154341 0.000000 0.000000 0.009359
PLUMED: 5B Update forces 585442 10530.212192 0.017987 0.000007 0.059465
PLUMED: 6 Update 585442 44.106911 0.000075 0.000003 6.582783
Hi Giovanni, running the same simulation with gromacs/2020.3 + plumed/2.6.1 gives the following result:
On 1 MPI rank, each using 15 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Neighbor search 1 15 25001 8.775 328.261 1.3
Launch GPU ops. 1 15 2500001 202.213 7564.559 31.1
Force 1 15 2500001 228.468 8546.737 35.1
Wait PME GPU gather 1 15 2500001 3.912 146.326 0.6
Reduce GPU PME F 1 15 2500001 11.026 412.461 1.7
NB X/F buffer ops. 1 15 4975001 34.329 1284.203 5.3
Write traj. 1 15 501 0.860 32.189 0.1
Update 1 15 5000002 52.287 1955.994 8.0
Constraints 1 15 5000002 77.111 2884.637 11.8
Rest 32.044 1198.740 4.9
-----------------------------------------------------------------------------
Total 651.024 24354.107 100.0
-----------------------------------------------------------------------------
Core t (s) Wall t (s) (%)
Time: 9765.352 651.024 1500.0
(ns/day) (hour/ns)
Performance: 663.570 0.036
Finished mdrun on rank 0 Tue Sep 21 10:28:39 2021
PLUMED: Cycles Total Average Minumum Maximum
PLUMED: 1 224.058133 224.058133 224.058133 224.058133
PLUMED: 1 Prepare dependencies 2500001 1.469931 0.000001 0.000000 0.001052
PLUMED: 2 Sharing data 2500001 25.894379 0.000010 0.000008 0.013028
PLUMED: 3 Waiting for data 2500001 3.716172 0.000001 0.000001 0.001440
PLUMED: 4 Calculating (forward loop) 2500001 98.568239 0.000039 0.000031 0.038046
PLUMED: 4A 0 @0 2500001 0.861788 0.000000 0.000000 0.000220
PLUMED: 4A 1 @1 2500001 7.074460 0.000003 0.000002 0.000557
PLUMED: 4A 2 phi1 2500001 13.335195 0.000005 0.000004 0.022992
PLUMED: 4A 3 psi1 2500001 11.686390 0.000005 0.000003 0.017996
PLUMED: 4A 4 phi2 2500001 11.629067 0.000005 0.000003 0.020036
PLUMED: 4A 5 psi2 2500001 11.834101 0.000005 0.000003 0.001336
PLUMED: 4A 6 metad1 2500001 14.236606 0.000006 0.000004 0.022017
PLUMED: 4A 7 metad2 2500001 13.023018 0.000005 0.000004 0.038014
PLUMED: 4A 8 @8 10001 0.004306 0.000000 0.000000 0.000005
PLUMED: 5 Applying (backward loop) 2500001 42.420516 0.000017 0.000014 0.003928
PLUMED: 5A 0 @8 10001 0.001630 0.000000 0.000000 0.000000
PLUMED: 5A 1 metad2 2500001 1.141783 0.000000 0.000000 0.000207
PLUMED: 5A 2 metad1 2500001 0.791034 0.000000 0.000000 0.002841
PLUMED: 5A 3 psi2 2500001 4.993111 0.000002 0.000002 0.000593
PLUMED: 5A 4 phi2 2500001 2.104832 0.000001 0.000001 0.000188
PLUMED: 5A 5 psi1 2500001 2.017699 0.000001 0.000001 0.000202
PLUMED: 5A 6 phi1 2500001 1.878040 0.000001 0.000001 0.000239
PLUMED: 5A 7 @1 2500001 0.382523 0.000000 0.000000 0.000158
PLUMED: 5A 8 @0 2500001 0.318751 0.000000 0.000000 0.000146
PLUMED: 5B Update forces 2500001 16.413717 0.000007 0.000004 0.003916
PLUMED: 6 Update 2500001 35.470910 0.000014 0.000003 2.715419
This makes me think it is a software issue. The compilation process I followed is the same for both packages; you can get an idea by looking at https://github.com/PawseySC/maali-cygnet/blob/master/centos7.6/gromacs.cyg, but it is essentially what is suggested on the GROMACS website.
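For context, the build flow in that recipe boils down to something like the sketch below; the CMake options, install prefix, and directory names here are illustrative placeholders, not the exact Pawsey recipe:

```bash
# patch the GROMACS source with the matching PLUMED version (plumed must already be in PATH)
cd gromacs-2021
plumed patch -p -e gromacs-2021 -m shared

# configure and build GROMACS as usual; flags below are placeholders for the real recipe
mkdir build && cd build
cmake .. -DGMX_GPU=CUDA -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX="$INSTALL_PREFIX"
make -j 16 && make install
```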
@dipietrantonio thanks for the detailed report. You are right, there is a problem. Actually, thinking back on it, it is just impossible that the calculation of a single torsional angle takes on average 0.014558 seconds (as in the first example). The second example makes much more sense (0.000005).
The only thing I can suggest is to find out whether the problem is the GMX 2020 -> 2021 upgrade or the PLUMED 2.6 -> 2.7 upgrade. Actually, PLUMED patches are transferable across versions, so you could just copy the files patches/gromacs-2021.config and patches/gromacs-2021.diff from PLUMED 2.7 into PLUMED 2.6 to enable GROMACS 2021 + PLUMED 2.6.
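A sketch of what that copy could look like, assuming the two PLUMED source trees are unpacked side by side (the directory names are placeholders):

```bash
# copy the GROMACS 2021 patch definition from the PLUMED 2.7 tree into the PLUMED 2.6 tree
cp plumed-2.7.2/patches/gromacs-2021.config plumed-2.6.1/patches/
cp plumed-2.7.2/patches/gromacs-2021.diff   plumed-2.6.1/patches/

# then reinstall PLUMED 2.6.1 so the new patch is picked up, and patch the GROMACS 2021 source with it
cd gromacs-2021 && plumed patch -p -e gromacs-2021
```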
If the problem is in the PLUMED upgrade (i.e., with the same GMX, PLUMED 2.6 works and PLUMED 2.7 does not), then it is definitely to be solved on our side. I would be surprised, though, as nobody else has reported this type of issue.
If the problem is with the GROMACS upgrade, then it is difficult to understand what's happening. Perhaps GROMACS 2021 is doing something to the environment that significantly slows down PLUMED?
In short, this seems a relevant problem, and we would be happy to learn more and help, but I am not sure we will be able to.
Also notice that if you patch with -m runtime (instead of -m shared), you will be able to pick the PLUMED version at runtime by setting the env var PLUMED_KERNEL. It can make these types of tests faster (you compile GROMACS with PLUMED 2.7 only and then switch back to PLUMED 2.6 at runtime).
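A sketch of that workflow; the library paths are placeholders, and the -deffnm/-plumed names are only guessed from the file names mentioned in this thread:

```bash
# patch in runtime mode so the PLUMED kernel can be swapped without recompiling GROMACS
cd gromacs-2021
plumed patch -p -e gromacs-2021 -m runtime
# ... then build and install GROMACS as before ...

# at run time, point PLUMED_KERNEL at whichever installation you want to test
export PLUMED_KERNEL=/path/to/plumed-2.6.1/lib/libplumedKernel.so   # or the 2.7.2 library
gmx mdrun -deffnm md_carb -plumed metad.dat
```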
Hi Giovanni, I have compiled gromacs/2021 with plumed 2.6.1, following your instructions. It results in a slow simulation too, indicating that there is something in plumed that does not work well with newer versions of gromacs. Here is the debug info:
Core t (s) Wall t (s) (%)
Time: 97165.044 6477.692 1500.0
1h47:57
(ns/day) (hour/ns)
Performance: 1.134 21.168
Finished mdrun on rank 0 Thu Sep 23 15:06:06 2021
PLUMED: Cycles Total Average Minumum Maximum
PLUMED: 1 6095.349020 6095.349020 6095.349020 6095.349020
PLUMED: 1 Prepare dependencies 42501 0.036179 0.000001 0.000000 0.000708
PLUMED: 2 Sharing data 42501 620.260253 0.014594 0.000034 0.046939
PLUMED: 3 Waiting for data 42501 0.102702 0.000002 0.000001 0.001097
PLUMED: 4 Calculating (forward loop) 42501 4671.063844 0.109905 0.054999 0.209329
PLUMED: 4A 0 @0 42501 0.025230 0.000001 0.000000 0.000611
PLUMED: 4A 1 @1 42501 0.137245 0.000003 0.000002 0.001743
PLUMED: 4A 2 phi1 42501 647.809034 0.015242 0.004417 0.047539
PLUMED: 4A 3 psi1 42501 810.651214 0.019074 0.004679 0.056561
PLUMED: 4A 4 phi2 42501 795.250604 0.018711 0.001431 0.056198
PLUMED: 4A 5 psi2 42501 805.875933 0.018961 0.000010 0.060865
PLUMED: 4A 6 metad1 42501 812.013233 0.019106 0.000010 0.055463
PLUMED: 4A 7 metad2 42501 797.998516 0.018776 0.000008 0.054968
PLUMED: 4A 8 @8 171 0.009526 0.000056 0.000000 0.009414
PLUMED: 5 Applying (backward loop) 42501 800.519477 0.018835 0.000020 0.051540
PLUMED: 5A 0 @8 171 0.000033 0.000000 0.000000 0.000003
PLUMED: 5A 1 metad2 42501 0.096811 0.000002 0.000000 0.009356
PLUMED: 5A 2 metad1 42501 0.053702 0.000001 0.000000 0.009048
PLUMED: 5A 3 psi2 42501 0.537368 0.000013 0.000002 0.014036
PLUMED: 5A 4 phi2 42501 0.155386 0.000004 0.000001 0.009361
PLUMED: 5A 5 psi1 42501 0.076498 0.000002 0.000001 0.004693
PLUMED: 5A 6 phi1 42501 0.078808 0.000002 0.000001 0.009359
PLUMED: 5A 7 @1 42501 0.013363 0.000000 0.000000 0.004678
PLUMED: 5A 8 @0 42501 0.008027 0.000000 0.000000 0.000168
PLUMED: 5B Update forces 42501 798.837358 0.018796 0.000007 0.051528
PLUMED: 6 Update 42501 2.121210 0.000050 0.000003 0.201005
I also attach the complete log for reference: md_carb.log
So, what would be the next step?
Hi Giovanni, were you able to reproduce the issue?
Best regards, Cristian Di Pietrantonio
I am seeing a similar slowdown when using GROMACS-2021-EasyBuild-4.5.0-PLUMED-2.7.2 (I think it is GROMACS 2021.3). A simulation running without PLUMED is about 10 times faster when using CPUs only. Yours, Mat
UPDATE: I also compiled GROMACS 2021.5 with PLUMED 2.6.5 (using the GROMACS 2021.4 patch files from PLUMED 2.7.3) and saw the same slowdown as before. For reference, my system contains 270,000 atoms. This is on an AMD EPYC Rome system using MPI parallelisation. I have tried different versions of GROMACS and PLUMED, with both GNU and Intel compilers and their associated libraries (ScaLAPACK/MKL, OpenMPI, etc.).
Dear plumed developers,
I am a Supercomputing Applications Specialist at the Pawsey Supercomputing Centre. I have patched gromacs/2021 with plumed2/2.7.2 and run a test case provided by a user. Unfortunately, the simulation runs very slowly, whereas gromacs/2020.3 patched with plumed 2.6.1, built with the same build process, performs much better.
Without plumed, the simulation with gromacs 2021 also runs much faster, which makes me think it is a problem with plumed?
Best regards, Cristian Di Pietrantonio