Split list of forced actions from list of non-forced actions in domain decomposition

gtribello commented 3 months ago

Description

This is my attempt at reworking the code that speeds up force application in domain decomposition. I have tried to do it the way we discussed in the meeting on Wednesday. As the benchmark below shows, it speeds up the force-applying part (Applying (backward loop): 0.001 s here versus 0.051 s with libplumedKernel.dylib). However, sharing data is slower (0.194 s versus 0.119 s). I think this may be because I have not made the changes to DomainDecomposition correctly.

Can you take a look, @GiovanniBussi, and let me know if I am doing anything wrong?
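The idea of keeping forced and non-forced actions in separate lists can be sketched as follows. This is a minimal, self-contained illustration, not PLUMED's actual code: the `Action` struct, its `hasForces` flag and the `ForceApplier` class are hypothetical stand-ins, and forces are reduced to one scalar per atom. The list of actions is partitioned once, so the per-step backward loop only visits the actions that can apply forces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an action that holds forces on some atoms
// (not PLUMED's ActionAtomistic). One scalar per atom keeps the sketch short;
// real forces are 3D vectors.
struct Action {
  bool hasForces = false;          // can this action ever apply forces?
  std::vector<std::size_t> atoms;  // global indices of the atoms it uses
  std::vector<double> forces;      // force on each of those atoms
};

// Hypothetical helper: the split into forced and non-forced actions is done
// once, when the list of actions is set up, not on every step.
class ForceApplier {
public:
  explicit ForceApplier(const std::vector<Action*>& actions) {
    for (Action* a : actions) {
      assert(a != nullptr);
      if (a->hasForces) forced_.push_back(a);
      else nonForced_.push_back(a);
    }
  }

  // Backward loop: only actions in the forced list are visited.
  // Adds their forces into totalForces and returns how many actions were visited.
  std::size_t apply(std::vector<double>& totalForces) const {
    for (const Action* a : forced_) {
      assert(a->atoms.size() == a->forces.size());
      for (std::size_t k = 0; k < a->atoms.size(); ++k) {
        assert(a->atoms[k] < totalForces.size());
        totalForces[a->atoms[k]] += a->forces[k];
      }
    }
    return forced_.size();
  }

  std::size_t numForced() const { return forced_.size(); }
  std::size_t numNonForced() const { return nonForced_.size(); }

private:
  std::vector<Action*> forced_;     // visited by apply() on every step
  std::vector<Action*> nonForced_;  // available to other per-step work, never visited by apply()
};
```

If most actions never apply forces (for instance, ones that only write output), the per-step cost of the backward loop in this sketch depends only on the number of forced actions.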

BENCH:  Kernel:      this
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.000 +- 0.000
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.006875     0.006875     0.006875     0.006875
BENCH:  B0 First step                                      1     0.000716     0.000716     0.000716     0.000716
BENCH:  B1 Warm-up                                       399     0.093572     0.000235     0.000223     0.000263
BENCH:  B2 Calculation part 1                            800     0.188908     0.000236     0.000228     0.000284
BENCH:  B3 Calculation part 2                            800     0.190237     0.000238     0.000229     0.000285
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     0.479767     0.479767     0.479767     0.479767
PLUMED: 1 Prepare dependencies                          2000     0.000242     0.000000     0.000000     0.000009
PLUMED: 2 Sharing data                                  2000     0.193900     0.000097     0.000091     0.000266
PLUMED: 3 Waiting for data                              2000     0.000633     0.000000     0.000000     0.000005
PLUMED: 4 Calculating (forward loop)                    2000     0.256157     0.000128     0.000120     0.000165
PLUMED: 5 Applying (backward loop)                      2000     0.001365     0.000001     0.000000     0.000293
PLUMED: 6 Update                                        2000     0.000811     0.000000     0.000000     0.000006
BENCH:  
BENCH:  Kernel:      libplumedKernel.dylib
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 0.947 +- 0.001
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.006509     0.006509     0.006509     0.006509
BENCH:  B0 First step                                      1     0.000614     0.000614     0.000614     0.000614
BENCH:  B1 Warm-up                                       399     0.088491     0.000222     0.000212     0.000267
BENCH:  B2 Calculation part 1                            800     0.178787     0.000223     0.000216     0.000275
BENCH:  B3 Calculation part 2                            800     0.180250     0.000225     0.000216     0.000264
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     0.453969     0.453969     0.453969     0.453969
PLUMED: 1 Prepare dependencies                          2000     0.000310     0.000000     0.000000     0.000010
PLUMED: 2 Sharing data                                  2000     0.118912     0.000059     0.000056     0.000223
PLUMED: 3 Waiting for data                              2000     0.000627     0.000000     0.000000     0.000014
PLUMED: 4 Calculating (forward loop)                    2000     0.256020     0.000128     0.000121     0.000179
PLUMED: 5 Applying (backward loop)                      2000     0.050735     0.000025     0.000024     0.000236
PLUMED: 6 Update                                        2000     0.000820     0.000000     0.000000     0.000014

Target release

I would like my code to appear in release 2.10

Type of contribution

[x] changes to code or doc authored by PLUMED developers, or additions of code in the core or within the default modules
[ ] changes to a module not authored by you
[ ] new module contribution or edit of a module authored by you

Copyright

[x] I agree to transfer the copyright of the code I have written to the PLUMED developers or to the author of the code I am modifying.

[ ] the module I added or modified contains a COPYRIGHT file with the correct license information. Code should be released under an open source license. I also used the command cd src && ./header.sh mymodulename in order to make sure the headers of the module are correct.

Tests

[ ] I added a new regtest or modified an existing regtest to validate my changes.
[x] I verified that all regtests are passed successfully on GitHub Actions.

codecov-commenter commented 3 months ago

Codecov Report

Attention: patch coverage is 84.61538%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.32%. Comparing base (7b7bedf) to head (d780cf9).

Note: the current head d780cf9 differs from the pull request's most recent head d927b23. Consider uploading reports for commit d927b23 to get more accurate results.

Files                             Patch %    Lines
src/core/ActionAtomistic.cpp      75.00%     1 missing
src/generic/DumpMassCharge.cpp     0.00%     1 missing
src/generic/PrintNDX.cpp           0.00%     1 missing
src/generic/WrapAround.cpp         0.00%     1 missing


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master    #1047      +/-   ##
==========================================
- Coverage   83.32%   83.32%   -0.01%
==========================================
  Files         619      619
  Lines       59373    59393      +20
==========================================
+ Hits        49475    49491      +16
- Misses       9898     9902       +4
```
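The percentages in the report can be cross-checked with a one-line formula, assuming Codecov's usual definition of coverage as hits / (hits + misses) with no partially covered lines. The patch total of 26 lines is inferred, not reported: 4 missing lines at 84.61538% coverage implies 4 / (1 - 0.8461538) = 26 lines, 22 of them hit. `coveragePercent` is a hypothetical helper for illustration.

```cpp
#include <cassert>

// Coverage as the percentage of tracked lines that the tests executed.
double coveragePercent(long hits, long misses) {
  assert(hits >= 0 && misses >= 0 && hits + misses > 0);
  return 100.0 * static_cast<double>(hits) / static_cast<double>(hits + misses);
}
```

With the numbers above, `coveragePercent(22, 4)` gives 84.615...%, while `coveragePercent(49475, 9898)` (master) and `coveragePercent(49491, 9902)` (this PR) give about 83.3291% and 83.3280%, so the project-level change is roughly one thousandth of a percentage point.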


gtribello commented 3 months ago

Thanks @GiovanniBussi

I tried your first suggestion and it worked. The revised code is now faster:

BENCH:  Kernel:      this
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.000 +- 0.000
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.006532     0.006532     0.006532     0.006532
BENCH:  B0 First step                                      1     0.000776     0.000776     0.000776     0.000776
BENCH:  B1 Warm-up                                       399     0.076957     0.000193     0.000187     0.000234
BENCH:  B2 Calculation part 1                            800     0.155136     0.000194     0.000187     0.000236
BENCH:  B3 Calculation part 2                            800     0.155996     0.000195     0.000187     0.000410
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     0.394842     0.394842     0.394842     0.394842
PLUMED: 1 Prepare dependencies                          2000     0.000243     0.000000     0.000000     0.000013
PLUMED: 2 Sharing data                                  2000     0.115616     0.000058     0.000055     0.000336
PLUMED: 3 Waiting for data                              2000     0.000603     0.000000     0.000000     0.000008
PLUMED: 4 Calculating (forward loop)                    2000     0.250311     0.000125     0.000121     0.000314
PLUMED: 5 Applying (backward loop)                      2000     0.001424     0.000001     0.000000     0.000288
PLUMED: 6 Update                                        2000     0.000727     0.000000     0.000000     0.000011
BENCH:  
BENCH:  Kernel:      libplumedKernel.dylib
BENCH:  Input:       plumed.dat
BENCH:  Comparative: 1.123 +- 0.001
BENCH:                                                Cycles        Total      Average      Minimum      Maximum
BENCH:  A Initialization                                   1     0.009362     0.009362     0.009362     0.009362
BENCH:  B0 First step                                      1     0.001036     0.001036     0.001036     0.001036
BENCH:  B1 Warm-up                                       399     0.086324     0.000216     0.000210     0.000245
BENCH:  B2 Calculation part 1                            800     0.174179     0.000218     0.000210     0.000249
BENCH:  B3 Calculation part 2                            800     0.175182     0.000219     0.000210     0.000283
PLUMED:                                               Cycles        Total      Average      Minimum      Maximum
PLUMED:                                                    1     0.445239     0.445239     0.445239     0.445239
PLUMED: 1 Prepare dependencies                          2000     0.000250     0.000000     0.000000     0.000016
PLUMED: 2 Sharing data                                  2000     0.116273     0.000058     0.000055     0.000392
PLUMED: 3 Waiting for data                              2000     0.000609     0.000000     0.000000     0.000011
PLUMED: 4 Calculating (forward loop)                    2000     0.249121     0.000125     0.000120     0.000321
PLUMED: 5 Applying (backward loop)                      2000     0.049446     0.000025     0.000023     0.000293
PLUMED: 6 Update                                        2000     0.000783     0.000000     0.000000     0.000013
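The `Comparative` values in both benchmark runs can be reproduced from the `B2` and `B3` totals, assuming `Comparative` is the timing of each kernel divided by the timing of the first kernel (`this`). Under that reading, values below 1 mean the other kernel is faster and values above 1 mean `this` is faster. Averaging the two calculation parts is our own choice; the benchmark tool's exact formula and its error estimate are not reproduced here, and `comparative` is a hypothetical helper.

```cpp
#include <cassert>

// Ratio of another kernel's calculation time to the reference kernel's time,
// averaged over the two calculation parts (B2 and B3) of the benchmark.
double comparative(double refB2, double refB3, double otherB2, double otherB3) {
  assert(refB2 > 0.0 && refB3 > 0.0);
  return 0.5 * (otherB2 / refB2 + otherB3 / refB3);
}
```

For the first run, `comparative(0.188908, 0.190237, 0.178787, 0.180250)` is about 0.947; for the revised code, `comparative(0.155136, 0.155996, 0.174179, 0.175182)` is about 1.123, matching the reported values. The per-phase timers show where the gain comes from: Applying (backward loop) drops from 0.049446 s to 0.001424 s, while Sharing data stays at about 0.116 s.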

GiovanniBussi commented 3 months ago

Great! It is reassuring that the sharing step takes the same time as before. I wonder how this depends on the pattern of actions with and without forces. Anyway, I think this could be merged.

Do you think the same trick could be used to build the local/global atom lists in steps?

Maybe the simplest way is to:

- store a list of non-local atoms in each action;
- in DomainDecomposition::getAllActiveAtoms, merge all the non-local atoms with the (already available) unique vector.

Is this correct, or is there some case where this might not work?
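The two steps proposed above can be sketched as follows, under stated assumptions: atoms are plain integer indices, each action keeps its non-local atoms in a sorted, duplicate-free vector (the hypothetical `nonLocalAtoms` field), and the unique vector is sorted too. `ActionAtoms` and `mergeNonLocal` are hypothetical names; this is not the actual `DomainDecomposition::getAllActiveAtoms`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical per-action storage: indices of atoms that are not local to this rank.
struct ActionAtoms {
  std::vector<std::size_t> nonLocalAtoms;  // assumed sorted and duplicate-free
};

// Merge every action's non-local atoms into the unique vector.
// All inputs are assumed sorted and duplicate-free; so is the result.
std::vector<std::size_t> mergeNonLocal(const std::vector<std::size_t>& unique,
                                       const std::vector<ActionAtoms>& actions) {
  assert(std::is_sorted(unique.begin(), unique.end()));
  std::vector<std::size_t> result = unique;
  std::vector<std::size_t> tmp;
  for (const ActionAtoms& a : actions) {
    assert(std::is_sorted(a.nonLocalAtoms.begin(), a.nonLocalAtoms.end()));
    tmp.clear();
    // Union of two sorted sets: linear in the sizes of both inputs.
    std::set_union(result.begin(), result.end(),
                   a.nonLocalAtoms.begin(), a.nonLocalAtoms.end(),
                   std::back_inserter(tmp));
    result.swap(tmp);
  }
  return result;
}
```

Merging one action at a time keeps each pass linear; with many actions, concatenating all lists and calling `std::sort` followed by `std::unique` once may be cheaper, so the better choice likely depends on how many actions there are.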
