Possible slow down in htt (merging large vectors twice per step)

GiovanniBussi commented 7 months ago

@gtribello I am looking for hot spots here and there and I discovered that there is a potentially expensive thing that is done twice per step with htt (once per step before).

In particular, in DomainDecomposition.cpp, the operation mergeSortedVectors is done:

in DomainDecomposition::share() (equivalent to pre-htt), for local (in the MPI sense) atoms
in DomainDecomposition::reset() for all (non local) atoms

Is this expected? If I just remove the call to getAllActiveAtoms in reset(), I get a speed up of ~ 10% with this input:

    c1: CENTER ATOMS=1-20000 NOPBC
    c2: CENTER ATOMS=20000-40000 NOPBC
    c3: CENTER ATOMS=40000-60000 NOPBC
    a: ANGLE ATOMS=c1,c2,c3
    RESTRAINT ARG=a AT=0 KAPPA=1

However, the code is not working anymore correctly when we use domain decomposition. Which is the correct way of removing this unnecessary calculation? Why do we need to set to zero forces that are expected to be never used on the local processor?

gtribello commented 7 months ago

Hello

I had a look at this and if you are using domain decomposition it is necessary to have that call twice per step.

There is a difference between what needs to be in unique in DomainDecomposition::share() and in DomainDecomposition::reset(). In particular:

In DomainDecomposition::share() unique needs to contain only the atoms in the local domain
In DomainDecomposition::reset() unique needs to contain all the atoms that plumed may have added force on (which Is all the atoms in all the domains).

If the DomainDecomposition is off then you can avoid calling getAllActiveAtoms in reset. If that is not the case then I think there is no way to avoid the second call.

GiovanniBussi commented 7 months ago

Addressed in #1027

I leave this open to remember that we should check the impact in MPI runs and, if necessary, address it

plumed / plumed2

Possible slow down in htt (merging large vectors twice per step) #1024