plumed / plumed2

Development version of plumed 2
https://www.plumed.org
GNU Lesser General Public License v3.0
323 stars 269 forks source link

Enable EMST in all colvars using makeWhole() #1068

Open GiovanniBussi opened 2 months ago

GiovanniBussi commented 2 months ago

Description

This is changing how PBCs are dealt with in a lot of CVs, so please @maxbonomi @carlocamilloni @gtribello @Iximiel have a look if it makes sense

Current state

Currently, most CVs handle PBCs automatically to some extent. In many cases, the CV definition is not depending on periodic boundaries per se, but just requires molecules to be properly reconstructed before the CV is computed. This is achieved by calling the function makeWhole. This function makes a local copy of the coordinates whole by computing the distance between consecutive atoms in the list using PBCs.

WHOLEMOLECULES is someway special, because in addition to this can use a minimum spanning tree computed on the reference structure provided by MOLINFO (see #681). In practice:

MOLINFO STRUCTURE=ref.pdb WHOLE
WHOLEMOLECULES ENTITY0=1-123 EMST

will reconstruct PBCs using the spanning tree. Here we specify the information twice:

  1. We note that the ref.pdb is WHOLE.
  2. We tell to WHOLEMOLECULE to use those coordinates as a reference

Proposed change

I here implemented to possibility to use the spanning tree also in local reconstructions (e.g., in RMSD, GYRATION, etc). The logic is the following: for a given CV, if the last appearing MOLINFO in the input file is marked as WHOLE we use the spanning tree. Otherwise we use the ordered reconstruction. With NOPBC, PBCs are always ignored.

For example:

# no molfile, so we use the standard (ordered) reconstruction
r1: RMSD REFERENCE=ref.pdb
# PBC are not used:
r2: RMSD REFERENCE=ref.pdb NOPBC

MOLINFO STRUCTURE=ref.pdb WHOLE
# use the minimum spanning tree
r3: RMSD REFERENCE=ref.pdb
# PBC are not used, same as r2
r4: RMSD REFERENCE=ref.pdb NOPBC

MOLINFO STRUCTURE=ref.pdb # notice: no WHOLE here
# latest molfile is not whole, so we use the standard (ordered) reconstruction
# same as r1
r5: RMSD REFERENCE=ref.pdb
# PBC are not used, same as r2 and r4
r6: RMSD REFERENCE=ref.pdb NOPBC

For consistency, I did the same to WHOLEMOLECULES. This means that:

Closed issue

This PR addresses points 1 and 4 in #681. Given that point 2 is closed and point 3 was optional (and likely not a good idea, because it would be extremely expensive), I think this closes #681 .

Implementation

This required me to rewrite the Tree class so as to manage a representation of the tree where we store indexes rather than AtomNumbers. In pratice, I construct the indexes list, then I use it to construct the AtomNumbers list. This is only done during initialization.

Notice that a separate tree is build for each CV depending on the used atoms. So, in order to compute the end-to-end distance in a polymer it is necessary to use WHOLEMOLECULES!

Target release

I would like my code to appear in release v2.10

carlocamilloni commented 2 months ago

i think is a very useful feature

maxbonomi commented 1 month ago

I agree!

GiovanniBussi commented 1 month ago

Regarding this:

Given that point 2 is closed and point 3 was optional (and likely not a good idea, because it would be extremely expensive)

maybe it is worth to think about the on-the-fly version. This would require to rewrite the EMST in a more general version that:

Syntax could be something like:

# no need to have a whole reference structure: the structure is made whole on the fly!

# do it every time the action is enabled, here it's every 1000 steps
DUMPATOMS ATOMS=1-1000 DYNAMIC_EMST=YES STRIDE=1000

# do it every fixed pace (here 1000), assuming it's constant in the intermediate steps
# very much like a neighbor list
DUMPATOMS ATOMS=1-1000 DYNAMIC_EMST=1000 STRIDE=10

And DYNAMIC_EMST would be a kind of ubiquitous keyword, similar to NOPBC. We might choose a better name for it

I think this can be super useful for visualization, so I want to give it a try. In the end, this is order N^2 calculation, which is the same that happens for Coordination numbers when we do the neighbor list update, so it is not completely crazy to use it "as a neighbor list".