
Suggestions for improving WHAM syntax #347

Closed: GiovanniBussi closed this issue 6 years ago

GiovanniBussi commented 6 years ago

After some internal discussion with the other PLUMED core developers, I am opening an issue on this topic so that anyone can comment. In particular, comments from non-core-developers are welcome!

The goal is to arrive at a simple yet flexible syntax for WHAM analysis in PLUMED. Using WHAM to analyze PLUMED simulations has already been discussed in many tutorials, including Belfast 2014, Trieste 2017, and other shorter ones. All these tutorials are based on the same short C++ program used in Belfast or on an equivalent Python script used in Trieste. However, the procedure is still a bit complicated and we would like to simplify it.

Notice that the WHAM analysis that we want to consider is the binless formulation which, although more expensive, is the most flexible one and would allow combining simulations with arbitrary input files (including, e.g., any form of umbrella sampling, bias exchange, collective-variable tempering, etc.).
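For reference, a binless WHAM of this kind amounts to iterating a pair of self-consistent equations until convergence. The notation below is mine and is not taken from the tutorial scripts: w_t is the weight of pooled frame t, B_k(x_t) is the bias that Hamiltonian k assigns to that frame, N_k is the number of frames collected with Hamiltonian k, and Z_k is its normalization:

$$
w_t \propto \frac{1}{\sum_{k} N_k \, Z_k^{-1} \, e^{-\beta B_k(x_t)}} \qquad\qquad
Z_k \propto \sum_{t} w_t \, e^{-\beta B_k(x_t)}
$$

with weights and normalizations rescaled at every iteration. The converged w_t are the per-frame weights that the last step of the procedure below uses to compute averages.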

If you are familiar with the current way of doing WHAM analysis within PLUMED, you might help us in making the procedure simpler! Questions are below in boldface.

Current procedure

With the current tools, the WHAM analysis requires these steps:

Step 1: Run your simulations, saving the trajectories. Simulations might be independent, although in many cases they will be done using a replica exchange simulation (recommended). In case they are independent, they might even have different lengths and be saved with different strides (these options are not supported by the current WHAM scripts, but would be easy to add; I am already using them for a separate project). Currently, with replica exchange it is something like:

> mpirun -np 4 gmx mdrun -plumed plumed.dat -replex 100

Step 2: Recompute the bias potential for all the trajectories using all the Hamiltonians. With the current scripts, this is done by concatenating the trajectories first and then running the driver on the result. Notice that if you are using metadynamics this potential might be the "last potential" (as discussed here) or the "current potential with a shift" (as discussed here). Currently, with replica exchange it is something like:

> trjcat -f traj*.xtc -o all.xtc
> mpirun -np 4 plumed driver --multi 4 --plumed plumed-wham.dat

where plumed-wham.dat is equal to plumed.dat plus the following lines:

all: COMBINE ARG=*.bias PERIODIC=NO
PRINT ARG=all FILE=all

To be precise, some changes should be made if metadynamics was used, namely: HEIGHT should be set to zero, PACE to a very large number, and RESTART=YES should be added (see e.g. this tutorial). Notice that plumed-wham.dat is very similar to plumed.dat.
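As a purely illustrative sketch (the label metad, the collective variable cv, the SIGMA value, and the HILLS file name are placeholders, not taken from any tutorial), a metadynamics line such as

metad: METAD ARG=cv SIGMA=0.2 HEIGHT=1.2 PACE=500 FILE=HILLS

in plumed.dat would become, in plumed-wham.dat, something like

metad: METAD ARG=cv SIGMA=0.2 HEIGHT=0.0 PACE=10000000 FILE=HILLS RESTART=YES

so that the previously accumulated hills are read back in and no new hills are deposited during the reweighting pass.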

In case one did not use replica exchange, things are a bit different and individual plumed.dat files should be used. There is yet another possibility: you used replica exchange during the simulation but you do not want to use MPI while processing it. In this case, we might have to trick PLUMED with some option that would be easy to add. Something like

> plumed driver --plumed plumed-wham.dat --replica 0
> plumed driver --plumed plumed-wham.dat --replica 1
> plumed driver --plumed plumed-wham.dat --replica 2
> plumed driver --plumed plumed-wham.dat --replica 3

Step 3: Use a script to process the output of the previous step and obtain weights. This is either a C++ or a Python tool. Something like

> python3 SCRIPTS/wham.py ALLBIAS 4 2.5

As a minimal improvement, this could be included as a standard PLUMED command-line tool. Something like

> plumed wham --bias ALLBIAS --replicas 4 --kbt 2.5 --weights weights

Step 4: Use the mentioned weights to compute arbitrary averages. Currently, we have no way to do this within PLUMED. However, we could extend PLUMED so as to read the weights produced by step 3 and use them to work with, e.g., histograms or averages. For instance, we could implement something to allow a plumed-average.dat file like this:

data: DISTANCE ATOMS=1,2 # some CV you want to analyze
w: REWEIGHT_FILE FILE=weights
av: AVERAGE ARG=data STRIDE=1 LOGWEIGHTS=w
PRINT ARG=av STRIDE=10000 FILE=colvar

Possible simplifications

Clearly some of these steps could be merged in order to simplify the procedure.

Steps 1 and 2 could be merged by computing the energy of the other replicas on the fly. Clearly, this is not possible if one of the following statements is true:

In order to avoid these limitations, I would avoid this merge and keep step 1 separate. In other words, the analysis would always be done a posteriori. Is this a good idea?

Steps 2 and 3 could be merged by embedding the code now included in the WHAM script into PLUMED. This could be done on the concatenated trajectory (which requires a concatenation step), or we could also modify the driver so that it is capable of reading multiple trajectories, avoiding the concatenation altogether (easy to do). This procedure would produce an array of weights to be used in step 4. Thus the file plumed-wham.dat would be something like plumed.dat plus the following line:

WHAM_WEIGHTS ARG=*.bias FILE=wham_weights

(this would save weights to file wham_weights).

Notice that this procedure will not work if separate simulations were run (no replica exchange), since in this case it would be necessary to manipulate all the Hamiltonians at the same time. Notice also that it is difficult to combine separate plumed.dat files, since action labels might clash. A possible solution would be an input like:

r1: PLUMED FILE=plumed.dat DIR=dir1
r2: PLUMED FILE=plumed.dat DIR=dir2
r3: PLUMED FILE=plumed.dat DIR=dir3
r4: PLUMED FILE=plumed.dat DIR=dir4
WHAM_WEIGHTS ARG1=r1.bias ARG2=r2.bias ARG3=r3.bias ARG4=r4.bias FILE=wham_weights

Here PLUMED would be a special keyword that runs a separate PLUMED instance on the specified file and directory, which, incidentally, could be useful for other applications as well.

My question is: is it worth implementing this or should we NOT merge steps 2 and 3 when analyzing separate simulations?

Steps 3 and 4 could be merged similarly. Precisely, step 3 could be done at the end of step 2 (as discussed above) or at the beginning of step 4. In this latter case, one should read the file with the bias potentials at the beginning, compute the weights, and then make them available for further analysis in the same plumed driver run. Thus the plumed-average.dat file used for analysis would be something like

w: REWEIGHT_WHAM TEMP=1 FILE=all # this would do all the work at the first step
av: AVERAGE ARG=data STRIDE=1 LOGWEIGHTS=w
PRINT ARG=av STRIDE=10000 FILE=colvar

Here all is the file with all the potentials produced at the end of step 2. My question is: is this useful, or do people prefer to, in any case, store the weights in order to analyze them with some external tool?

Merging steps 2, 3, and 4 is more difficult since it would conceptually require scanning the trajectory twice: once to accumulate the bias potentials and a second time to do the analysis. It turns out, however, that it is not difficult to implement, since the current analysis framework allows one to postpone the analysis until after the WHAM calculation has been done. We currently have a tentative implementation that can do something like this:

... here goes your original plumed.dat file ...

data: DISTANCE ATOMS=1,2 # some CV you want to analyze
w: REWEIGHT_WHAM ARG=*.bias
cc: COLLECT_FRAMES ARG=data STRIDE=1 LOGWEIGHTS=w
av: AVERAGE ARG=cc.data STRIDE=1
PRINT ARG=av STRIDE=10000 FILE=colvar

Here the COLLECT_FRAMES keyword is used to postpone the evaluation of AVERAGE. My question is: Is this implementation misleading (because of the postponed evaluation of the average) or is it clear?

Final remarks

Clearly, it is possible to leave more than one path to do the same thing. However, I think it is important that we design the standard way of doing WHAM (that is, the one we use in tutorials) so that it strikes the right compromise between ease of use and flexibility.

Many thanks if you arrived at the end of this very long post!

msultan commented 6 years ago

My question is: is it worth implementing this or should we NOT merge steps 2 and 3 when analyzing separate simulations?

I would vote for NOT merging steps 2 and 3, but maybe having plumed wham take an additional argument, run-driver, so that it can compute the biases as needed.
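For instance, something along these lines (purely hypothetical: neither the tool nor the run-driver option exists yet, and the flag names just extend the syntax proposed above):

> plumed wham --run-driver --plumed plumed-wham.dat --replicas 4 --kbt 2.5 --weights weights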

To be precise, some changes should be made if metadynamics was used, namely: HEIGHT should be set to zero, PACE to a very large number, and RESTART=YES should be added (see e.g. this tutorial). Notice that plumed-wham.dat is very similar to plumed.dat.

I would like to lobby for this step being automated. There are several threads on the Google group, including my own, where people have asked about this step.

Here all is the file with all the potentials produced at the end of step 2. My question is: is this useful, or do people prefer to, in any case, store the weights in order to analyze them with some external tool?

Having the biases and weights in a separate file is extremely useful for using other estimators like TRAM/DHAM etc., or even just for plotting with Python libraries.

Here the COLLECT_FRAMES keyword is used to postpone the evaluation of AVERAGE. My question is: Is this implementation misleading (because of the postponed evaluation of the average) or is it clear?

I don't think there is anything wrong with this.

Thank you for continuing to develop PLUMED! It's awesome!

GiovanniBussi commented 6 years ago

A short update on this: we have just discussed how to do it.

@msultan notice that using a command-line tool plumed wham that calls the driver is actually a way of merging steps 2 and 3. So, the way we plan to do it is the other way round: plumed driver, thanks to a REWEIGHT_WHAM action in the input file, will do the WHAM calculation. I hope it is still clear.

In any case, doing WHAM the way it is done now will still be possible.

Things will be merged before version 2.5.

Thanks!

Giovanni

GiovanniBussi commented 6 years ago

This is closed now