wehs7661 / ensemble_md

A python package for performing GROMACS simulation ensembles
MIT License

Enable coordinate modification in the EEXE framework #15

Closed wehs7661 closed 1 year ago

wehs7661 commented 1 year ago

To expand the usage of EEXE, we want to enable coordinate manipulation at exchanges between replicas, which is most likely to be useful for estimating the free energy of multiple serial mutations using expanded ensemble simulations, such as mutating methane into ethane and then propane.

For example, we can have an EEXE simulation composed of two replicas mutating methane into ethane and ethane into propane, respectively, and only exchange the coordinates between replicas when they are at the end states, i.e., replica 1 being at λ=1 and replica 2 being at λ=0. In this example, we will have the following end states:

At exchanges, we will have two output gro files, one from each replica: rep1.gro from replica 1 (state b, ethane with a dummy H atom at the first carbon) and rep2.gro from replica 2 (state c, ethane with a dummy methyl group at the second carbon).

Note that in EEXE, each replica is bound to the transformation for its assigned alchemical range. In our case, this means that replica 1 will only be responsible for the mutation of a methane to an ethane, and replica 2 will only be responsible for mutating an ethane to a propane. Normally, we would just swap the gro files as is, so in the next iteration, replica 1 will be initialized with rep2.gro and sample the intermediate states along the mutation path between methane and ethane. However, rep2.gro is an ethane with a dummy methyl group, not an ethane with a dummy H atom that we need for such sampling. The same thing would happen when trying to initialize the next iteration of replica 2 using rep1.gro.

To address this issue, we can modify rep2.gro as follows and use it to proceed to the next iteration of replica 1:

Similarly, we can modify rep1.gro as follows for the next iteration of replica 2:

Importantly, we can make the two modified gro files have the same potential energy, so the proposed exchange will always be accepted.

Here, we are not going to implement functions for coordinate manipulation in EEXE but modify the CLI run_EEXE (and the function run_grompp in ensemble_EXE.py, if necessary) to allow the flexibility of calling a user-defined function for coordinate manipulation from an input python module (where the user-defined function is defined).
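As a rough illustration of this design (not the actual ensemble_md code), the CLI could resolve a user-supplied string such as `my_module.my_function` into a callable. The helper name `load_callable` and the dotted-string convention are assumptions made for this sketch; only `importlib.import_module` and `getattr` are standard Python.

```python
import importlib


def load_callable(dotted_name):
    """Resolve a string like 'module.function' to the function object.

    Hypothetical helper: the real option name and format used by
    run_EEXE are not specified in this issue.
    """
    # Split at the last dot: everything before it is the module path.
    module_name, _, func_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.function', got {dotted_name!r}")

    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{dotted_name!r} is not callable")
    return func
```

With something like this, the framework only needs to know the name of the function, and the coordinate-manipulation logic stays entirely in the user's module.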

wehs7661 commented 1 year ago

To better keep track of all efforts relevant to enabling chemistry transformation in the EEXE framework, I've expanded the scope of the original issue. Specifically, at this point, we have enabled coordinate manipulation, but the following two features still need to be enabled in the near term. For details on each of these tasks, please refer to the corresponding issues or PRs.

wehs7661 commented 1 year ago

It might make more sense to use a GitHub project to keep track of relevant efforts (instead of using an issue to track other issues), so I'm changing the title of the issue back to its original and closing the issue now.

This issue is a part of the work in the project EEXE for serial mutations.

wehs7661 commented 1 year ago

I'm re-opening the issue since the original implementation would fail to import the external module using importlib. This is because the working directory where run_EEXE is executed is not in sys.path, which is where importlib searches for the module. Appending the current working directory to sys.path should solve the problem.
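A minimal sketch of the fix described here, assuming the user's module sits in the directory from which the command is launched. The function name `import_from_cwd` is hypothetical and this is not necessarily how PR #21 implements it.

```python
import importlib
import os
import sys


def import_from_cwd(module_name):
    """Import a module located in the current working directory.

    Hypothetical helper for illustration only.
    """
    cwd = os.getcwd()
    # An installed command-line entry point does not necessarily have
    # the working directory on sys.path, so add it explicitly.
    if cwd not in sys.path:
        sys.path.append(cwd)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Appending (rather than inserting at position 0) keeps installed packages ahead of the working directory in the search order, so a user file cannot accidentally shadow a library with the same name.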

Also, the current implementation assumes that the function for modifying coordinates takes only one input GRO file, but it seems more common that such a function takes in two input GRO files. For example, in the case described above, to modify rep1.gro, we'll also need coordinates of the methyl group from rep2.gro. This has to be dealt with and we should probably explore options that allow higher flexibility when possible.
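To make the two-input case concrete, here is a hedged sketch of a user-defined function that takes two GRO files and copies the coordinates of selected atoms from one into the other. All function names and the atom-name-based selection are assumptions for illustration. The sketch relies only on the fixed-column GRO layout (atom name in columns 11-15, x/y/z in columns 21-44 at default precision) and assumes atom names are unique within each file. A real modification would generally also need to place the copied atoms consistently with the rest of the molecule, which is not attempted here.

```python
def read_gro(path):
    """Return (title, atom_lines, box_line) from a GRO file."""
    with open(path) as f:
        lines = f.read().splitlines()
    n_atoms = int(lines[1])
    return lines[0], lines[2:2 + n_atoms], lines[2 + n_atoms]


def atom_name(line):
    # Columns 11-15 of a GRO atom line hold the atom name.
    return line[10:15].strip()


def copy_atom_coords(target_gro, donor_gro, names, output_gro):
    """Write target_gro to output_gro, taking the x/y/z fields of the
    atoms listed in `names` from donor_gro (hypothetical example)."""
    title, target_atoms, box = read_gro(target_gro)
    _, donor_atoms, _ = read_gro(donor_gro)

    # Map atom name -> the 24-character x/y/z field (3 x 8 columns).
    donor_xyz = {atom_name(line): line[20:44] for line in donor_atoms}

    new_atoms = []
    for line in target_atoms:
        name = atom_name(line)
        if name in names:
            # Keep residue/atom fields and any velocities, swap positions.
            line = line[:20] + donor_xyz[name] + line[44:]
        new_atoms.append(line)

    with open(output_gro, "w") as f:
        f.write("\n".join([title, f"{len(new_atoms):5d}", *new_atoms, box]) + "\n")
```

In the example from this issue, such a function could be called with rep1.gro as the target and rep2.gro as the donor, with `names` listing the atoms of the methyl group.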

wehs7661 commented 1 year ago

Closing the issue since the above-mentioned problems have been fixed in PR #21.