michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0

Running free energy calculations from already prepared input #144

Open jmichel80 opened 4 years ago

jmichel80 commented 4 years ago

Currently, at least for somd, it doesn't appear to be possible to set up input files for a free energy calculation in one BioSimSpace script, and then load that input and run the calculation from a separate BioSimSpace script.

This use case arises if one wishes to prepare a large number of input files on one computer, and then move the prepared inputs somewhere else to run the simulations. Some ligands may also have a time-consuming setup phase, which would ideally be run only once even if multiple simulations are later carried out using that input.

lohedges commented 4 years ago

Hi @jmichel80. Yes, this would certainly be a desirable feature. I think our original aim was the ability to re-initialise a FreeEnergy object by passing an existing working directory to the constructor, e.g. something like:

free_nrg = BSS.FreeEnergy.Binding(work_dir)

Ideally this would also let you recover an existing simulation, i.e. it would work out which legs had finished, errored, etc.
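As a rough illustration of what "working out which legs had finished, errored, etc." could look like, here is a minimal sketch in plain Python. It is not BioSimSpace code: the directory layout (one sub-directory per leg, each containing `lambda_*` window directories) and the `FINISHED` / `ERROR` marker file names are assumptions made purely for the example.

```python
from pathlib import Path

# Hypothetical marker files. The files that a real engine writes on
# completion or failure will almost certainly have different names.
DONE_MARKER = "FINISHED"
ERROR_MARKER = "ERROR"


def window_status(window_dir: Path) -> str:
    """Classify a single lambda window directory."""
    if (window_dir / ERROR_MARKER).exists():
        return "errored"
    if (window_dir / DONE_MARKER).exists():
        return "finished"
    return "pending"


def scan_work_dir(work_dir):
    """Return {leg_name: {window_name: status}} for an assumed layout of
    work_dir/<leg>/lambda_*/ directories."""
    work_dir = Path(work_dir)
    report = {}
    for leg_dir in sorted(p for p in work_dir.iterdir() if p.is_dir()):
        report[leg_dir.name] = {
            window.name: window_status(window)
            for window in sorted(leg_dir.glob("lambda_*"))
            if window.is_dir()
        }
    return report
```

A constructor that accepts an existing working directory could use a report like this to decide which windows still need to be run or re-run.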

In order to do this properly we would need the following additions to BioSimSpace:

My main concern is that using SOMD for any kind of MD is a non information-preserving operation, i.e. SOMD requires specific atom naming and molecule ordering, which might differ from the original system loaded by the user. Once you move outside of BioSimSpace, e.g. by splitting the setup and running stages across multiple nodes, the original system is currently non-recoverable, which breaks one of the core design goals of BioSimSpace. This might be okay if you simply want to run the simulations and calculate something, but wouldn't be ideal if you actually want to grab a molecular configuration and write it back to the original file format. (This information might need to be consistent for use with external tools used in a larger workflow, of which BioSimSpace is a part.) Similarly, the pert file contains modifications for terms involving dummy atoms in one of the end states. Without additional information in the file we wouldn't be able to recover the original molecular potential of the lambda = 1 state.

Ideally, SOMD would be updated to allow more flexibility in the way that perturbations are defined (while remaining backwards compatible with the current format). From previous discussions, e.g. here, it is clear that using the atom name as the perturbation reference is quite useful, since it allows you to use the same file for multiple simulations. As a workaround, perhaps we could also write some kind of mapping file when running simulations with SOMD, which tells you how the atom names used by SOMD map to those in the original system. We would also need to map the molecule ordering and record any terms in the lambda = 1 potential that were modified.
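To make the mapping-file idea concrete, here is a minimal sketch, assuming the mapping is stored as JSON. Everything in it is hypothetical: the function names, the JSON keys, and the representation of a system as a list of molecules that are each a list of atom names. It uses no BioSimSpace or SOMD calls, and the modified lambda = 1 terms are recorded as opaque strings.

```python
import json


def build_mapping(original_atoms, engine_atoms, engine_order, modified_terms=()):
    """Record how the engine's naming and ordering map back to the original.

    original_atoms: atom names per molecule, in the user's original order.
    engine_atoms:   atom names per molecule, in the order used by the engine.
    engine_order:   engine_order[i] is the original index of the molecule
                    that sits at position i in the engine's ordering.
    modified_terms: descriptions of lambda = 1 terms that were altered.
    """
    if sorted(engine_order) != list(range(len(original_atoms))):
        raise ValueError("engine_order must be a permutation of molecule indices")
    molecules = []
    for engine_idx, orig_idx in enumerate(engine_order):
        orig, eng = original_atoms[orig_idx], engine_atoms[engine_idx]
        if len(orig) != len(eng):
            raise ValueError(f"atom count mismatch for molecule {orig_idx}")
        molecules.append({
            "engine_index": engine_idx,
            "original_index": orig_idx,
            # Pairs of [engine name, original name], kept as a list so that
            # duplicate names within a molecule are not lost.
            "atom_names": [[e, o] for e, o in zip(eng, orig)],
        })
    return {"molecules": molecules, "modified_lambda1_terms": list(modified_terms)}


def restore_original(mapping):
    """Rebuild the original molecule order and atom names from a mapping."""
    restored = [None] * len(mapping["molecules"])
    for mol in mapping["molecules"]:
        restored[mol["original_index"]] = [orig for _, orig in mol["atom_names"]]
    return restored


def write_mapping(mapping, path):
    """Write the mapping next to the engine input files as JSON."""
    with open(path, "w") as handle:
        json.dump(mapping, handle, indent=2)
```

A file like this, written alongside the SOMD input, would let a later script translate a configuration back to the original naming and ordering without needing the original system in memory.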

None of the above issues are present for alchemical simulations with GROMACS, since the topology file is completely self-contained and GROMACS doesn't care about the naming scheme.