Running free energy calculations from already prepared input

Hi @jmichel80. Yes, this would certainly be a desirable feature. I think our original aim was the ability to re-initialise a FreeEnergy object by passing an existing working directory to the constructor, e.g. something like:

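import BioSimSpace as BSS

# Proposed usage (not currently supported): re-initialise from an existing working directory.
free_nrg = BSS.FreeEnergy.Binding(work_dir)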

Ideally this would also let you recover an existing simulation, i.e. it would work out which legs had finished, errored, etc.
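Working out which legs have finished or errored mostly comes down to inspecting the working directory. The following is a minimal sketch, not BioSimSpace code. It assumes a hypothetical layout of `work_dir/<leg>/<lambda_window>/` and uses two hypothetical marker files: `simfile.dat` (assumed to be the SOMD output written by a completed window) and `error.log` (a stand-in for whatever a failed run leaves behind).

```python
from pathlib import Path

# Hypothetical marker files: assumptions standing in for whatever output a
# completed (or failed) lambda window actually leaves in its directory.
DONE_MARKER = "simfile.dat"
ERROR_MARKER = "error.log"


def leg_status(work_dir):
    """Classify every lambda window under work_dir as finished, errored or pending.

    Assumes a hypothetical layout of work_dir/<leg>/<lambda_window>/.
    """
    status = {}
    for leg in sorted(p for p in Path(work_dir).iterdir() if p.is_dir()):
        for window in sorted(p for p in leg.iterdir() if p.is_dir()):
            # An error marker takes precedence over partial output.
            if (window / ERROR_MARKER).exists():
                state = "errored"
            elif (window / DONE_MARKER).exists():
                state = "finished"
            else:
                state = "pending"
            status[f"{leg.name}/{window.name}"] = state
    return status
```

A re-initialised object could then resubmit only the windows reported as `"pending"` or `"errored"`.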

To do this properly we would need the following additions to BioSimSpace:

- Protocol readers. (Given that we write the config files for SOMD and GROMACS, it should be easy enough to reverse the process. This could get tricky if the user has customised things or hand-edited the files.)
- A GROMACS reader for perturbable systems. (Should be okay since the information is invertible, other than potential terms that were once impropers now being labelled as dihedrals.)
- A pert file reader. (Shouldn't be too hard, but see caveats below.)
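The first step of a protocol reader is parsing the config file back into settings. Both GROMACS `.mdp` files and SOMD config files are line-based `key = value` files, so a minimal sketch could look like the following. Treating both `;` (the `.mdp` comment character) and `#` as comment markers is an assumption. Mapping the parsed values back onto a BioSimSpace `Protocol` object is not shown.

```python
def read_config(text, comment_chars=(";", "#")):
    """Parse 'key = value' lines into a dict of strings.

    Keys may contain spaces (e.g. 'lambda array'); values are left as strings
    so that unknown or hand-edited entries are preserved verbatim.
    """
    config = {}
    for raw in text.splitlines():
        line = raw
        # Drop anything after a comment character.
        for c in comment_chars:
            line = line.split(c, 1)[0]
        line = line.strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"Unrecognised config line: {raw!r}")
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config
```

Keeping every value as a string means that a hand-edited file still round-trips. A real reader would still have to decide what to do with keys that BioSimSpace never writes.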

My main concern is that running any kind of MD with SOMD is not an information-preserving operation: SOMD requires specific atom naming and molecule ordering, which might differ from those of the original system loaded by the user. Once you move outside of BioSimSpace, e.g. by splitting the setup and running stages into separate nodes, the original system currently cannot be recovered, which breaks one of the core design goals of BioSimSpace. This might be okay if you simply want to run the simulations and calculate something, but it isn't ideal if you actually want to grab a molecular configuration and write it back to the original file format. (This information might need to be consistent for use with external tools in a larger workflow, of which BioSimSpace is a part.) Similarly, the pert file contains modifications for terms involving dummy atoms in one of the end states. Without additional information in the file, we wouldn't be able to recover the original molecular potential of the lambda = 1 state.
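Parsing the pert file itself is straightforward, because it is a nested set of `molecule ... endmolecule` blocks containing `atom ... endatom`, `bond ... endbond` and similar blocks. The sketch below assumes that structure and is not an existing BioSimSpace function. It only recovers what the file states, so any `final_*` parameters that were modified for dummy atoms are read back as modified. The original lambda = 1 terms cannot be reconstructed from the file alone. The parameter values in the usage test are illustrative.

```python
def read_pert(text):
    """Parse a SOMD-style pert file into {molecule: {block_type: [records]}}.

    Each record maps a field name (e.g. 'name', 'initial_charge') to the
    remaining whitespace-separated tokens on that line.
    """
    molecules = {}
    mol = None      # dict for the molecule currently being read
    block = None    # name of the open block, e.g. 'atom' or 'bond'
    record = None   # fields collected for the open block
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0] == "version":
            continue
        head = tokens[0]
        if head == "molecule":
            mol = molecules.setdefault(tokens[1], {})
        elif mol is None:
            raise ValueError(f"Line outside a molecule block: {raw!r}")
        elif head == "endmolecule":
            mol = None
        elif block is None:
            # A bare keyword opens a new block (atom, bond, angle, dihedral, ...).
            if len(tokens) != 1:
                raise ValueError(f"Expected a block keyword: {raw!r}")
            block, record = head, {}
        elif head == "end" + block:
            mol.setdefault(block, []).append(record)
            block, record = None, None
        else:
            record[head] = tokens[1:]
    return molecules
```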

Ideally, SOMD would be updated to allow more flexibility in the way that perturbations are defined (while remaining backwards compatible with the current format, of course). From previous discussions, e.g. here, it is clear that using atom names as the perturbation reference is quite useful, since it allows you to use the same file for multiple simulations. As a workaround, perhaps we could also write some kind of mapping file when running simulations with SOMD that records how the atom names used by SOMD map to those in the original system. We would also need to map the molecule ordering and record any terms in the lambda = 1 potential that were modified.
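The proposed mapping file needs to hold three pieces of information: the molecule ordering, the atom-name mapping, and the original lambda = 1 terms that SOMD modified. The sketch below stores them as JSON. The `SomdMapping` class, its field names and the term-label format are all hypothetical and only illustrate what such a file might hold.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SomdMapping:
    """Hypothetical record of how a SOMD-ready system relates to the original."""

    # molecule_order[i] is the original index of the molecule at SOMD position i.
    molecule_order: list = field(default_factory=list)
    # Keyed by original molecule index (as a string, since JSON object keys are
    # strings): SOMD atom name -> original atom name.
    atom_names: dict = field(default_factory=dict)
    # Original lambda = 1 parameters for terms SOMD modified, keyed by a label.
    original_lambda1_terms: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

    def original_atom_name(self, somd_mol_index, somd_atom_name):
        """Translate an atom name in a SOMD molecule back to the original name."""
        original_index = self.molecule_order[somd_mol_index]
        return self.atom_names[str(original_index)][somd_atom_name]
```

Writing this file alongside the SOMD input would let a later node restore the original names and ordering, and the stored lambda = 1 terms, without re-running the setup stage.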

None of the above issues arise for alchemical simulations with GROMACS, since the topology file is completely self-contained and GROMACS doesn't care about the naming scheme.
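To illustrate the self-containment, a GROMACS `[ atoms ]` line carries both end states directly: `typeB`, `chargeB` and `massB` follow the A-state columns, and they default to the A-state values when omitted. The parser below is a minimal sketch. For simplicity it assumes that the charge and mass columns are present and that the B-state columns are either all present or all absent. The atom types and values in the test are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TopAtom:
    nr: int
    name: str
    type_a: str
    charge_a: float
    mass_a: float
    type_b: str
    charge_b: float
    mass_b: float


def parse_atoms_line(line):
    """Parse one [ atoms ] line: nr type resnr residue atom cgnr charge mass [typeB chargeB massB]."""
    fields = line.split(";", 1)[0].split()  # drop any trailing ';' comment
    if len(fields) not in (8, 11):
        raise ValueError(f"Unsupported [ atoms ] line: {line!r}")
    nr, type_a, _resnr, _residue, name, _cgnr = fields[:6]
    charge_a, mass_a = float(fields[6]), float(fields[7])
    if len(fields) == 11:
        type_b, charge_b, mass_b = fields[8], float(fields[9]), float(fields[10])
    else:
        # No B-state columns: the B state is identical to the A state.
        type_b, charge_b, mass_b = type_a, charge_a, mass_a
    return TopAtom(int(nr), name, type_a, charge_a, mass_a, type_b, charge_b, mass_b)
```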

michellab/BioSimSpace #144