michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Feature discussion: multi-step perturbation #165

Closed jmichel80 closed 3 years ago

jmichel80 commented 3 years ago

Opening this thread here for discussions with @JenkeScheen . The current focus is SOMD and initial tests will be done using simple python scripts but it would be useful to eventually support this from within BioSimSpace to facilitate integration with existing workflows and to roll out functionality to other free energy codes.

In some cases it may be desirable to implement a multi-staged lambda schedule where only some terms of the potential energy function are coupled to a lambda parameter at a given time. For instance, a general scheme that could apply to a morph that contains both atoms that need to become dummy atoms ('to dummy'), or start as dummy atoms ('from dummy'), or atoms that remain present throughout ('hard atoms') could be implemented using 6 steps described below.

image

Each step should have its own pert file. The same morph topology should be applicable throughout. This leads to six simulations each with their own lambda schedule to be determined. Each simulation should be analysed separately with analyse_freenrg mbar. The sum of the six resulting free energy changes should be identical (to within uncertainties) to that obtained from the direct path sketched at the top of the figure (in practice this would only be true for well behaved systems).

Some of the steps could be lumped together for efficiency, e.g. bonded terms + hard atoms charges/LJ if testing reveal these converged rapidly.

JenkeScheen commented 3 years ago

Thanks for this Julien. I should be able to get started on writing a script that does this.. Do you think we'd run into trouble with this protocol when trying to transform A to B with mixed growing/shrinking, i.e.: image ?

jmichel80 commented 3 years ago

Should be covered by the example I posted. In some schemes (where e.g. you only need to convert some atoms into dummy atoms) you may not need steps 5 and 6.

lohedges commented 3 years ago

This should be simple enough to implement in the current BioSImSpace set up. We would just need to add a parameter the the protocol for a single free energy leg that lists the terms that are perturbed (or are excluded from the perturbation if that's simpler). That way you could do something like:

protocol = BSS.Protocol.FreeEnergy(terms="charge")

This would write a pert file for the lambda leg that only includes the specified terms.

JenkeScheen commented 3 years ago

That would be ideal Lester, would you think it best that in this case we would declare six protocols and set up a directory with each? i.e.

BSS.FreeEnergy.Solvation(system, protocol_step1, work_dir="step1/")
...
BSS.FreeEnergy.Solvation(system, protocol_step6, work_dir="step6/")

Potentially this could give the user to define different numbers of lambda windows per step.

lohedges commented 3 years ago

Yes, I think that would be best and provide the most amount of flexibility. If certain combinations of stages turn out to be particularly useful, or could be combined, then we could also add keywords to capture those kinds of protocols too.

jmichel80 commented 3 years ago

FYI I believe Gromacs would also be able to handle this with one set of input files using a lambda schedule that allows interpolation of selected components at one time, but haven't looked in details.

lohedges commented 3 years ago

GROMACS is nice in that everything is contained within the topology file and explicitly states what terms are perturbed and the values in each end state. It would be easy enough to adapt the multi-stage approach to GROMACS too by passing a list of the perturbed terms to the GroTop parser using a property map e.g.:

from Sire.IO import GroTop
gro_top = GroTop(system, {perturbation_terms = ["charge"]})

The parser would then extract the terms from the property map and only change the value between the end states for the ones that the user wants to perturb.

jmichel80 commented 3 years ago

Hi @lohedges , thanks for the comments ! @JenkeScheen it will be easier to implement this feature directly in BioSimSpace to generate the panel of pert files you wish. Could you look into extending the API to support generation of input files for this stepwise perturbation protocol. It's ok to only support SOMD for the time being. I would start with a general 6 step protocol and revisit later to allow merging steps. Please use this thread to post questions on how the implementation could be made, and then work on a branch from devel.

lohedges commented 3 years ago

@JenkeScheen I've created a new branch feature_multi_step_perturbation. Please use this for your development work.

JenkeScheen commented 3 years ago

Thanks Lester, I'll use that one. The multistep_perts branch I made is now redundant in that case

lohedges commented 3 years ago

Ah, sorry, hadn't spotted that one. Yes, I'll delete the multistep_perts branch. I use feature_ as a prefix for all feature branches and fix_ for bug-fixes. Makes it easier to search and cross-reference things in larger projects.

Cheers.

lohedges commented 3 years ago

Just copying across comments here for posterity:

@lohedges

Perhaps I've misunderstood, but I would have thought that the perturbation terms would have just been the list of regular terms that we support, e.g. LJ, charge, bond, angle, dihedral, and improper. If you wanted to perturb charge and LJ only during a simulation then you could pass perturbation_terms = ["charge", "LJ"]. (Note that charge and LJ perturbation only apply to dummy atoms, so you don't need to explicitly state DummyToCharges.) Perhaps DummyToCharges turns on charges for the dummy atoms at lambda = 0, while leaving the charges for atoms that would be dummies at lambda = 1 unperturbed.

@JenkeScheen

I was planning on extending the API up to the point where I could readily simulate the steps as outlined by @jmichel80, which as far as I understand are mainly combinations of perturbations of the LJ, charge and bond terms and mixed to dummy/non-dummy atoms. I tried to name the steps intuitively but of course the terminology is up for discussion. @jmichel80 tell me if I'm misinterpreting your outline - constraining charge, bond, angle, dihedral, and improper respectively would be much easier to implement!

@jmichel80

charge and LJ perturbations applies to non dummy atoms as well (e.g. oxygen to carbon in example posted yesterday. The bonded terms perturbation includes bond, angle, dihedral and improper terms. Basically as sanity check you should get identical single point energies for the same MORPH input coordinates using a pert file for step 6 lambda 1.0 of the stepwise pahtway or a pert file for lambda 1.0 in the direct pathway (and step 1 lambda 0.0 should give the same SPE than lambda 0.0).

lohedges commented 3 years ago

Yes, it appears a bit more complex and less general than I had envisaged. This means that the way in which the pert file is written would be different depending on what perturbation schedule is chosen, it's not as simple as doing something like:

if "bond" in perturbation_terms:
    # Perturb the bonds.
else:
    # Same bond terms in both end states.

With this in mind it is probably better to have a perturbation (or similar) parameter (rather than terms) which writes a custom pert file for that type of perturbation, e.g. perturbation = discharge, or perturbation =bonded_terms`, etc. Sorry for the confusion.

I'll have a think about the easiest way to implement this.

jmichel80 commented 3 years ago

Also the steps are conditional, so step2 should be perturbing LJ parameters of interacting atoms becoming atoms, with the initial and final charges of those atoms set to those of the final molecule (0 in this case). And so on..

JenkeScheen commented 3 years ago

we could consider passing a multistep_pert=True and then writing out the different step folders (i.e. each containing their lambdas) with respective pert files? Potentially together with the multistep_pert we can pass a list multistep_lambdas that contains the number of lambdas per step (i.e. len(multistep_lambdas)==6). This way at least the first version of this feature would be easier to implement because it's all contained in the same sequence.

This would allow me to do some quick benchmarking wrt checking multistep vs native, and then at a later stage we can add more user control.

lohedges commented 3 years ago

Hmmm, it's much more fiddly that I thought. I was hoping that if it might be possible to implement something by having two lists representing the terms that are used in the two end states. They could have entries that refer to whether the term takes the value of either the 0 or 1 state, or if it's zeroed, e.g. "charges to 0" could be implemented with something like:

# A dictionary of terms for the lambda = 0 state.
terms0 = { "charge"    : "charge0",
           "LJ"        : "LJ0",
           "bond"      : "bond0",
           "angle"     : "angle0",
           "dihedral"  : "dihedral0",
           "improper"  : "improper0",
}

# A dictionary of terms for the lambda = 1 state.
# We use the same terms as lambda = 0, except the charge which is zeroed.
terms1 = { "charge"    : 0,
           "LJ"        : "LJ0",
           "bond"      : "bond0",
           "angle"     : "angle0",
           "dihedral"  : "dihedral0",
           "improper"  : "improper0",
}

protocol = BSS.Protocol.FreeEnergy(terms0=terms0, terms1=terms1)

However, what we want to happen is that we zero the charge only for the atoms that would become dummies in the lambda = 1 state, leaving those that don't the same.

I don't think it's possible to express the perturbations in a general and flexible way, so something custom would need to be done for each stage.

I'll keep thinking.

mark-mackey-cresset commented 3 years ago

Would one way to think of this be to have separate lambda values for the different terms i.e. have a "lambda_decharge", "lambda_decouplevdw", "lambda_bonded", "lambda_recharge" etc. You can then set a mapping from the global lambda to each of these, so that for example in a pure decharge calculation lambda_decharge would map 1:1 to the global lambda but the other lambdas would stay at 1.0. The current protocol just equates to all of the individual lambdas mapping 1:1 to the global value.

That would then allow for more individualised protocols, where we could for instance decide that LJ0-to-dummy and dummy-to-LJ0 could be carried out simultaneously in one step if it turns out that's more computationally efficient.

lohedges commented 3 years ago

GROMACS does things in a similar way, see here for details. To implement something like this would require updates to SOMD itself (assuming this can be done with OpenMM). It's also a bit different in our case since, e.g. lambda_decharge doesn't couple directly to the global lambda as it applies to a different set of atoms to those for which the charge term is modified during the normal perturbation.

jmichel80 commented 3 years ago

This could be done in OpenMM but would require a significant overhaul of SOMD. It's easier to work around limitations of the current implementation by building custom pert files.

JenkeScheen commented 3 years ago

However, what we want to happen is that we zero the charge only for the atoms that would become dummies in the lambda = 1 state, leaving those that don't the same.

I don't think it's possible to express the perturbations in a general and flexible way, so something custom would need to be done for each stage.

concerning this idea, would it not be efficient to do this per-atom in_toPertFile() with some sort of conditional that checks whether the atom is/will become dummy? Would using == _SireMol.Element("X") work for this purpose?

lohedges commented 3 years ago

Yes, we can check if atoms in the end states are dummies by using the element0 and element1 properties, e.g.:

# Is atom a dummy at lambda = 0?
if atom.property("element0") == _SireMol.Element("X"):
    ...

My concern is that the logic involved in setting up the pert files for the various stages would lead to a conditional block nightmare in, what is, and already complex and difficult to read function. Perhaps it would make sense to have a separate _toPertFile for each type of perturbation (using sections from the current one as templates). This would make it easier to edit/debug and wouldn't break the existing functionality. You'd then call the required function using the perturbation (or perturbation_type) argument that was passed to the protocol. We could have a main _toPertFile function that takes the normal arguments along with the additional perturbation parameter. This then calls the required sub-function, e.g. _toPertFileDischarge. Something like perturbation=full or perturbation=all or perturbation=standard would be the default option in the protocol and would trigger the current implementation. This approach would also make it easy for someone to add a new perturbation protocol, i.e. they just add a new keyword and a new _toPertFile function that writes the required pert file for that perturbation.

(Note that there could be additional parameters passed to various functions that aren't exposed at the top-level _toPertFile, e.g. the zero_dummy_dihedrals that is currently used in what would become _toPertFileAll. To access these non-standard options the user would need to call the functions directly.)

JenkeScheen commented 3 years ago

That sounds reasonable. I would imagine we won't have to copy-paste the whole _toPertFile six times but rather keep that as the master function and have it call sub-functions at key points based on perturbation_type? Sub-function calls can start after the atom checks, for instance. An added benefit would be that the main function gets cleaned up a bit.

JenkeScheen commented 3 years ago

I added downstream passing of pert_type which can be set as either "standard"(default) or "multistep"; feature branch should now be ready for extending toPertFile() to handle the multistep approach

lohedges commented 3 years ago

As you say, there will be some redundancy in the functionality between the types of perturbation so it would be good to encapsulate common routines. Sometimes there is a bit of a trade-off in terms of making things easy for the user to edit, which is often easiest if everything is in a single function. i.e. they could edit or monkey-patch an entire function, rather than having to edit things in additional sub-routines too.

With regards to the standard and multistep approach: does this now mean that the stages outlined at the start of this thread would all be generated in one go if you passed pert_type=multistep to the protcol? If so, we wouldn't have the flexibility of generating/running them independently, or easily adding new ones. It would also mean more modifications would be required to things like BioSImSpace.FreeEnergy.Solvation.

Also a BioSimSpace.Process object refers to a single simulation, e.g. a particular lambda window of a specific leg of the calculation. If the mutlistep option in the protocol means that we actually need to run six simulations, then this doesn't make sense in the current framework. This would be better handled by passing something like multistep as an option to BioSimSpace.FreeEnergy.Solvation, which then generates protocols and processes for each of the stages described at the start, e.g. perturbation_type=discharge, etc.

(I'm away for a few days so probably won't be able to comment further until Monday.)

JenkeScheen commented 3 years ago

I think I'll add the core functionality of either running standard or multistep in the next few days and then add per-step customisability once the workflow is in place (let's discuss the details of that when you're back from your leave)

Also a BioSimSpace.Process object refers to a single simulation, e.g. a particular lambda window of a specific leg of the calculation. If the mutlistep option in the protocol means that we actually need to run six simulations, then this doesn't make sense in the current framework. This would be better handled by passing something like multistep as an option to BioSimSpace.FreeEnergy.Solvation, which then generates protocols and processes for each of the stages described at the start, e.g. perturbation_type=discharge, etc.

Thanks, I'll adjust this. I was under the impression that the single simulation would be copied across all lambdas/legs.

I changed the code again to now handle user input in the way @lohedges suggested, i.e. passing a multistep bool to BioSimSpace.FreeEnergy.Solvation that from there either takes the standard approach (if False) or the multistep (if True). This means that Protocol input is the same for both cases. A simple way to use this syntax would be to write:

protocol = BSS.Protocol.FreeEnergy(runtime=4*BSS.Units.Time.nanosecond, num_lam=5)
freenrg = BSS.FreeEnergy.Solvation(solvated, protocol, work_dir="runs/proto_1", multistep=True)

, where the number of lambda windows per step is derived from Protocol (i.e. 5 in this case). Additionally, I added an optional steps dictionary where the user can specify how many lambda windows should be allocated to each step, such that input could e.g. be

steps = {term1 : 5,
term2 : 10,
term3 : 5,
term4 : 7,
term5 : 3,
term6 : 5}

protocol = BSS.Protocol.FreeEnergy(runtime=4*BSS.Units.Time.nanosecond, num_lam=5)
freenrg = BSS.FreeEnergy.Solvation(solvated, protocol, work_dir="runs/proto_1", multistep=True, steps=steps)

With the next bit of code I might need some help - I'm not clear on how I can pass a step type to _molecule so that I can start calling sub-functions of _molecule._toPertFile(). I suspect the route is through self._initialise_runner() and then in the bit where lambdas are being looped over but I fail to see how this connects to _molecule. If someone could point out these steps that would be very helpful!

JenkeScheen commented 3 years ago

thinking about this - when during benchmarking we decide we want to merge some steps it should be straightforward to make a step merge with the previous step if lambda in the step dictionary is set to 0, i.e. if term2 : 0, merge it with term1 etc, except for term1 where it would merge with term2.

lohedges commented 3 years ago

With the next bit of code I might need some help - I'm not clear on how I can pass a step type to _molecule so that I can start calling sub-functions of _molecule._toPertFile(). I suspect the route is through self._initialise_runner() and then in the bit where lambdas are being looped over but I fail to see how this connects to _molecule. If someone could point out these steps that would be very helpful!

The multistep option that is passed to the high-level FreeEnergy object needs to set up a bunch of individual simulation protocols for the low-level Processes. I think you need to add an option to the protocol that sets the type of perturbation for that simulation, e.g. perturbation_type. This can then be updated in the initialise_runner code, like how the lambda value is changed, e.g. protocol.setPerturbationType.

That said, this is an aspect of the current API that isn't too clean, i.e. you pass a protocol when you create a FreeEnergy object, which is then modified behind the scene. It might be cleaner to instead pass in the options which are passed through to the protocols for the individual legs, e.g. just pass in the vector of lambda values, etc. I did it the other way for consistency with the MD drivers, but it isn't making as much sense now that we want to support a wider range of options.

I'll have a think about how to best clean things up. We'll have to be careful since we don't want to break the existing functionality. Perhaps we can use a protocol if passed, or use the individual options to create a protocol if not.

JenkeScheen commented 3 years ago

Hi all,

I've taken a first pass at writing a _toPertFile() function which calls the sub-function _writePertAtoms(), with a pert_type parameter which takes "discharge_soft", "vanish_soft", "change_bonds" (not yet implemented), "change_hard", "grow_dummies" and "charge_dummies" as input, handling LJ/charge changes accordingly - these (preliminary) names correspond to the six chronological steps outlined above. @jmichel80 @lohedges if you could spare the time, could you have a look at the code excerpt below and let me know if I've written out the steps correctly? Even though the function writes a .pert file just fine for all pert_types I expect to have made some mistakes at places (mostly the last three steps ( "change_hard", "grow_dummies" and "charge_dummies")) because I have been staring at this for a while now. Here's the relevant code:

else:        
        # Given multistep protocol, assume print all atom records.
        for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom)):
            print(atom.property("element0"), "--->", atom.property("element1"))
            # Start atom record.
            file.write("    atom\n")

            # Get the initial/final Lennard-Jones properties.
            LJ0 = atom.property("LJ0");
            LJ1 = atom.property("LJ1");

            #######################################################################
            # set LJ/charge based on requested perturbed term.
            if pert_type == "discharge_soft":
                if atom.property("element0") == atom.property("element1"):
                    # In this step, only remove charges from non-perturbed atoms.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = atom.property("charge0").value()
                    charge1_value = 0.0
                else:
                    # If atom is perturbed, prevent other terms from changing.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()

            #######################################################################
            elif pert_type == "vanish_soft":
                if atom.property("element0") == atom.property("element1"):
                    print("VANISH")
                    # In this step, only vanish non-perturbed atoms.
                    # Atom should have been discharged in previous step.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = (0.0, 0.0)
                    charge0_value = 0.0
                    charge1_value = 0.0
                else:
                    # If atom is perturbed, prevent other terms from changing.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()

            #######################################################################
            elif pert_type == "change_bonds":
                # this is handled in _writePertBonds()
                raise ValueError("'change_bonds' as pert_type can only be set for _writePertBonds()")

            #######################################################################
            elif pert_type == "change_hard":
                if atom.property("element0") == atom.property("element1"):
                    # In this step, non-perturbed atoms are charges and LJ are perturbed.
                    LJ0_value = (0.0, 0.0)
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = 0.0
                    charge1_value = atom.property("charge1").value()
                elif (
                    atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # If atom is to/from dummy, prevent changes.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()
                else:
                    # grow + charge perturbed non-dummy atoms.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = atom.property("charge0").value()
                    charge1_value = atom.property("charge1").value()

            #######################################################################
            elif pert_type == "grow_dummies":
                # In this step grow/shrink atoms that have dummy types.
                if atom.property("element0") == atom.property("element1"):
                    # In this step, non-perturbed atoms are already at lambda1 state.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()
                elif (
                     atom.property("element0") == _SireMol.Element("X") or 
                     atom.property("element1") == _SireMol.Element("X")):
                    # If atom is to/from dummy, make changes.
                    # Grow the atom.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()

                    # Keep charge constant (handled in 'charge_dummies').
                    charge0_value = charge1_value = atom.property("charge0").value()
                else:
                    # Perturbed non-dummy atoms have been changed in previous step.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()

            #######################################################################
            elif pert_type == "charge_dummies":
                # In this step recharge atoms that have dummy types.
                if atom.property("element0") == atom.property("element1"):
                    # In this step, non-perturbed atoms are already at lambda1 state.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()
                elif (
                     atom.property("element0") == _SireMol.Element("X") or 
                     atom.property("element1") == _SireMol.Element("X")):
                    # If atom is to/from dummy, make changes.
                    # LJ has been handled in previous step ('grow_dummies').
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()

                    # Perturb charge.
                    charge0_value = atom.property("charge0").value()
                    charge1_value = atom.property("charge1").value()
                else:
                    # Perturbed non-dummy atoms have been changed in previous step.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()

            # Write atom data.
            file.write("        name           %s\n" % atom.name().value())
            file.write("        initial_type   %s\n" % atom.property("ambertype0"))
            file.write("        final_type     %s\n" % atom.property("ambertype1"))
            file.write("        initial_LJ     %.5f %.5f\n" % (LJ0_value))
            file.write("        final_LJ       %.5f %.5f\n" % (LJ1_value))
            file.write("        initial_charge %.5f\n" % charge0_value)
            file.write("        final_charge   %.5f\n" % charge1_value)

            # End atom record.
            file.write("    endatom\n")

I'll work on the bonds side of things when I'm happy with the atoms part because that step seems more straightforward.

jmichel80 commented 3 years ago

This doesn't give the desired behaviour as far as I can see. Since the steps are conditional an implementation in loose pseudocode could look like:

Initialise assuming every final parameter is identical to the initial parameter

@@@ discharge-soft set charge1 = 0.0 only if atom.property(“ambertype1") == du

@@@ vanish-soft Starting from the results of having applied discharge_soft

do for each atom… if atom.property(“ambertype1”) = du Set LJ1_value = (0.0 0.0) and charge0 = charge1 = 0.0 set type0 = type1 = du

@@@ change_hard Starting from the results of having applied vanish_soft.. Interpolate state0/state1 charge and LJ parameters only for atoms that do not have a ‘du’ type in initial or end state if atom.property(“ambertype1”) = du Set LJ0_value = LJ1_value = (0.0 0.0)

@@@ change bonds

@@@ grow dummies Starting from the results of having applied change_hard For every hard atoms Set charge=charge1 ; LJ0=LJ1

If atom.property(“ambertype0”) == du: set LJ0_value = (0.0, 0.0) L1_value = target value

@@@ charge_dummies Starting from the results of having applied grow_dummies if atom.property(“ambertype0”) == du: Set LJ0_value = LJ1_value Charge0 = 0.0 charge 1 = target_value

JenkeScheen commented 3 years ago

Thanks @jmichel80! Very helpful. The block now looks like this:

 else:        
        # Given multistep protocol, assume print all atom records.
        for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom)):
            print(atom.property("element0"), "--->", atom.property("element1"))
            # Start atom record.
            file.write("    atom\n")

            # Get the initial/final Lennard-Jones properties.
            LJ0 = atom.property("LJ0");
            LJ1 = atom.property("LJ1");

            # set LJ/charge based on requested perturbed term.
            #######################################################################
            if pert_type == "discharge_soft":
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In this step, only remove charges from soft-core perturbations.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()

                    charge0_value = atom.property("charge0").value()
                    charge1_value = 0.0
                else:
                    # If only hard atoms in perturbation, hold parameters.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()

            #######################################################################
            elif pert_type == "vanish_soft":
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In this step, only remove LJ from soft-core perturbations.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = (0.0, 0.0)

                    # soft discharge was previous step, so assume 0.0.
                    charge0_value = charge1_value = 0.0
                else:
                    # If only hard atoms in perturbation, hold parameters.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()

            #######################################################################
            elif pert_type == "change_bonds":
                # this is handled in _writePertBonds(); states should be set to final
                # states of previous step.
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In previous steps, soft-core transformations were discharged and vanished.
                    LJ0_value = LJ1_value = (0.0, 0.0)
                    charge0_value = charge1_value = 0.0
                else:
                    # If only hard atoms in perturbation, hold parameters as in previous steps.
                    LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge0").value()
            #######################################################################
            elif pert_type == "change_hard":
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In this step, soft-core perturbations are untouched.
                    LJ0_value = LJ1_value = (0.0, 0.0)
                    charge0_value = charge1_value = 0.0
                else:
                    # If only hard atoms in perturbation, change all parameters.
                    LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = atom.property("charge0").value()
                    charge1_value = atom.property("charge1").value()

            #######################################################################
            elif pert_type == "grow_soft":
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In this step, soft-core perturbations are grown from 0.
                    LJ0_value = (0.0, 0.0)
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = 0.0
                    charge1_value = 0.0
                else:
                    # If only hard atoms in perturbation, parameters are already changed.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()

            #######################################################################
            elif pert_type == "charge_soft":
                if (atom.property("element0") == _SireMol.Element("X") or 
                    atom.property("element1") == _SireMol.Element("X")):
                    # In this step, soft-core perturbations are charged from 0.
                    LJ0_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = 0.0
                    charge1_value = atom.property("charge1").value()
                else:
                    # If only hard atoms in perturbation, parameters are already changed.
                    LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
                    charge0_value = charge1_value = atom.property("charge1").value()

            # Write atom data.
            file.write("        name           %s\n" % atom.name().value())
            file.write("        initial_type   %s\n" % atom.property("ambertype0"))
            file.write("        final_type     %s\n" % atom.property("ambertype1"))
            file.write("        initial_LJ     %.5f %.5f\n" % (LJ0_value))
            file.write("        final_LJ       %.5f %.5f\n" % (LJ1_value))
            file.write("        initial_charge %.5f\n" % charge0_value)
            file.write("        final_charge   %.5f\n" % charge1_value)

            # End atom record.
            file.write("    endatom\n")

I'll get working on the bonds side of things now - I'll have to find a way to hold changing bond parameters depending on the 6pert_types as well.

JenkeScheen commented 3 years ago

Quick update - I now have a working _toPertFile.py that will write pert files for the six different steps. Additionally, prm7+rst7 files are written out to match the atom names in the pert files. A single run unfortunately does not show good agreement between multistep (6x3 windows) and onestep (both 18w and 3w). Red bars show the ddG values for multistep, onestep (18w) and onestep (3w): image

I'll run some replicates over the weekend to confirm there is a discrepancy and then try to work out what's going wrong, presumably one/multiple steps are written incorrectly.

lohedges commented 3 years ago

Hi @JenkeScheen. Just a quick sanity check... SInce you've made edits to the existing toPertFile, rather than having new ones for the various steps, have you checked that your "onestep" protocol gives the same pert file as generated by the original code? It would be good to verify that this is the case before trying to debug the other stages.

Cheers.

jmichel80 commented 3 years ago

Another sanity check. Can you check that the single point energy (loading the same coordinates, before doing any energy minimisation or MD integration but after applying perturbation templates) at lambda = 1.0 of step i is equal to that of lambda = 0.0 of step i+1 ? If not dig in the Sire Energy components of system.energy() to find where the discrepancy come from .

JenkeScheen commented 3 years ago

Hi all, thanks for the suggestions - both checks seem to fail. I see some LJ/coulomb differences between "standard" and vanilla, and adjacent single point energies don't match either. I'll spend some time trying to resolve this.

JenkeScheen commented 3 years ago

wrt @lohedges sanity-check, diff between .pert files shows different placing of dummy atoms which initially made me think that LJ/coulomb were written wrong. After closer inspection it appears that the dummy placement has random variation causing the entries to be at different lines:

c3 --> oh
c3 --> c3
hc --> ho
hc --> du
hc --> h1
hc --> h1
hc --> h1
hc --> du
c3 --> oh
c3 --> c3
hc --> ho
hc --> h1
hc --> h1
hc --> du
hc --> h1
hc --> du
c3 --> oh
c3 --> c3
hc --> ho
hc --> h1
hc --> h1
hc --> h1
hc --> du
hc --> du

Of course only some of these iterations will correspond with the pert files written by vanilla BSS (which also shuffles in the same way). I'm not sure exactly where the shuffling occurs (perhaps some dictionary ordering?) but I'm presuming this shouldn't be an issue in which case this sanity check passes. Now for SPEs..

lohedges commented 3 years ago

Hmm, I'll try to check this some time this afternoon. Paolo added functionality to sort the atoms (and all other terms) before writing to ensure that they were always written in the same order. This was really important for debugging since allowed a direct comparison between the files. The code uses a helper function atom_sorting_criteria which makes sure that they are sorted before writing. Are you still using this?

You can also write all atoms by passing print_all_atoms. It's possible the sorting is broken when only printing the perturbed atoms. If so, let me know and I'll try to figure out what's going wrong.

JenkeScheen commented 3 years ago

yes, that function seems to be inplace and called with for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom)). I did check with print_all_atoms but the mismatch still seems to occur in the same way.

lohedges commented 3 years ago

Thanks for confirming, I'll try to take a look after my meeting. I'm surprised they're random since it was definitely tested quite thoroughly. Hopefully it's just something simple.

JenkeScheen commented 3 years ago

thanks @lohedges, a simple way to reproduce my observation would be to run the standard ethane->methanol setup twice and checking between the two .pert files in whichever lambda.

lohedges commented 3 years ago

Ah, the issue is that when setting up the perturbation SOMD uses atom names, rather than indices, so they must be unique. I have code that generates new names when there are matches (these are random names that match the AMBER naming restrictions) and these are different between runs. I previously tried just counting up in number and appending, etc, but this was difficult for very large molecules since you quickly exhaust the allowed possibilities. I'll try to cook up something later.

(Another reason why it would be great to use atom number rather than name as the SOMD reference, although I do understand the benefits of the name approach too.)

lohedges commented 3 years ago

I can just seed the random number generator! Will push a fix shortly.

lohedges commented 3 years ago

Okay, that works. We now have reproducible atom names.

JenkeScheen commented 3 years ago

After working out some more kinds I've come to the point where only one of the adjacent SPEs doesn't match:

1, discharge_soft:
     Lambda:       Ethane->methanol                  methanol ->ethane
    - 0.0:        214292                               213980
    - 1.0:        144965                               213980
2, vanish_soft:         x
    - 0.0:        214294                               213980
    - 1.0:        144964                               213980
3, change_bonds:
    - 0.0:        144964                               213980
    - 1.0:        144977                                213982
4, change_hard:
    - 0.0:        144977                                213982
    - 1.0:        76691.2                               406320
5, grow_soft:
    - 0.0:        76691.2                               406320
    - 1.0:        76691.2                               470340
6, charge_soft:                                               x
    - 0.0:        76691.2                               406321
    - 1.0:        76691.2                               470339

The mismatch is marked by 'x'; it's quite puzzling to see the energies change between these steps as the .pert files seem to be correct, i.e. all terms are unchanged except for the two atoms with dummy endpoints, which both look like:

    1, discharge_soft:                     2, vanish_soft:
    atom                                   atom
        name           HG8F                name           HG8F
        initial_type   hc                  initial_type   hc
        final_type     du                  final_type     du
        initial_LJ     2.60018 0.02080  => initial_LJ     2.60018 0.02080
        final_LJ       2.60018 0.02080     final_LJ       0.00000 0.00000
        initial_charge 0.03145             initial_charge 0.00000
        final_charge   0.00000             final_charge   0.00000
    endatom

I'll dig into the Sire.energy() components to see what's going on here. A perhaps related observation is that the SPEs are not reversible between the two directions of the perturbation.

jmichel80 commented 3 years ago

Hi @JenkeScheen

    1, discharge_soft:                     2, vanish_soft:
    atom                                   atom
        name           HG8F                name           HG8F
        initial_type   hc                  initial_type   hc
        final_type     hc                  final_type     du
        initial_LJ     2.60018 0.02080  => initial_LJ     2.60018 0.02080
        final_LJ       2.60018 0.02080     final_LJ       0.00000 0.00000
        initial_charge 0.03145             initial_charge 0.00000
        final_charge   0.00000             final_charge   0.00000
    endatom

(only change to atom type du when you turn off the LJ term)

-Check that for the direct perturbation ethane-->methanol you have agreement of SPE at lambda 0.0 with methanol->ethane at lambda 1.0 (again using exactly the same coordinates for both inputs, which may require some hacking of the rst7 file to be consistent with the respectice parm7 files).

JenkeScheen commented 3 years ago

Hi @jmichel80, thanks for the input and I'm happy to report full linkage by changing the atom types with LJ (and some other tweaks):

1, discharge_soft:
     Lambda:       Ethane->methanol                  methanol ->ethane
    - 0.0:        214292                               213980
    - 1.0:        214294                               213980
2, vanish_soft:        
    - 0.0:        214294                               213980
    - 1.0:        144964                               213980
3, change_bonds:
    - 0.0:        144964                               213980
    - 1.0:        144977                               213982
4, change_hard:
    - 0.0:        144977                               213982
    - 1.0:        76691.2                              406320
5, grow_soft:
    - 0.0:        76691.2                              406320
    - 1.0:        76691.2                              470340
6, charge_soft:                                               
    - 0.0:        76691.2                              470340
    - 1.0:        76691.2                              470339

I'm going to run these FEPs to see if this time we get ddG reproduction compared to the standard protocol (i.e. ~-2.7 kcal/mol). If not I'll dig into using different input geometries because in this particular case reversibility still isn't there.

JenkeScheen commented 3 years ago

The ddGs unfortunately still don't match with the standard protocol in either direction (2 reps, 3 lambdas per step, 4ns per window): image

I'll have to dig around some more (in geometries/ bond parameters) but in the meantime here are examples of atom entries in .pert files for hc->h1, hc->du and du->hc, respectively. Let me know if you spot any inconsistencies I'm missing:

hc -> h1
STANDARD:
        name           HFNO
        initial_type   hc
        final_type     h1
        initial_LJ     2.60018 0.02080
        final_LJ       2.42200 0.02080
        initial_charge 0.03145
        final_charge   0.02870
MULTISTEP:
atom          | 1: disch_soft || 2: vanish_soft|| 3:change_bonds||   4:change_hard ||   5:grow_soft ||6:charge_soft|
name           HFNO             HFNO             HFNO             HFNO               HFNO            HFNO
initial_type   hc               hc               hc               hc                 h1              h1
final_type     hc               hc               hc               h1                 h1              h1
initial_LJ     2.60018 0.02080  2.60018 0.02080  2.60018 0.02080  2.60018 0.02080    2.42200 0.02080 2.42200 0.02080
final_LJ       2.60018 0.02080  2.60018 0.02080  2.60018 0.02080  2.42200 0.02080    2.42200 0.02080 2.42200 0.02080
initial_charge 0.03145          0.03145          0.03145          0.03145            0.02870         0.02870
final_charge   0.03145          0.03145          0.03145          0.02870            0.02870         0.02870
endatom     

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hc -> du
STANDARD:
        name           HG8F
        initial_type   hc
        final_type     du
        initial_LJ     2.60018 0.02080
        final_LJ       0.00000 0.00000
        initial_charge 0.03145
        final_charge   -0.00000
MULTISTEP:
atom          | 1: disch_soft || 2: vanish_soft|| 3:change_bonds||   4:change_hard ||   5:grow_soft ||6:charge_soft|
name           HG8F             HG8F             HG8F             HG8F               HG8F            HG8F
initial_type   hc               hc               du               du                 du              du
final_type     hc               du               du               du                 du              du
initial_LJ     2.60018 0.02080  2.60018 0.02080  0.00000 0.00000  0.00000 0.00000    0.00000 0.00000 0.00000 0.00000
final_LJ       2.60018 0.02080  0.00000 0.00000  0.00000 0.00000  0.00000 0.00000    0.00000 0.00000 0.00000 0.00000
initial_charge 0.03145          -0.00000         -0.00000         -0.00000           -0.00000        -0.00000
final_charge   -0.00000         -0.00000         -0.00000         -0.00000           -0.00000        -0.00000
endatom 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
du -> hc
STANDARD:
        name           DU01
        initial_type   du
        final_type     hc
        initial_LJ     0.00000 0.00000
        final_LJ       2.60018 0.02080
        initial_charge -0.00000
        final_charge   0.03145
MULTISTEP:
atom          | 1: disch_soft || 2: vanish_soft|| 3:change_bonds||   4:change_hard ||   5:grow_soft ||6:charge_soft|
name           DU01             DU01              DU01            DU01               DU01             DU01
initial_type   du               du                du              du                 du               hc
final_type     du               du                du              du                 hc               hc
initial_LJ     0.00000 0.00000  0.00000 0.00000   0.00000 0.00000 0.00000 0.00000    0.00000 0.00000  2.60018 0.02080
final_LJ       0.00000 0.00000  0.00000 0.00000   0.00000 0.00000 0.00000 0.00000    2.60018 0.02080  2.60018 0.02080
initial_charge -0.00000         -0.00000          -0.00000        -0.00000           -0.00000         -0.00000
final_charge   -0.00000         -0.00000          -0.00000        -0.00000           -0.00000         0.03145

Perhaps merging 3_change_bonds and 4_change_hard would make sense? I suspect there might be an issue when bond parameters have adjusted but hard atoms have not.

jmichel80 commented 3 years ago

When you do step 3_change_bonds, do you also change angle, dihedral and improper parameters ?

JenkeScheen commented 3 years ago

yes, those are all changed in the same way, i.e. parameter1 = parameter0 until step 3, then parameter0 = parameter1 after step 3.

jmichel80 commented 3 years ago

Can you post a detailed breakdown of the intermediate free energies in vacuum for both direction?

JenkeScheen commented 3 years ago
Interestingly for vacuum the FEs do match between standard and multistep, and we observe reversibility as well: Step ethane->methanol methanol->ethane
step1 -0.611885 0
step2 -0.067687 0
step3 0.216402 -0.204926
step4 3.233744 -3.173355
step5 0 0.069519
step6 0 0.611799
Multistep 2.770574 -2.696963
Standard 2.753582 -2.791482

Curious how we lose both features when involving waters.