Closed jmichel80 closed 3 years ago
Thanks for this Julien. I should be able to get started on writing a script that does this.. Do you think we'd run into trouble with this protocol when trying to transform A to B with mixed growing/shrinking, i.e.: ?
Should be covered by the example I posted. In some schemes (where e.g. you only need to convert some atoms into dummy atoms) you may not need steps 5 and 6.
This should be simple enough to implement in the current BioSImSpace set up. We would just need to add a parameter the the protocol for a single free energy leg that lists the terms that are perturbed (or are excluded from the perturbation if that's simpler). That way you could do something like:
protocol = BSS.Protocol.FreeEnergy(terms="charge")
This would write a pert file for the lambda leg that only includes the specified terms.
That would be ideal Lester, would you think it best that in this case we would declare six protocols and set up a directory with each? i.e.
BSS.FreeEnergy.Solvation(system, protocol_step1, work_dir="step1/")
...
BSS.FreeEnergy.Solvation(system, protocol_step6, work_dir="step6/")
Potentially this could give the user to define different numbers of lambda windows per step.
Yes, I think that would be best and provide the most amount of flexibility. If certain combinations of stages turn out to be particularly useful, or could be combined, then we could also add keywords to capture those kinds of protocols too.
FYI I believe Gromacs would also be able to handle this with one set of input files using a lambda schedule that allows interpolation of selected components at one time, but haven't looked in details.
GROMACS is nice in that everything is contained within the topology file and explicitly states what terms are perturbed and the values in each end state. It would be easy enough to adapt the multi-stage approach to GROMACS too by passing a list of the perturbed terms to the GroTop parser using a property map e.g.:
from Sire.IO import GroTop
gro_top = GroTop(system, {perturbation_terms = ["charge"]})
The parser would then extract the terms from the property map and only change the value between the end states for the ones that the user wants to perturb.
Hi @lohedges , thanks for the comments ! @JenkeScheen it will be easier to implement this feature directly in BioSimSpace to generate the panel of pert files you wish. Could you look into extending the API to support generation of input files for this stepwise perturbation protocol. It's ok to only support SOMD for the time being. I would start with a general 6 step protocol and revisit later to allow merging steps. Please use this thread to post questions on how the implementation could be made, and then work on a branch from devel.
@JenkeScheen I've created a new branch feature_multi_step_perturbation. Please use this for your development work.
Thanks Lester, I'll use that one. The multistep_perts branch I made is now redundant in that case
Ah, sorry, hadn't spotted that one. Yes, I'll delete the multistep_perts
branch. I use feature_
as a prefix for all feature branches and fix_
for bug-fixes. Makes it easier to search and cross-reference things in larger projects.
Cheers.
Just copying across comments here for posterity:
@lohedges
Perhaps I've misunderstood, but I would have thought that the perturbation terms would have just been the list of regular terms that we support, e.g. LJ, charge, bond, angle, dihedral, and improper. If you wanted to perturb charge and LJ only during a simulation then you could pass perturbation_terms = ["charge", "LJ"]. (Note that charge and LJ perturbation only apply to dummy atoms, so you don't need to explicitly state DummyToCharges.) Perhaps DummyToCharges turns on charges for the dummy atoms at lambda = 0, while leaving the charges for atoms that would be dummies at lambda = 1 unperturbed.
@JenkeScheen
I was planning on extending the API up to the point where I could readily simulate the steps as outlined by @jmichel80, which as far as I understand are mainly combinations of perturbations of the LJ, charge and bond terms and mixed to dummy/non-dummy atoms. I tried to name the steps intuitively but of course the terminology is up for discussion. @jmichel80 tell me if I'm misinterpreting your outline - constraining charge, bond, angle, dihedral, and improper respectively would be much easier to implement!
@jmichel80
charge and LJ perturbations applies to non dummy atoms as well (e.g. oxygen to carbon in example posted yesterday. The bonded terms perturbation includes bond, angle, dihedral and improper terms. Basically as sanity check you should get identical single point energies for the same MORPH input coordinates using a pert file for step 6 lambda 1.0 of the stepwise pahtway or a pert file for lambda 1.0 in the direct pathway (and step 1 lambda 0.0 should give the same SPE than lambda 0.0).
Yes, it appears a bit more complex and less general than I had envisaged. This means that the way in which the pert file is written would be different depending on what perturbation schedule is chosen, it's not as simple as doing something like:
if "bond" in perturbation_terms:
# Perturb the bonds.
else:
# Same bond terms in both end states.
With this in mind it is probably better to have a perturbation
(or similar) parameter (rather than terms
) which writes a custom pert file for that type of perturbation, e.g. perturbation = discharge
, or perturbation =
bonded_terms`, etc. Sorry for the confusion.
I'll have a think about the easiest way to implement this.
Also the steps are conditional, so step2 should be perturbing LJ parameters of interacting atoms becoming atoms, with the initial and final charges of those atoms set to those of the final molecule (0 in this case). And so on..
we could consider passing a multistep_pert=True
and then writing out the different step folders (i.e. each containing their lambdas) with respective pert files? Potentially together with the multistep_pert
we can pass a list multistep_lambdas
that contains the number of lambdas per step (i.e. len(multistep_lambdas)==6
). This way at least the first version of this feature would be easier to implement because it's all contained in the same sequence.
This would allow me to do some quick benchmarking wrt checking multistep vs native, and then at a later stage we can add more user control.
Hmmm, it's much more fiddly that I thought. I was hoping that if it might be possible to implement something by having two lists representing the terms that are used in the two end states. They could have entries that refer to whether the term takes the value of either the 0 or 1 state, or if it's zeroed, e.g. "charges to 0" could be implemented with something like:
# A dictionary of terms for the lambda = 0 state.
terms0 = { "charge" : "charge0",
"LJ" : "LJ0",
"bond" : "bond0",
"angle" : "angle0",
"dihedral" : "dihedral0",
"improper" : "improper0",
}
# A dictionary of terms for the lambda = 1 state.
# We use the same terms as lambda = 0, except the charge which is zeroed.
terms1 = { "charge" : 0,
"LJ" : "LJ0",
"bond" : "bond0",
"angle" : "angle0",
"dihedral" : "dihedral0",
"improper" : "improper0",
}
protocol = BSS.Protocol.FreeEnergy(terms0=terms0, terms1=terms1)
However, what we want to happen is that we zero the charge only for the atoms that would become dummies in the lambda = 1 state, leaving those that don't the same.
I don't think it's possible to express the perturbations in a general and flexible way, so something custom would need to be done for each stage.
I'll keep thinking.
Would one way to think of this be to have separate lambda values for the different terms i.e. have a "lambda_decharge", "lambda_decouplevdw", "lambda_bonded", "lambda_recharge" etc. You can then set a mapping from the global lambda to each of these, so that for example in a pure decharge calculation lambda_decharge would map 1:1 to the global lambda but the other lambdas would stay at 1.0. The current protocol just equates to all of the individual lambdas mapping 1:1 to the global value.
That would then allow for more individualised protocols, where we could for instance decide that LJ0-to-dummy and dummy-to-LJ0 could be carried out simultaneously in one step if it turns out that's more computationally efficient.
GROMACS does things in a similar way, see here for details. To implement something like this would require updates to SOMD itself (assuming this can be done with OpenMM). It's also a bit different in our case since, e.g. lambda_decharge
doesn't couple directly to the global lambda
as it applies to a different set of atoms to those for which the charge term is modified during the normal perturbation.
This could be done in OpenMM but would require a significant overhaul of SOMD. It's easier to work around limitations of the current implementation by building custom pert files.
However, what we want to happen is that we zero the charge only for the atoms that would become dummies in the lambda = 1 state, leaving those that don't the same.
I don't think it's possible to express the perturbations in a general and flexible way, so something custom would need to be done for each stage.
concerning this idea, would it not be efficient to do this per-atom in_toPertFile()
with some sort of conditional that checks whether the atom is/will become dummy? Would using == _SireMol.Element("X")
work for this purpose?
Yes, we can check if atoms in the end states are dummies by using the element0
and element1
properties, e.g.:
# Is atom a dummy at lambda = 0?
if atom.property("element0") == _SireMol.Element("X"):
...
My concern is that the logic involved in setting up the pert files for the various stages would lead to a conditional block nightmare in, what is, and already complex and difficult to read function. Perhaps it would make sense to have a separate _toPertFile
for each type of perturbation (using sections from the current one as templates). This would make it easier to edit/debug and wouldn't break the existing functionality. You'd then call the required function using the perturbation
(or perturbation_type
) argument that was passed to the protocol. We could have a main _toPertFile
function that takes the normal arguments along with the additional perturbation parameter. This then calls the required sub-function, e.g. _toPertFileDischarge
. Something like perturbation=full
or perturbation=all
or perturbation=standard
would be the default option in the protocol and would trigger the current implementation. This approach would also make it easy for someone to add a new perturbation protocol, i.e. they just add a new keyword and a new _toPertFile
function that writes the required pert file for that perturbation.
(Note that there could be additional parameters passed to various functions that aren't exposed at the top-level _toPertFile
, e.g. the zero_dummy_dihedrals
that is currently used in what would become _toPertFileAll
. To access these non-standard options the user would need to call the functions directly.)
That sounds reasonable. I would imagine we won't have to copy-paste the whole _toPertFile
six times but rather keep that as the master function and have it call sub-functions at key points based on perturbation_type
?
Sub-function calls can start after the atom checks, for instance. An added benefit would be that the main function gets cleaned up a bit.
I added downstream passing of pert_type
which can be set as either "standard"
(default) or "multistep"
; feature branch should now be ready for extending toPertFile()
to handle the multistep approach
As you say, there will be some redundancy in the functionality between the types of perturbation so it would be good to encapsulate common routines. Sometimes there is a bit of a trade-off in terms of making things easy for the user to edit, which is often easiest if everything is in a single function. i.e. they could edit or monkey-patch an entire function, rather than having to edit things in additional sub-routines too.
With regards to the standard
and multistep
approach: does this now mean that the stages outlined at the start of this thread would all be generated in one go if you passed pert_type=multistep
to the protcol? If so, we wouldn't have the flexibility of generating/running them independently, or easily adding new ones. It would also mean more modifications would be required to things like BioSImSpace.FreeEnergy.Solvation
.
Also a BioSimSpace.Process
object refers to a single simulation, e.g. a particular lambda window of a specific leg of the calculation. If the mutlistep
option in the protocol means that we actually need to run six simulations, then this doesn't make sense in the current framework. This would be better handled by passing something like multistep
as an option to BioSimSpace.FreeEnergy.Solvation
, which then generates protocols and processes for each of the stages described at the start, e.g. perturbation_type=discharge
, etc.
(I'm away for a few days so probably won't be able to comment further until Monday.)
I think I'll add the core functionality of either running standard
or multistep
in the next few days and then add per-step customisability once the workflow is in place (let's discuss the details of that when you're back from your leave)
Also a
BioSimSpace.Process
object refers to a single simulation, e.g. a particular lambda window of a specific leg of the calculation. If themutlistep
option in the protocol means that we actually need to run six simulations, then this doesn't make sense in the current framework. This would be better handled by passing something likemultistep
as an option toBioSimSpace.FreeEnergy.Solvation
, which then generates protocols and processes for each of the stages described at the start, e.g.perturbation_type=discharge
, etc.
Thanks, I'll adjust this. I was under the impression that the single simulation would be copied across all lambdas/legs.
I changed the code again to now handle user input in the way @lohedges suggested, i.e. passing a multistep
bool to BioSimSpace.FreeEnergy.Solvation
that from there either takes the standard approach (if False
) or the multistep (if True
). This means that Protocol
input is the same for both cases. A simple way to use this syntax would be to write:
protocol = BSS.Protocol.FreeEnergy(runtime=4*BSS.Units.Time.nanosecond, num_lam=5)
freenrg = BSS.FreeEnergy.Solvation(solvated, protocol, work_dir="runs/proto_1", multistep=True)
, where the number of lambda windows per step is derived from Protocol
(i.e. 5 in this case). Additionally, I added an optional steps
dictionary where the user can specify how many lambda windows should be allocated to each step, such that input could e.g. be
steps = {term1 : 5,
term2 : 10,
term3 : 5,
term4 : 7,
term5 : 3,
term6 : 5}
protocol = BSS.Protocol.FreeEnergy(runtime=4*BSS.Units.Time.nanosecond, num_lam=5)
freenrg = BSS.FreeEnergy.Solvation(solvated, protocol, work_dir="runs/proto_1", multistep=True, steps=steps)
With the next bit of code I might need some help - I'm not clear on how I can pass a step type to _molecule
so that I can start calling sub-functions of _molecule._toPertFile()
. I suspect the route is through self._initialise_runner()
and then in the bit where lambdas are being looped over but I fail to see how this connects to _molecule
. If someone could point out these steps that would be very helpful!
thinking about this - when during benchmarking we decide we want to merge some steps it should be straightforward to make a step merge with the previous step if lambda in the step
dictionary is set to 0, i.e. if term2 : 0, merge it with term1
etc, except for term1 where it would merge with term2.
With the next bit of code I might need some help - I'm not clear on how I can pass a step type to _molecule so that I can start calling sub-functions of _molecule._toPertFile(). I suspect the route is through self._initialise_runner() and then in the bit where lambdas are being looped over but I fail to see how this connects to _molecule. If someone could point out these steps that would be very helpful!
The multistep
option that is passed to the high-level FreeEnergy
object needs to set up a bunch of individual simulation protocols for the low-level Processes
. I think you need to add an option to the protocol that sets the type of perturbation for that simulation, e.g. perturbation_type
. This can then be updated in the initialise_runner
code, like how the lambda value is changed, e.g. protocol.setPerturbationType
.
That said, this is an aspect of the current API that isn't too clean, i.e. you pass a protocol when you create a FreeEnergy
object, which is then modified behind the scene. It might be cleaner to instead pass in the options which are passed through to the protocols for the individual legs, e.g. just pass in the vector of lambda values, etc. I did it the other way for consistency with the MD drivers, but it isn't making as much sense now that we want to support a wider range of options.
I'll have a think about how to best clean things up. We'll have to be careful since we don't want to break the existing functionality. Perhaps we can use a protocol if passed, or use the individual options to create a protocol if not.
Hi all,
I've taken a first pass at writing a _toPertFile()
function which calls the sub-function _writePertAtoms()
, with a pert_type parameter which takes "discharge_soft"
, "vanish_soft"
, "change_bonds"
(not yet implemented), "change_hard"
, "grow_dummies"
and "charge_dummies"
as input, handling LJ/charge changes accordingly - these (preliminary) names correspond to the six chronological steps outlined above.
@jmichel80 @lohedges if you could spare the time, could you have a look at the code excerpt below and let me know if I've written out the steps correctly? Even though the function writes a .pert file
just fine for all pert_type
s I expect to have made some mistakes at places (mostly the last three steps ( "change_hard"
, "grow_dummies"
and "charge_dummies"
)) because I have been staring at this for a while now. Here's the relevant code:
else:
# Given multistep protocol, assume print all atom records.
for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom)):
print(atom.property("element0"), "--->", atom.property("element1"))
# Start atom record.
file.write(" atom\n")
# Get the initial/final Lennard-Jones properties.
LJ0 = atom.property("LJ0");
LJ1 = atom.property("LJ1");
#######################################################################
# set LJ/charge based on requested perturbed term.
if pert_type == "discharge_soft":
if atom.property("element0") == atom.property("element1"):
# In this step, only remove charges from non-perturbed atoms.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = atom.property("charge0").value()
charge1_value = 0.0
else:
# If atom is perturbed, prevent other terms from changing.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
#######################################################################
elif pert_type == "vanish_soft":
if atom.property("element0") == atom.property("element1"):
print("VANISH")
# In this step, only vanish non-perturbed atoms.
# Atom should have been discharged in previous step.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = (0.0, 0.0)
charge0_value = 0.0
charge1_value = 0.0
else:
# If atom is perturbed, prevent other terms from changing.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
#######################################################################
elif pert_type == "change_bonds":
# this is handled in _writePertBonds()
raise ValueError("'change_bonds' as pert_type can only be set for _writePertBonds()")
#######################################################################
elif pert_type == "change_hard":
if atom.property("element0") == atom.property("element1"):
# In this step, non-perturbed atoms are charges and LJ are perturbed.
LJ0_value = (0.0, 0.0)
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = 0.0
charge1_value = atom.property("charge1").value()
elif (
atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# If atom is to/from dummy, prevent changes.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
else:
# grow + charge perturbed non-dummy atoms.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = atom.property("charge0").value()
charge1_value = atom.property("charge1").value()
#######################################################################
elif pert_type == "grow_dummies":
# In this step grow/shrink atoms that have dummy types.
if atom.property("element0") == atom.property("element1"):
# In this step, non-perturbed atoms are already at lambda1 state.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
elif (
atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# If atom is to/from dummy, make changes.
# Grow the atom.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
# Keep charge constant (handled in 'charge_dummies').
charge0_value = charge1_value = atom.property("charge0").value()
else:
# Perturbed non-dummy atoms have been changed in previous step.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
#######################################################################
elif pert_type == "charge_dummies":
# In this step recharge atoms that have dummy types.
if atom.property("element0") == atom.property("element1"):
# In this step, non-perturbed atoms are already at lambda1 state.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
elif (
atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# If atom is to/from dummy, make changes.
# LJ has been handled in previous step ('grow_dummies').
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
# Perturb charge.
charge0_value = atom.property("charge0").value()
charge1_value = atom.property("charge1").value()
else:
# Perturbed non-dummy atoms have been changed in previous step.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
# Write atom data.
file.write(" name %s\n" % atom.name().value())
file.write(" initial_type %s\n" % atom.property("ambertype0"))
file.write(" final_type %s\n" % atom.property("ambertype1"))
file.write(" initial_LJ %.5f %.5f\n" % (LJ0_value))
file.write(" final_LJ %.5f %.5f\n" % (LJ1_value))
file.write(" initial_charge %.5f\n" % charge0_value)
file.write(" final_charge %.5f\n" % charge1_value)
# End atom record.
file.write(" endatom\n")
I'll work on the bonds side of things when I'm happy with the atoms part because that step seems more straightforward.
This doesn't give the desired behaviour as far as I can see. Since the steps are conditional an implementation in loose pseudocode could look like:
Initialise assuming every final parameter is identical to the initial parameter
@@@ discharge-soft set charge1 = 0.0 only if atom.property(“ambertype1") == du
@@@ vanish-soft Starting from the results of having applied discharge_soft
do for each atom… if atom.property(“ambertype1”) = du Set LJ1_value = (0.0 0.0) and charge0 = charge1 = 0.0 set type0 = type1 = du
@@@ change_hard Starting from the results of having applied vanish_soft.. Interpolate state0/state1 charge and LJ parameters only for atoms that do not have a ‘du’ type in initial or end state if atom.property(“ambertype1”) = du Set LJ0_value = LJ1_value = (0.0 0.0)
@@@ change bonds
@@@ grow dummies Starting from the results of having applied change_hard For every hard atoms Set charge=charge1 ; LJ0=LJ1
If atom.property(“ambertype0”) == du: set LJ0_value = (0.0, 0.0) L1_value = target value
@@@ charge_dummies Starting from the results of having applied grow_dummies if atom.property(“ambertype0”) == du: Set LJ0_value = LJ1_value Charge0 = 0.0 charge 1 = target_value
Thanks @jmichel80! Very helpful. The block now looks like this:
else:
# Given multistep protocol, assume print all atom records.
for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom)):
print(atom.property("element0"), "--->", atom.property("element1"))
# Start atom record.
file.write(" atom\n")
# Get the initial/final Lennard-Jones properties.
LJ0 = atom.property("LJ0");
LJ1 = atom.property("LJ1");
# set LJ/charge based on requested perturbed term.
#######################################################################
if pert_type == "discharge_soft":
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In this step, only remove charges from soft-core perturbations.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = atom.property("charge0").value()
charge1_value = 0.0
else:
# If only hard atoms in perturbation, hold parameters.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
#######################################################################
elif pert_type == "vanish_soft":
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In this step, only remove LJ from soft-core perturbations.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = (0.0, 0.0)
# soft discharge was previous step, so assume 0.0.
charge0_value = charge1_value = 0.0
else:
# If only hard atoms in perturbation, hold parameters.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
#######################################################################
elif pert_type == "change_bonds":
# this is handled in _writePertBonds(); states should be set to final
# states of previous step.
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In previous steps, soft-core transformations were discharged and vanished.
LJ0_value = LJ1_value = (0.0, 0.0)
charge0_value = charge1_value = 0.0
else:
# If only hard atoms in perturbation, hold parameters as in previous steps.
LJ0_value = LJ1_value = LJ0.sigma().value(), LJ0.epsilon().value()
charge0_value = charge1_value = atom.property("charge0").value()
#######################################################################
elif pert_type == "change_hard":
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In this step, soft-core perturbations are untouched.
LJ0_value = LJ1_value = (0.0, 0.0)
charge0_value = charge1_value = 0.0
else:
# If only hard atoms in perturbation, change all parameters.
LJ0_value = LJ0.sigma().value(), LJ0.epsilon().value()
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = atom.property("charge0").value()
charge1_value = atom.property("charge1").value()
#######################################################################
elif pert_type == "grow_soft":
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In this step, soft-core perturbations are grown from 0.
LJ0_value = (0.0, 0.0)
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = 0.0
charge1_value = 0.0
else:
# If only hard atoms in perturbation, parameters are already changed.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
#######################################################################
elif pert_type == "charge_soft":
if (atom.property("element0") == _SireMol.Element("X") or
atom.property("element1") == _SireMol.Element("X")):
# In this step, soft-core perturbations are charged from 0.
LJ0_value = LJ1.sigma().value(), LJ1.epsilon().value()
LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = 0.0
charge1_value = atom.property("charge1").value()
else:
# If only hard atoms in perturbation, parameters are already changed.
LJ0_value = LJ1_value = LJ1.sigma().value(), LJ1.epsilon().value()
charge0_value = charge1_value = atom.property("charge1").value()
# Write atom data.
file.write(" name %s\n" % atom.name().value())
file.write(" initial_type %s\n" % atom.property("ambertype0"))
file.write(" final_type %s\n" % atom.property("ambertype1"))
file.write(" initial_LJ %.5f %.5f\n" % (LJ0_value))
file.write(" final_LJ %.5f %.5f\n" % (LJ1_value))
file.write(" initial_charge %.5f\n" % charge0_value)
file.write(" final_charge %.5f\n" % charge1_value)
# End atom record.
file.write(" endatom\n")
I'll get working on the bonds side of things now - I'll have to find a way to hold changing bond parameters depending on the 6pert_type
s as well.
Quick update - I now have a working _toPertFile.py that will write pert files for the six different steps. Additionally, prm7+rst7 files are written out to match the atom names in the pert files. A single run unfortunately does not show good agreement between multistep (6x3 windows) and onestep (both 18w and 3w). Red bars show the ddG values for multistep, onestep (18w) and onestep (3w):
I'll run some replicates over the weekend to confirm there is a discrepancy and then try to work out what's going wrong, presumably one/multiple steps are written incorrectly.
Hi @JenkeScheen. Just a quick sanity check... SInce you've made edits to the existing toPertFile
, rather than having new ones for the various steps, have you checked that your "onestep" protocol gives the same pert file as generated by the original code? It would be good to verify that this is the case before trying to debug the other stages.
Cheers.
Another sanity check. Can you check that the single point energy (loading the same coordinates, before doing any energy minimisation or MD integration but after applying perturbation templates) at lambda = 1.0 of step i is equal to that of lambda = 0.0 of step i+1 ? If not dig in the Sire Energy components of system.energy() to find where the discrepancy come from .
Hi all, thanks for the suggestions - both checks seem to fail. I see some LJ/coulomb differences between "standard"
and vanilla, and adjacent single point energies don't match either. I'll spend some time trying to resolve this.
wrt @lohedges sanity-check, diff between .pert files shows different placing of dummy atoms which initially made me think that LJ/coulomb were written wrong. After closer inspection it appears that the dummy placement has random variation causing the entries to be at different lines:
c3 --> oh
c3 --> c3
hc --> ho
hc --> du
hc --> h1
hc --> h1
hc --> h1
hc --> du
c3 --> oh
c3 --> c3
hc --> ho
hc --> h1
hc --> h1
hc --> du
hc --> h1
hc --> du
c3 --> oh
c3 --> c3
hc --> ho
hc --> h1
hc --> h1
hc --> h1
hc --> du
hc --> du
Of course only some of these iterations will correspond with the pert files written by vanilla BSS (which also shuffles in the same way). I'm not sure exactly where the shuffling occurs (perhaps some dictionary ordering?) but I'm presuming this shouldn't be an issue in which case this sanity check passes. Now for SPEs..
Hmm, I'll try to check this some time this afternoon. Paolo added functionality to sort the atoms (and all other terms) before writing to ensure that they were always written in the same order. This was really important for debugging since allowed a direct comparison between the files. The code uses a helper function atom_sorting_criteria
which makes sure that they are sorted before writing. Are you still using this?
You can also write all atoms by passing print_all_atoms
. It's possible the sorting is broken when only printing the perturbed atoms. If so, let me know and I'll try to figure out what's going wrong.
yes, that function seems to be inplace and called with for atom in sorted(mol.atoms(), key=lambda atom: atom_sorting_criteria(atom))
. I did check with print_all_atoms
but the mismatch still seems to occur in the same way.
Thanks for confirming, I'll try to take a look after my meeting. I'm surprised they're random since it was definitely tested quite thoroughly. Hopefully it's just something simple.
thanks @lohedges, a simple way to reproduce my observation would be to run the standard ethane->methanol setup twice and checking between the two .pert files in whichever lambda.
Ah, the issue is that when setting up the perturbation SOMD uses atom names, rather than indices, so they must be unique. I have code that generates new names when there are matches (these are random names that match the AMBER naming restrictions) and these are different between runs. I previously tried just counting up in number and appending, etc, but this was difficult for very large molecules since you quickly exhaust the allowed possibilities. I'll try to cook up something later.
(Another reason why it would be great to use atom number rather than name as the SOMD reference, although I do understand the benefits of the name approach too.)
I can just seed the random number generator! Will push a fix shortly.
Okay, that works. We now have reproducible atom names.
After working out some more kinds I've come to the point where only one of the adjacent SPEs doesn't match:
1, discharge_soft:
Lambda: Ethane->methanol methanol ->ethane
- 0.0: 214292 213980
- 1.0: 144965 213980
2, vanish_soft: x
- 0.0: 214294 213980
- 1.0: 144964 213980
3, change_bonds:
- 0.0: 144964 213980
- 1.0: 144977 213982
4, change_hard:
- 0.0: 144977 213982
- 1.0: 76691.2 406320
5, grow_soft:
- 0.0: 76691.2 406320
- 1.0: 76691.2 470340
6, charge_soft: x
- 0.0: 76691.2 406321
- 1.0: 76691.2 470339
The mismatch is marked by 'x'; it's quite puzzling to see the energies change between these steps as the .pert files seem to be correct, i.e. all terms are unchanged except for the two atoms with dummy endpoints, which both look like:
1, discharge_soft: 2, vanish_soft:
atom atom
name HG8F name HG8F
initial_type hc initial_type hc
final_type du final_type du
initial_LJ 2.60018 0.02080 => initial_LJ 2.60018 0.02080
final_LJ 2.60018 0.02080 final_LJ 0.00000 0.00000
initial_charge 0.03145 initial_charge 0.00000
final_charge 0.00000 final_charge 0.00000
endatom
I'll dig into the Sire.energy() components to see what's going on here. A perhaps related observation is that the SPEs are not reversible between the two directions of the perturbation.
Hi @JenkeScheen
the SPEs are very high, so the LJ repulsive term is high. Very small inconsistencies in geometry could give you a shift in energy. Try to work with an input geometry that gives you low energies (e.g. ethane with coordinates already energy minimised)
in the step 1 discharge soft the pert file should read
1, discharge_soft: 2, vanish_soft:
atom atom
name HG8F name HG8F
initial_type hc initial_type hc
final_type hc final_type du
initial_LJ 2.60018 0.02080 => initial_LJ 2.60018 0.02080
final_LJ 2.60018 0.02080 final_LJ 0.00000 0.00000
initial_charge 0.03145 initial_charge 0.00000
final_charge 0.00000 final_charge 0.00000
endatom
(only change to atom type du when you turn off the LJ term)
-Check that for the direct perturbation ethane-->methanol you have agreement of SPE at lambda 0.0 with methanol->ethane at lambda 1.0 (again using exactly the same coordinates for both inputs, which may require some hacking of the rst7 file to be consistent with the respectice parm7 files).
Hi @jmichel80, thanks for the input and I'm happy to report full linkage by changing the atom types with LJ (and some other tweaks):
1, discharge_soft:
Lambda: Ethane->methanol methanol ->ethane
- 0.0: 214292 213980
- 1.0: 214294 213980
2, vanish_soft:
- 0.0: 214294 213980
- 1.0: 144964 213980
3, change_bonds:
- 0.0: 144964 213980
- 1.0: 144977 213982
4, change_hard:
- 0.0: 144977 213982
- 1.0: 76691.2 406320
5, grow_soft:
- 0.0: 76691.2 406320
- 1.0: 76691.2 470340
6, charge_soft:
- 0.0: 76691.2 470340
- 1.0: 76691.2 470339
I'm going to run these FEPs to see if this time we get ddG reproduction compared to the standard protocol (i.e. ~-2.7 kcal/mol). If not I'll dig into using different input geometries because in this particular case reversibility still isn't there.
The ddGs unfortunately still don't match with the standard protocol in either direction (2 reps, 3 lambdas per step, 4ns per window):
I'll have to dig around some more (in geometries/ bond parameters) but in the meantime here are examples of atom entries in .pert files for hc->h1, hc->du and du->hc, respectively. Let me know if you spot any inconsistencies I'm missing:
hc -> h1
STANDARD:
name HFNO
initial_type hc
final_type h1
initial_LJ 2.60018 0.02080
final_LJ 2.42200 0.02080
initial_charge 0.03145
final_charge 0.02870
MULTISTEP:
atom | 1: disch_soft || 2: vanish_soft|| 3:change_bonds|| 4:change_hard || 5:grow_soft ||6:charge_soft|
name HFNO HFNO HFNO HFNO HFNO HFNO
initial_type hc hc hc hc h1 h1
final_type hc hc hc h1 h1 h1
initial_LJ 2.60018 0.02080 2.60018 0.02080 2.60018 0.02080 2.60018 0.02080 2.42200 0.02080 2.42200 0.02080
final_LJ 2.60018 0.02080 2.60018 0.02080 2.60018 0.02080 2.42200 0.02080 2.42200 0.02080 2.42200 0.02080
initial_charge 0.03145 0.03145 0.03145 0.03145 0.02870 0.02870
final_charge 0.03145 0.03145 0.03145 0.02870 0.02870 0.02870
endatom
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hc -> du
STANDARD:
name HG8F
initial_type hc
final_type du
initial_LJ 2.60018 0.02080
final_LJ 0.00000 0.00000
initial_charge 0.03145
final_charge -0.00000
MULTISTEP:
atom | 1: disch_soft || 2: vanish_soft|| 3:change_bonds|| 4:change_hard || 5:grow_soft ||6:charge_soft|
name HG8F HG8F HG8F HG8F HG8F HG8F
initial_type hc hc du du du du
final_type hc du du du du du
initial_LJ 2.60018 0.02080 2.60018 0.02080 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
final_LJ 2.60018 0.02080 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
initial_charge 0.03145 -0.00000 -0.00000 -0.00000 -0.00000 -0.00000
final_charge -0.00000 -0.00000 -0.00000 -0.00000 -0.00000 -0.00000
endatom
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
du -> hc
STANDARD:
name DU01
initial_type du
final_type hc
initial_LJ 0.00000 0.00000
final_LJ 2.60018 0.02080
initial_charge -0.00000
final_charge 0.03145
MULTISTEP:
atom | 1: disch_soft || 2: vanish_soft|| 3:change_bonds|| 4:change_hard || 5:grow_soft ||6:charge_soft|
name DU01 DU01 DU01 DU01 DU01 DU01
initial_type du du du du du hc
final_type du du du du hc hc
initial_LJ 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 2.60018 0.02080
final_LJ 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 2.60018 0.02080 2.60018 0.02080
initial_charge -0.00000 -0.00000 -0.00000 -0.00000 -0.00000 -0.00000
final_charge -0.00000 -0.00000 -0.00000 -0.00000 -0.00000 0.03145
Perhaps merging 3_change_bonds and 4_change_hard would make sense? I suspect there might be an issue when bond parameters have adjusted but hard atoms have not.
When you do step 3_change_bonds, do you also change angle, dihedral and improper parameters ?
yes, those are all changed in the same way, i.e. parameter1 = parameter0 until step 3, then parameter0 = parameter1 after step 3.
Can you post a detailed breakdown of the intermediate free energies in vacuum for both direction?
Interestingly for vacuum the FEs do match between standard and multistep, and we observe reversibility as well: Step | ethane->methanol | methanol->ethane |
---|---|---|
step1 | -0.611885 | 0 |
step2 | -0.067687 | 0 |
step3 | 0.216402 | -0.204926 |
step4 | 3.233744 | -3.173355 |
step5 | 0 | 0.069519 |
step6 | 0 | 0.611799 |
Multistep | 2.770574 | -2.696963 |
Standard | 2.753582 | -2.791482 |
Curious how we lose both features when involving waters.
Opening this thread here for discussions with @JenkeScheen . The current focus is SOMD and initial tests will be done using simple python scripts but it would be useful to eventually support this from within BioSimSpace to facilitate integration with existing workflows and to roll out functionality to other free energy codes.
In some cases it may be desirable to implement a multi-staged lambda schedule where only some terms of the potential energy function are coupled to a lambda parameter at a given time. For instance, a general scheme that could apply to a morph that contains both atoms that need to become dummy atoms ('to dummy'), or start as dummy atoms ('from dummy'), or atoms that remain present throughout ('hard atoms') could be implemented using 6 steps described below.
Each step should have its own pert file. The same morph topology should be applicable throughout. This leads to six simulations each with their own lambda schedule to be determined. Each simulation should be analysed separately with analyse_freenrg mbar. The sum of the six resulting free energy changes should be identical (to within uncertainties) to that obtained from the direct path sketched at the top of the figure (in practice this would only be true for well behaved systems).
Some of the steps could be lumped together for efficiency, e.g. bonded terms + hard atoms charges/LJ if testing reveal these converged rapidly.