michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
76 stars 19 forks source link

Inconsistency between SOMD input files produced by BSS and by FESetup #74

Closed jmichel80 closed 4 years ago

jmichel80 commented 5 years ago

In this example the molecule 17 is mapped to molecule 8. In line with expected behaviour for SOMD hydrogen atom H21 in 17to8 perturbation should have an atomic mass of 12 g/mol in the parm file since it is perturbed to a carbon atom.

image

This is indeed the case in the FEP input produced by FESetup, whereas the one produced by BSS (via https://github.com/michellab/BioSimSpace/tree/devel/nodes/playground/prepareFEP.ipynb) has not updated the atomic masses (I thought we had fixed this previously?).

The atomic mass is element number 21 in the arrays below (first number on the fifth line)

Excerpt from fesetup/tyk_lig17~tyk_lig8/solvated/solvated.parm7
%FLAG MASS
%FORMAT(5E16.8)
  1.20100000E+01  1.20100000E+01  1.20100000E+01  1.20100000E+01  1.20100000E+01
  1.20100000E+01  3.54500000E+01  1.20100000E+01  1.60000000E+01  1.40100000E+01
  1.20100000E+01  1.20100000E+01  1.20100000E+01  1.40100000E+01  1.20100000E+01
  1.20100000E+01  1.40100000E+01  1.20100000E+01  1.60000000E+01  3.54500000E+01
  1.20100000E+01  1.00800000E+00  1.00800000E+00  1.00800000E+00  1.00800000E+00

Excerpt from bss/tyk_lig17~tyk_lig8/solvated/tyk_lig17~tyk_lig8_free.prm7
%FLAG MASS
%FORMAT(5E16.8)
   1.20100000E+1   1.20100000E+1   1.20100000E+1   1.20100000E+1   1.20100000E+1
   1.20100000E+1   3.54500000E+1   1.20100000E+1   1.60000000E+1   1.40100000E+1
   1.20100000E+1   1.20100000E+1   1.20100000E+1   1.40100000E+1   1.20100000E+1
   1.20100000E+1   1.40100000E+1   1.20100000E+1   1.60000000E+1   3.54500000E+1
   1.00800000E+0   1.00800000E+0   1.00800000E+0   1.00800000E+0   1.00800000E+0

I have attached a zip file with the FESEtup and BSS topologies, and a folder that contains the topologies of the two ligands before perturbation.

17-8.zip

lohedges commented 5 years ago

Hi Julien,

What I was told to do for SOMD was to write a pert file that describes the perturbation, then create parm7 and rst7 files corresponding to the lambda=0 state of the perturbation. For atoms that change type (rather than appearing or disappearing) the mass in the lambda=0 state is that of the atom type at lamba=0, i.e. hydrogen in this case. What you seem to be saying is that the mass should actually be that of what the atom perturbs to at lambda=1, i.e a carbon atom.

If this is the case, then the SOMD logic is incorrect for atoms that change state. (It would be nice the pert file and corresponding parm/rst were documented somewhere). Let me know what needs changing and I'll see if I can figure out a quick way to update the logic.

(Possibly this is why the D3R2018 results were junk!)

jmichel80 commented 5 years ago

Hi Lester, sorry there was miscommunication, we would have made your life easier by writing up specs for you. The 'rule' we came up with is that perturbed atoms have the mass of the heaviest end-state. So if H --> C use 12 g/mol throughout. If O --> N use 16 g/mol throughout etc... For dummy atoms we just use the mass of the non dummy end-state.

lohedges commented 5 years ago

Thanks for the clarification. The change that's required will be simple to implement. Just wanted to check a few things beforehand:

This is what my merging code has always done, so no problems here.

How is this interpreted by SOMD for the purposes of setting up the simulation with OpenMM? Wouldn't it need to know about the mass of both end states, since the nitrogen in your example above is actually a nitrogen at lambda=1. Does SOMD infer the opposite end state mass from the atom types in the pert file? If so, can't you just use the lambda=0 mass and work the lambda=1 state out anyway. (Sorry if I'm missing some logic here.) For GROMACS, the masses of both end states are given in the topology file so there is no ambiguity.

jmichel80 commented 5 years ago

Apart from Lennard-Jones and charge, which are zeroed, dummy atoms inherit the properties of the non-dummy end state (including the mass).

By default bonds and angles involving dummy atoms should use the parameters of the non-dummy end-state. So if you have C1-C2-H3 --> C1-H2-D3 then the bond between H2-D3 has the parameters of the bond between C2-H3 the angle C1-H2-D3 has the parameters of the angle between C1-C2-H3

For dihedral and impropers we implement two setup options. In the first case the dihedral and impropers terms involving dummy atoms are zeroed in the dummy end-state. In FESetup this was done by setting off all barrier heigh, phase and periodicity parameters to zero, However I believe this behavior was not optimum. It should be sufficient to set the barrier height to zero and keep periodicity and phase identical.

In the second case the parameters do not vary with the perturbation and are set to those of the non-dummy end-state. The first case should be the default option but it should be possible to override default behaviour for dihedral, for impropers, and for both during the parameters creation step should the user wishes to.

So in the perturbation C1-C2-C3-H4 --> C1-C2-H3-D4 the dihedral parameters of C1-C2-H3-D4 have a barrier heigh of 0 kcal/mol, and the same phase(s) and periodicity(ies) as for C1-C2-C3-H4

Meaning the pert file should contain an entry that looks like this

    dihedral
            atom0   C1
            atom1   C2
            atom2   C3
            atom3   H4
            initial_form 0.33 3.0 0.0 2.5 2.0 3.14159 
            final_form 0.0 3.0 0.0   0.0 2.0 3.14259
    enddihedral

assuming the C1-C2-C3-H4 dihedral was parameterise with two cosine functions.

For SOMD, perturbed atoms take the mass of the heaviest end-state.

SInce the parm7 file only contains one atomic mass, the atomic masses have to be updated by BSS.Process.Somd() when the SOMD input files are created by comparing the atomic masses of the elements in the merged molecules. This means there is loss of information upon writing the perturbed topology.

lohedges commented 5 years ago

Is your scheme for the dihedrals/impropers specific to SOMD, or would something like this be appropriate for GROMACS too? I'm just asking because I would need to implement this in BioSimSpace.Align.merge. (With, as you suggest, an optional parameter added to the function.)

For the mass: I get that there is only a single mass in the parm7 file, I was just wondering why it was chosen to be the heaviest atom and how the other end state mass is inferred and set by SOMD when configuring stuff for OpenMM.

I've already implemented code to fix the mass issue (not yet pushed). I'll have a think about how best to implement the dihedral/improper changes, as we need to be careful if this isn't something that would be default for all packages that support free energy perturbation.

jmichel80 commented 5 years ago

Is your scheme for the dihedrals/impropers specific to SOMD, or would something like this be appropriate for GROMACS too? I'm just asking because I would need to implement this in BioSimSpace.Align.merge. (With, as you suggest, an optional parameter added to the function.)

It should be appropriate, though needs testing. For the default behaviour I would implement the behaviour that was used in the reproducibility study here https://github.com/halx/relative-solvation-inputs/tree/master/SIMS/GROMACS/relative_unified_protocol

(Where at first glance it appears the barriers heights are constants for dihedrals involving dummy atoms).

For the mass: I get that there is only a single mass in the parm7 file, I was just wondering why it was chosen to be the heaviest atom and how the other end state mass is inferred and set by SOMD when configuring stuff for OpenMM.

In https://pubs.acs.org/doi/10.1021/acs.jctc.8b00544 we found initially that our relative free energy calculations with SOMD were systematically off from the other codes because we were constraining bonds to their equilibrium value and ignoring contribution of changes in bond parameters to hydration free energies. So we realised we needed to allow bonds to flex during a FEP if the bond parameters are changing. One issue is that with a timestep of 2 fs vibrations of bonds involving one hydrogen are too fast to be integrated reliably. GROMACS deals with that by using 1 fs for relative free energy calculations by default. In our case we wanted to retain a 2 fs time step for efficiency. So we came up with the idea of setting the mass of the perturbed hydrogen to that of the heavy atom it is changing to. Thus even when we simulate the bond with the equilibrium length and force constant suitable for a X-H bond the hydrogen is sufficient heavy that it doesn't vibrate too quickly between two time steps in a manner that would break the MD integrator.

For passing this information from Sire to OpenMM this happens here https://github.com/michellab/Sire/blob/devel/corelib/src/libs/SireMove/openmmfrenergyst.cpp around line 1695-1730.

                        else if (flag_constraint == HBONDS and flag_noperturbedconstraints)
                        {
                          const SireMol::Atom atom0 = molecule.select(two.atom0());
                          //double m0 = atom0.property("mass").value();
                          double m0 = system_openmm->getParticleMass(idx0);
                          const SireMol::Atom atom1 = molecule.select(two.atom1());
                          //double m1 = atom1.property("mass").value();
                          double m1 = system_openmm->getParticleMass(idx1);
                          double deltar = abs(rend-rstart);
                          double deltak = abs(bend-bstart);
                          // only constraint if m0 < 1.1 g.mol-1 or m1 < 1.1 g.mol-1
                          // AND the initial and final parameters differ
                          if (Debug)
                          {
                              qDebug() << " m0 " << m0 << " m1 " << m1 << "\n";
                              qDebug() << " deltar " << deltar << " " << " deltak " << deltak;
                          }
                          /* bonds that do not change parameters are constrained*/
                          double pert_eq_distance = solute_bond_perturbation_params[3] * Alchemical_value + (1.0 - Alchemical_value) * solute_bond_perturbation_params[2];
                          if (deltar < SMALL and deltak < SMALL)
                          {
                              system_openmm->addConstraint(idx0, idx1, pert_eq_distance);
                              if (Debug)
                              {
                                  qDebug() << "perturbed bond but no parameter changes so constrained " << atom0.name().toString()
                                           << "-" << atom1.name().toString() << "\n";
                              }
                          }
                          /* bonds that change parameters and have one of the atoms with a mass < HMASS are constrained*/
                          else if (m0 < HMASS or m1 < HMASS)
                          {
                              system_openmm->addConstraint(idx0, idx1, pert_eq_distance);
                              if (Debug)
                              {
                                  qDebug() << "perturbed bond parameter changes but involving " 
                                           << " light mass so constrained " << atom0.name().toString()
                                           << "- " << atom1.name().toString() << "\n";
                              } 
(..)

We get the atomic masses from the openmm system we constructed previously line 1083-1101

    for (int i = 0; i < nmols; ++i)
    {

        const int nats_mol = ws.nAtoms(i);

        const double *m = ws.massArray(i);

        MolNum molnum = moleculegroup.molNumAt(i);

        const ViewsOfMol &molview = moleculegroup[molnum].data();

        const Molecule &mol = molview.molecule();

        Selector<Atom> molatoms = mol.atoms();

        for (int j = 0; j < nats_mol; ++j)
        {
            /*JM 10/16 make sure that perturbed atoms have mass of heaviest end-state */
            system_openmm->addParticle(m[j]);

Where the masses come from a Sire atomic velocity workspace ws

lohedges commented 5 years ago

Thanks for the clarification. I'm still a little confused as to weather the heaviest mass fix applies only to hydrogens, or all atoms. Above you mention the issue with integrating hydrogen bonds, but early have an example of oxygen transforming to nitrogen.

Does the mass fix have any knock-on effect, i.e. at one end state you won't be simulating the true ligand since some of the atom masses will be different. Perhaps this is only makes a subtle difference to the calculation. (Affecting the entropy rather than energy?)

jmichel80 commented 5 years ago

The 2fs timestep integration errors only occur with bonds involving hydrogens as far as I am aware. This is because these bonds are a relatively short equilibrium distance and stiff spring constants.

The heaviest mass fix during the input files preparation stage is applied to all atoms being perturbed. This makes the mass of the A->B molecule identical to the mass of the B-->A molecule.

In theory changing atomic masses does not change excess free energies. The kinetic contribution to the partition function cancels out in a closed cycle as long as the masses of the perturbed molecules are consistent. Also SOMD (and other simulation engines as far as I am aware) compute changes in potential energy function along the lambda parameter to estimate free energy changes. That being said there could be subtle issues with the way discretisation of the equation of motions is done by various integrators (hence why I felt we should at least make the mass of the A->B molecule consistent with the B->A molecule).

In practice we haven't observed a detectable effect on relative hydration free energies in test systems (e.g. ethane->methanol). This is also why a number of codes implement 4fs integrators using atomic mass repartitioning schemes.


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Mon, Apr 1, 2019 at 9:28 AM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

Thanks for the clarification. I'm still a little confused as to weather the heaviest mass fix applies only to hydrogens, or all atoms. Above you mention the issue with integrating hydrogen bonds, but early have an example of oxygen transforming to nitrogen.

Does the mass fix have any knock-on effect, i.e. at one end state you won't be simulating the true ligand since some of the atom masses will be different. Perhaps this is only makes a subtle difference to the calculation. (Affecting the entropy rather than energy?)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/74#issuecomment-478485035, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5NdMkPD62gGnH9b97VUX77P7Oso-ks5vccMNgaJpZM4cQQJ1.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 5 years ago

Thanks. I'll try to take a look at implementing the dihedral/improper edits this afternoon.

Thanks also for mentioning the integration time-step fix for GROMACS. At present the free energy protocol has a default time step of 2fs. I guess I could raise a warning if the MD engine chosen in GROMACS. (Perhaps automatically converting it too.)

lohedges commented 5 years ago

I've now implemented the mass and dihedral/improper fixes for SOMD. You can specify whether to zero the barrier height for dihedrals/impropers involving dummy atoms using the zero_dummy_dihedrals and zero_dummy_impropers keyword arguments. If you happen to have a useful ligand pair to use as a test case for this, that would be great.

I'm just working out how to best implement this in GROMACS too. Should be easy enough to pass some extra information using the property map that's passed to the GroTop parser.

jmichel80 commented 5 years ago

Hi @lohedges

I have create SOMD input files with the keyword options zero_dummy_dihedrals=True, zero_dummy_impropers=True to BSS.Protocol.FreeEnergy in this version of prepareFEP.ipynb

on the following inputs (using 31to32-custom-v2.map for the mapping).

31to32pert.zip

# Log the mapping used
writeLog(lig1, lig2, mapping)
BSS.IO.saveMolecules("merged_at_lam0.pdb", merged, "PDB", { "coordinates" : "coordinates0" , "element": "element0" })
# Generate package specific input
protocol = BSS.Protocol.FreeEnergy(runtime = 2*BSS.Units.Time.femtosecond, num_lam=3, zero_dummy_dihedrals=True, zero_dummy_impropers=True)
process = BSS.Process.Somd(system1, protocol)
process.getOutput()
cmd = "unzip -o somd.zip"
os.system(cmd)

This generates a suspicious pert file

    improper
        atom0          O15
        atom1          C16
        atom2          C21
        atom3          C17
        initial_form   1.1000 2.0 3.141594
        final_form     0.0 0.0 0.0
    endimproper
    improper
        atom0          C19
        atom1          C21
        atom2          C20
        atom3          H36
        initial_form   1.1000 2.0 3.141594
        final_form     0.0 0.0 0.0
    endimproper
    improper
        atom0          C21
        atom1          C19
        atom2          C20
        atom3          H36
        initial_form   0.0 0.0 0.0
        final_form     1.1000 2.0 3.141594
    endimproper
    improper
        atom0          O15
        atom1          C16
        atom2          C17
        atom3          C21
        initial_form   0.0 0.0 0.0
        final_form     1.1000 2.0 3.141594
    endimproper

Basically the improper around O15 C16 C21 C17 is equivalent to O15 C16 C17 C21 (and same for C19 C21 C20 H36 and C21 C19 C20 H36). So the perturbations turns off an improper whilst turning on the same improper. The code should have detected the impropers are equivalent. Also none of the atoms involved in the definition of these impropers are changing atom type during the perturbation, so they shouldn't appear in the pert file.

Also there are no dihedrals changing during the perturbation which cannot be correct. Dihedrals involving atoms H37 and H38N should go from non-zero value in the initial state to a zero value in the final state. Dihedrals involving atoms C37, H40, C36, H39, C35, H38D, N34Z should go from a null value to a defined value according to the atom types these atoms adopt in the final form.

You also want to pay attention to angles and torsions that involve atoms that are dummy in the initial state and other atoms that are dummy in the final state.

For instance H37 - C19 - N34Z should not have an angle defined or the angle should have null parameters (same for H38N-C18-C37)

image

lohedges commented 5 years ago

Thanks, I'll take a look at this tomorrow.

Just a quick note regarding the impropers. Matches are checked for using Sire.Mol.ImproperID. What you are saying should be equivalent impropers are not, i.e.

from SIre.Mol import ImproperID, AtomIdx

id0 = ImproperID(AtomIdx(0), AtomIdx(1), AtomIdx(2), AtomIdx(3))
# Swap the indices of the last two entries.
id1 = ImproperID(AtomIdx(0), AtomIdx(1), AtomIdx(3), AtomIdx(2))

id0 == id1
False

The order of atoms in an improper matters, right, or am I missing something? For bonds, angles, and dihedrals I use, unsurprisingly, BondID, AngleID, and DihedralID, respectively. These check for mirror IDs, since the atom order can be inverted without affecting the potential.

If I just need to check that the same atom indices appear in bonds, angles, dihedrals, and impropers, then I completely need to re-write the logic for the pert file writer and the Gromacs FEP parsing.

lohedges commented 5 years ago

Also, it's very hard to easily support the zero_dummy_dihedrals and zero_improper_dihedrals option for Gromacs, since the dihedrals and impropers get merged when writing. Perhaps this isn't a general option that we want to support for an FEP protocol, rather something specific for SOMD.

I'll think more about this to see if I can figure out how it could be implemented.

jmichel80 commented 5 years ago

Unfortunately impropers are more complicated than dihedrals because they can be defined in more than two ways. Did you check how this is done in FESetup ? I pointed this out previously, see around line 965 here

I suggest you check the comments in def make_pert_file() at the above mentioned link, there are a number of edge cases discussed throughout.

jmichel80 commented 5 years ago

Unfortunately impropers are more complicated than dihedrals because they can be defined in more than two ways. Did you check how this is done in FESetup ? I pointed this out previously, see around line 965 here

I suggest you check the comments in def make_pert_file() at the above mentioned link, there are a number of edge cases discussed throughout.

jmichel80 commented 5 years ago

Also, it's very hard to easily support the zero_dummy_dihedrals and zero_improper_dihedrals option for Gromacs,

What is the default behaviour then for Gromacs do not vary the dihedral or improper potential if one end-state has a dummy atom?

skfegan commented 5 years ago

Unfortunately impropers are more complicated than dihedrals because they can be defined in more than two ways. Did you check how this is done in FESetup ? I pointed this out previously, see around line 965 here

FYI: The official FESetup repository is https://github.com/CCPBioSim/fesetup. The master branch is the released version, and the dev branch has the latest commits. The pertfile.py you linked to can be found here.

lohedges commented 5 years ago

What is the default behaviour then for Gromacs do not vary the dihedral or improper potential if one end-state has a dummy atom?

Yes, that's right.

Unfortunately impropers are more complicated than dihedrals because they can be defined in more than two ways. Did you check how this is done in FESetup ? I pointed this out previously, see around line 965 here

Yes, I took a look at the time, but clearly not carefully enough. Apologies for my oversight. This is also related to the issue that I posted here, where you also refer to FESEtup too. I'll revisit this today since it seems like I'll need to implement some of this logic in BioSimSpace too.

As mentioned in the Sire issue above, a concern is that impropers are defined differently in the various topology file formats and we make no attempt to set the atom ordering correctly in the ImproperID when reading them. (To be fair, the ordering isn't really documented properly anywhere.) Also, information is lost when converting between file formats so that we may not even have any impropers when we get the the merge and pert file writing stage.

As an example, consider the following, which assumes that you are in the demo directory:

import BioSimSpace as BSS

# Read an AMBER format system.
s = BSS.IO.readMolecules(BSS.IO.glob("amber/ala*/*"))

# Does the first molecule have impropers?
s.getMolecules()[0]._sire_molecule.hasProperty("improper")
True

# Now convert to GROMACS format.
BSS.IO.saveMolecules("test", s, ["gro87", "grotop"])

# Read the GROMACS files.
s = BSS.IO.readMolecules(["test.grotop", "test.gro87"])

# Does the first molecule have impropers?
s.getMolecules()[0]._sire_molecule.hasProperty("improper")
False

We've lost the improper information since they've been converted to dihedrals. (Perhaps this is okay in this case since these impropers happen to be functionally equivalent to dihedrals, so would be would be handled correctly when generating the pert file anyway. Something to be aware of regardless.)

lohedges commented 5 years ago

I'm making progress on this and the pert file that I'm generating seems to resemble the output that you describe to be expected for the dihedral and improper terms.

You also mention angles...

You also want to pay attention to angles and torsions that involve atoms that are dummy in the initial state and other atoms that are dummy in the final state. For instance H37 - C19 - N34Z should not have an angle defined or the angle should have null parameters (same for H38N-C18-C37)

I don't see any angle terms in the generated pert file. (Nor did I before the updates to the code.) Am I missing something here?

Looking at the FESetup code, it also appears that you modify the element property for dummy atoms as well as the mass, i.e. it takes the end state with the largest number of protons.

lohedges commented 5 years ago

Scratch the above, after making some more edits I now see some angle terms. Thankfully the ones that you said should be excluded are not present.

jmichel80 commented 5 years ago

Looking at the FESetup code, it also appears that you modify the element property for dummy atoms as well as the mass, i.e. it takes the end state with the largest number of protons.

Seems reasonable. I don't think this has any effect on somd so I may not have noticed this modification previously.

jmichel80 commented 5 years ago
import BioSimSpace as BSS

# Read an AMBER format system.
s = BSS.IO.readMolecules(BSS.IO.glob("amber/ala*/*"))

# Does the first molecule have impropers?
s.getMolecules()[0]._sire_molecule.hasProperty("improper")
True

# Now convert to GROMACS format.
BSS.IO.saveMolecules("test", s, ["gro87", "grotop"])

# Read the GROMACS files.
s = BSS.IO.readMolecules(["test.grotop", "test.gro87"])

# Does the first molecule have impropers?
s.getMolecules()[0]._sire_molecule.hasProperty("improper")
False

Could a solution be to modify the grotop parser ? Topologically a dihedral is a subset of impropers where the 4 atoms used to define the dihedral share three consecutive bonds. If you have information about the topology of the molecule at hand you can decide whether to create a Dihedral or Improper after reading the parameters.

lohedges commented 5 years ago

I'm currently documenting the FESetup protocol. This includes differences between the SOMD and GROMACS perturbation topology, as well as the current implementation in BioSimSpace. This isn't exhaustive and I'll update as I understand things better. I just wanted to record everything in one place and open up some points for discussion. Feel free to add comments or edit.


Topology files:

SOMD

This contains the lambda = 0 state, where the mass and element records have been set to the end state with the largest value. Dummy atoms take the mass/element values of the non-dummy end state.

GROMACS

The topology file contains a complete description of the perturbation, i.e. all records are present for both the lambda = 0 and lambda = 1 states. If a record is missing for lambda = 1, then it is assumed that the state is unchanged and the lambda = 0 value is used.


Pert file:

SOMD requires and additional pert file that describes the perturbation, i.e. what is changing. As @jmichel80 mentions above, terms should only appear if they involve an atom that changes type.

Also none of the atoms involved in the definition of these impropers are changing atom type during the perturbation, so they shouldn't appear in the pert file.

This is probably okay for FESetup since it does the complete setup for you, i.e. it is guaranteed that both ligands are parameterised with the same force field. Although it's likely that users will do a complete FEP setup using BioSImSpace, we can't assume that this is the case. This means that it's possible that the two ligands could be parameterised with a different force field. If so, then any potential could change between the end states without the atoms changing type.

Is this something that we would want to support? At the moment we check that the two force fields are "compatible", which essentially means that they are AMBER-style, rather than being identical. We could detect cases where, e.g., a bond involving the same atom types has different terms in the two states , then raise an exception. Alternatively we could default to the properties from the initial state.

With regards to flagging the atoms that are perturbed: In the atom record section of the pert fille, the ambertype property is used to identify the initial and final atom states. Is this what should be used to determine whether an atom is perturbed, i.e. the ambertype changes? Or should one check that the element is different in each end state?

Below are some details regarding how different terms in the potential are handled in the pert file.

Bonds

Logic from FESetup:

        # If either atom in the initial/final states is a dummy, then the
        # parameters are kept constant throughout the perturbation
        #
        if idummy and fdummy:
            raise errors.SetupError('BUG: Both the initial and final states '
                                    'involve dummy atoms. (bond)')
        elif idummy:
            ipot = fpot
        elif fdummy:
            fpot = ipot

        if (idummy or fdummy) or (not samepotential):
            i_force = ipot[0]
            i_eq = ipot[1]
            f_force = fpot[0]
            f_eq = fpot[1]

            if shrinkdummybonds and idummy:
                i_eq = 0.2 # Angstrom
            if shrinkdummybonds and fdummy:
                f_eq = 0.2 # Angstrom

Bond terms are only written to the pert file if the potential changes, or if a dummy atom is present in the initial or final state. If a dummy is present, then the parameters are kept constant throughout the perturbation, i.e. they are set to the value in the non-dummy end state.

There is an additional option called shrinkdummybonds, which can be used to reduce the bond length for bonds involving dummy atoms. This is set to False by default.

Angles

Logic from FESetup:

       if idummy and fdummy:
            # This happens when we create 1-3 interactions involving two groups
            # of dummy atoms with some dummy in the initial state and some dummy
            # in the final state.  The best option is to use null parameters so
            # as not to artificially restraint the morph
            ipot = (0, 0)
            fpot = ipot
        elif idummy:
            ipot = fpot
        elif fdummy:
            fpot = ipot

        if (idummy or fdummy) or (not samepotential):
            i_force = ipot[0]
            i_eq = ipot[1]
            f_force = fpot[0]
            f_eq = fpot[1]

            if turnoffdummyangles and idummy:
                if (not at0.name().value().startsWith('DU') or
                    not at2.name().value().startsWith('DU') ):
                    i_force = 0.0
            if turnoffdummyangles and fdummy:
                if (not at0.name().value().startsWith('DU') or
                    not at2.name().value().startsWith('DU') ):
                    i_force = 0.0

Once again, terms are only written if there is a change in potential, or when dummy atoms are present. If dummy atoms are present, the non-dummy end state is used, apart from when dummy atoms are present in both end states, where null parameters are used throughout.

There is an additional option, turnoffdummyangles, which can be used to zero the force term when dummy atoms are present. This is also set to False by default.

Dihedrals

Logic from FESetup:

        # Unlike bonds/angles, dihedrals with dummies have no energy
        #
        if idummy and fdummy:
            # JM 03/13 This can happen in some morphs. Then use a null potential
            # throughout.
            ipot = [0.0, 0.0, 0.0]
            fpot = [0.0, 0.0, 0.0]
        elif idummy:
            if allidummy and not zero_dih_dummies:
                ipot = fpot
            else:
                ipot = [0.0, 0.0, 0.0]
        elif fdummy:
            if allfdummy and not zero_dih_dummies:
                fpot = ipot
            else:
                fpot = [0.0, 0.0, 0.0]

        # leap creates for some unkown reason zero torsions
        if len(ipot) == 3 and len(fpot) == 3 and \
               ipot[0] == 0.0 and fpot[0] == 0.0:
            continue

        if (idummy or fdummy) or (not samepotential):
        ....

Several cases are handled here:

Finally, there are cases where there are dihedrals present in the final topology that aren't present in the initial state. Here the initial state is set to a null dihedral. Nothing is done with regards to the presence of dummy atoms, but maybe dummy atoms can never be involved these dihedrals?

for fdihedral in unmapped_fdihedrals:
        fat0 = lig_final.select( fdihedral.atom0() ).index()
        fat1 = lig_final.select( fdihedral.atom1() ).index()
        fat2 = lig_final.select( fdihedral.atom2() ).index()
        fat3 = lig_final.select( fdihedral.atom3() ).index()

        fpot = params_final.getParams(fdihedral)
        ipot = [0.0, 0.0, 0.0]

For all potentials, BioSimSpace has logic to work out which are unique to lambda = 0, unique to lambda = 1, and which are shared. I was implementing the above, i.e. zeroing the unmatched end state.

Impropers

As discussed above, because of the inconsistent ordering of dihedral atoms, FESetup performs an exhaustive search when looking for matching impropers in the initial and final state, i.e. if there is a common set of atom indices in the two impropers then it is the same.

Logic from FESetup:

        if idummy and fdummy:
            ipot = [ 0.0, 0.0, 0.0 ]
            fpot = [ 0.0, 0.0, 0.0 ]
        elif idummy:
            if allidummy and not zero_dih_dummies:
                ipot = fpot
            else:
                ipot = [0.0, 0.0, 0.0]
        elif fdummy:
            if allfdummy and not zero_dih_dummies:
                fpot = ipot
            else:
                fpot = [0.0, 0.0, 0.0]

Same as for the dihedral case. Note that there isn't a separate option to zero the improper if dummies are present, i.e. zero_dih_dummies is used. This means you can't independently zero the dihedral and improper terms.

There is also logic to deal with unmatched impropers in the final state. (Same as dihedral case). There is a bit of additional code to handle the reverse situation:

        if not fpot:
            if not map_at0 or not map_at1 or not map_at2 or not map_at3:
                fpot = 'todefine'
                fdummy = True
            else:
                # JM 02/13 Found a case where one final improper does not exist,
                # but it does in the initial state...
                # R-NH2 --> R-NO2 this is at least for the gaff force field. It
                # seems then that a valid option is to guess a null improper in
                # those cases?

                fpot = [0.0, 0.0, 0.0]

As mentioned previously, BIoSimSpace automatically deals with this for all potentials.


Differences between SOMD and GROMACS

I've had a quick look through the GROMACS code in FESetup to try to see how the perturbed topology is written, and whether it is consistent with the SOMD rules described above. This is a little tricky, since the code is spread over multiple files. However, from a scan of the various keywords mentioned above I don't find any reference of shrinkdummybonds, turnoffdummyangles, or zero_dih_dummies.

Are these things SOMD specific? If so, then we can't really support them at the level of the free energy protocol. Perhaps the differences between MD engines mean that it's not possible to generate exactly the same perturbation, so we should strive to create the best practice protocol for whatever engine is chosen. If you think that there's no reason why these options couldn't be supported by GROMACS, then I could try to update the GroTop parser with additional logic.

At present, the GROMACS parser has no additional logic for handling terms involving dummy atoms. If this is wrong, then I'll need to update this. I'm not sure what the various MD engines do internally when setting up FEP simulations, i.e. do they detect the presence of dummy atoms and set things accordingly?

jmichel80 commented 5 years ago

Hi @lohedges sounds like the best solution is to implement package specific 'best practice' protocols and worry later about whether some operations could be handled at the level of free energy protocol. That is we modify the way GROMACS setup perturbations only if we realise later this is needed. @ppxasjsm should be able to advise on this once she has completed her tests.

jmichel80 commented 5 years ago

Also, to help you validate the SOMD pertfile implementation I suggest you carry out the following tests:

A) Using the input files here check you can generate similar looking pert files to those [here] (https://github.com/halx/relative-solvation-inputs/tree/master/SIMS/SOMD/relative) and here

Note that GROMACS has different relative protocol setups which involve splitting a perturbation in sequential steps. GROMACS can also generate a perturbation in one step and which one should be used is still a matter of debate. See the discussion here

B) For each perturbation carry out a vacuum free energy calculation using ca. 20 windows for 1 ns each (should be be enough sampling).

C) Test consistency of single point energies between A-->B in vacuum at lambda 0.0, lambda 1.0 and B-->A at lambda 0.0 and lambda 1.0.

This is a bit more time consuming because the input files must have the same coordinates, though it may be that energy minimisation before single point energy calculation is sufficient and there is no need to manually tweak input coordinates to run this test.

lohedges commented 5 years ago

Thanks. I agree about having package specific protocols. I think ultimately we will have something like Protocol.BindingFreeEnergy, with derived protocols for the different engines, e.g. Protocol.SomdBindingFreeEnergy. (The user could choose to use SOMD directly directly by passing the derived protocol.) Similar could be done for the MD engines too. When I get time, I'll have a think about refactoring the protocol stuff.

As suggeted, I'll try reproducing some of the data you linked to. Before I make any more edits, could you confirm whether zero_dih_dummies actually should be the default? (Perhaps this is a config file option that you always use.) Also, should I detect perturbations by changes in ambertype, or element?

Cheers.

jmichel80 commented 5 years ago

Before I make any more edits, could you confirm whether zero_dih_dummies actually should be the default?

Good question...We've done this by default previously, but I wonder if not zeroing dihedral dummies may be advantageous. I'm happy to have this option on by default, but I will likely generate perturbations with zero_dih_dummies set to False in the near future and compare results.

Also, should I detect perturbations by changes in ambertype, or element?

I think the logic used previously relies only on ambertype to detect changes (and partial charges are always assumed to change). This makes sense given that force field parameters (apart from charges) are defined per atom types. Also you can have perturbations that do not change the element (e.g. hc-->h1; ca->c3 ; ...).

I can see that relying on ambertypes could create issue as we gradually increase support other codes and non-AMBER force fields, but I don't have a clear way of dealing with that yet.

A more general solution would be to compare forcefield parameters to decide if a perturbation occurs.

It would be useful if you read this recent paper and think how the current implementation approach could cope with this type of forcefield.

lohedges commented 5 years ago

I'll leave the dihedral/improper zeroing options as arguments to pert file generating function, but will remove them from the protocol object (until I create package specific protocols). That way you can modify the pert file when doing your tests, but we don't advertise a feature that isn't supported elsewhere.

Do you also want to include the other options too, i.e. shrinkdummybonds and turnoffdummyangles, or are those options deprecated?

jmichel80 commented 5 years ago

Do you also want to include the other options too, i.e. shrinkdummybonds and turnoffdummyangles, or are those options deprecated?

I would leave them in set to False by default. These options may be needed for some types of FEPs that we don't run at the moment but could in the future.


Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Fri, Apr 5, 2019 at 2:15 PM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

I'll leave the dihedral/improper zeroing options as arguments to pert file generating function, but will remove them from the protocol object (until I create package specific protocols). That way you can modify the pert file when doing your tests, but we don't advertise a feature that isn't supported elsewhere.

Do you also want to include the other options too, i.e. shrinkdummybonds and turnoffdummyangles, or are those options deprecated?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/74#issuecomment-480271063, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5Nsy1CWYnGBmiLtvow0AjXwNUx8Mks5vd0x_gaJpZM4cQQJ1.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 5 years ago

Here is this logic for the turnoffdummyangles option:

            if turnoffdummyangles and idummy:
                if (not at0.name().value().startsWith('DU') or
                    not at2.name().value().startsWith('DU') ):
                    i_force = 1.0
            if turnoffdummyangles and fdummy:
                if (not at0.name().value().startsWith('DU') or
                    not at2.name().value().startsWith('DU') ):
                    i_force = 0.0

The first conditional check sets the force constant of the initial angle to one, the second sets the force constant of the same angle to zero. Shouldn't the first case set the initial force to zero and the second set final force to zero?

Also, the check looks like it is seeing whether at least one of the terminal atoms in each angle isn't a dummy. However, for both the initial and final state check it is comparing the same atom names. Shouldn't it be checking whether the atoms are dummies in the initial and final states respectively?

Perhaps I'm not understanding quite what this option is doing? However, if it is indeed a bug, I'll post an issue on the FESetup GitHub page.

jmichel80 commented 5 years ago

Yes that code makes little sense....Clearly shows this option hasn't been tested. Sorry for back tracking on my earlier comments but if the reference implementation is not trustworthy I don't wish to suggest you spend time implementing features that are not currently used and won't be tested soon. It sounds like we should just skip turnoffdummyangles and perhaps also shrinkdummybonds for the time being.

On Fri, Apr 5, 2019 at 3:27 PM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

Here is this logic for the turnoffdummyangles option:

        if turnoffdummyangles and idummy:
            if (not at0.name<http://at0.name>().value().startsWith('DU') or
                not at2.name<http://at2.name>().value().startsWith('DU') ):
                i_force = 1.0
        if turnoffdummyangles and fdummy:
            if (not at0.name<http://at0.name>().value().startsWith('DU') or
                not at2.name<http://at2.name>().value().startsWith('DU') ):
                i_force = 0.0

The first conditional check sets the force constant of the initial angle to one, the second sets the force constant of the same angle to zero. Shouldn't the first case set the initial force to zero and the second set final force to zero?

Also, the check looks like it is seeing whether at least one of the terminal atoms in each angle isn't a dummy. However, for both the initial and final state check it is comparing the same atom names. Shouldn't it be checking whether the atoms are dummies in the initial and final states respectively?

Perhaps I'm not understanding quite what this option is doing? However, if it is indeed a bug, I'll post an issue on the FESetup GitHub page.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/74#issuecomment-480295721, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5MbpLTverDli6gRnQTPfCRDmzd3eks5vd107gaJpZM4cQQJ1.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

jmichel80 commented 5 years ago

Having thought about it again it is possible that the logic implemented in FESetup to create a pert file relies on the parm7 file of the perturbed molecule having been created by leap and parmcheck previously to add parameters for bonded terms involving dummy atoms. The Sire parm writer may use another logic to write a merged molecule.

So it may be that the FESetup implementation must be modified in fact.

To recap, the behaviour we aim for SOMD is:

lohedges commented 5 years ago

Thanks for recap, what you've written agrees with my interpretation of the FESetup logic, and is what I implemented in BioSimSpace on Friday. I'll check the improper section to see if I can figure out what's wrong.

With regards to the dummy atom parameters: The Sire AmberPrm parsers knows nothing about the merged molecule, i.e. everything is handled by BioSimSpace. During the creation of the pert file, BioSimSpace modifies the atom masses and elements of the dummy atoms in the underlying Sire molecule. During a SOMD free energy setup up, BioSimSpace then writes the pert file to disk along with the lambda = 0 state of the system (which includes the modified merged molecule). If there are dummy atoms in the initial states, then the prm7 file will contain records for whatever potential terms these atoms are involved in.

jmichel80 commented 5 years ago

Thanks for recap, what you've written agrees with my interpretation of the FESetup logic, and is what I implemented in BioSimSpace on Friday. I'll check the improper section to see if I can figure out what's wrong.

With regards to the dummy atom parameters: The Sire AmberPrm parsers knows nothing about the merged molecule, i.e. everything is handled by BioSimSpace. During the creation of the pert file, BioSimSpace modifies the atom masses and elements of the dummy atoms in the underlying Sire molecule. During a SOMD free energy setup up, BioSimSpace then writes the pert file to disk along with the lambda = 0 state of the system (which includes the modified merged molecule). If there are dummy atoms in the initial states, then the prm7 file will contain records for whatever potential terms these atoms are involved in.

Can you point me to the relevant bits of the code in the library so I can take a look ? The possible difference with FESetup is that the terms involving dummy atoms have null parameters when the prm7 file is constructed. I want to check what BSS does in this case since it presumably doesn't rely on a call to parmchk to work out parameters for degrees of freedom involving dummy atoms.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 5 years ago

The pert file is written by the function starting on line 673 here. At the very end of this function all of the properties are set the those in the lambda = 0 state, apart from the fixes to the mass and element.

To see how the above is used when setting up a SOMD FEP simulation you can look at the section starting at line 171 here. This bit has some logic to ensure that the merged molecule is the first entry in the prm file, etc.

jmichel80 commented 5 years ago

Thanks. I've gone through this, a few small comments

** Angles

-Angles unique to lambda 0 or lambda 1.

—> Raise a warning if this happens. It may not be difficult to turn off the harmonic turn. I would like to see examples of mappings where an angle is unique to lambda 0 or 1 but doesn’t involve dummy atoms. Does this actually happen?

Rather than perturb to/from a null angle set initial/final_force to 0 keep inital/final_equil to the theta value of the other state.Should be easier to converge free energy changes.

Scanning through _merge() it looks like the code just ignores angles that involve dummy atoms in initial and final states so these won’t have an energy (whereas FESetup was assigning a null parameter, both solutions are ok).

*** Dihedrals that are unique to lambda 0/1

Rather than perturb to/from a null angle set k in initla/final form to and keep periodicity and phase values equal to those of the other state.Should be easier to converge free energy changes.

For debugging purposes it would be useful to return a list of shared and unique bonds, angles, dihedrals, impropers

On Mon, Apr 8, 2019 at 9:21 AM Lester Hedges notifications@github.com<mailto:notifications@github.com> wrote:

The pert file is written by the function starting on line 673 herehttps://github.com/michellab/BioSimSpace/blob/devel/python/BioSimSpace/_SireWrappers/_molecule.py#L673. At the very endhttps://github.com/michellab/BioSimSpace/blob/devel/python/BioSimSpace/_SireWrappers/_molecule.py#L1522 of this function all of the properties are set the those in the lambda = 0 state, apart from the fixes to the mass and element.

To see how the above is used when setting up a SOMD FEP simulation you can look at the section starting at line 171 herehttps://github.com/michellab/BioSimSpace/blob/devel/python/BioSimSpace/Process/_somd.py#L171. This bit has some logic to ensure that the merged molecule is the first entry in the prm file, etc.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/74#issuecomment-480733630, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALLd5C3R_TaOFT2q8Cd2WURPW6_M7A18ks5vevwkgaJpZM4cQQJ1.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 5 years ago

Thanks.

Raise a warning if this happens. It may not be difficult to turn off the harmonic turn. I would like to see examples of mappings where an angle is unique to lambda 0 or 1 but doesn’t involve dummy atoms. Does this actually happen?

I'm not sure if it can occur, but thought it was safest to implement code to catch all possible cases for each term in the potential, i.e. unique to lambda = 0, unique to lambda = 1, and shared. (Apart from bonds, which can't involve two dummy atoms.) If there are cases that are impossible (or would be invalid) then I'm happy to report warnings or throw exceptions. The merge code has some fairly robust internal self-consistency checking so most of these things would be caught there anyway, i.e. before the pert file is every written.

Rather than perturb to/from a null angle set k in initla/final form to and keep periodicity and phase values equal to those of the other state.Should be easier to converge free energy changes.

Sure thing, I'll add this in. You also mention the same thing in your comment regarding dihedrals. Was this a copy paste error, or should I do the same for dihedrals/impropers. (FESetup uses null potentials for unique dihedrals/impropers.)

lohedges commented 5 years ago

Scanning through _merge() it looks like the code just ignores angles that involve dummy atoms in initial and final states so these won’t have an energy (whereas FESetup was assigning a null parameter, both solutions are ok).

I'm not sure this is the case. Take a look at, e.g. the angles section here. The angles at lambda = 0 consist of all of the angles from molecule0, plus any angle from molecule1 that involve an atom that is unique to molecule1, i.e. not part of the mapping. The opposite is true at lambda = 1, i.e. all angles from molecule1 and those from molecule0 that contain atoms unique to molecule0.

jmichel80 commented 5 years ago

Trying to debug why the impropers are not created correctly in the pert file

The code here looks suspicious

            # lambda = 0.
            for idx0 in impropers0_idx.keys():
                is_shared = False
                for idx1 in impropers1_idx.keys():
                    if idx0.equivalent(idx1):
                        impropers_shared_idx[idx0] = (impropers0_idx[idx0], impropers1_idx[idx1])
                        is_shared = True
                if not is_shared:
                    impropers0_unique_idx[idx0] = impropers0_idx[idx0]

            # lambda = 1.
            for idx1 in impropers1_idx.keys():
                is_shared = False
                for idx0 in impropers0_idx.keys():
                    if idx1.equivalent(idx0):
                        impropers1_unique_idx[idx1] = impropers1_idx[idx1]
                        is_shared = True
                if not is_shared:
                    impropers_shared_idx[idx1] = (impropers0_idx[idx0], impropers1_idx[idx1])

Shouldn't the second block read

            # lambda = 1.
            for idx1 in impropers1_idx.keys():
                is_shared = False
                for idx0 in impropers0_idx.keys():
                    if idx1.equivalent(idx0):
                        impropers_shared_idx[idx1] = (impropers0_idx[idx0], impropers1_idx[idx1])
                        is_shared = True
                if not is_shared:
                    impropers1_unique_idx[idx1] = impropers1_idx[idx1]

Also why not break about of the inner for loop once/if a equivalent improper has been found ?

Implementing the above code changes reduces the number of perturbed impropers from 24 to 3 on the test molecules I am working on.

The 3 perturbed impropers left involve atoms that are dummy in the initial state

image

    improper
        atom0          C18
        atom1          C31
        atom2          H21
        atom3          C33
        initial_form   0.0000 2.0 3.141594
        final_form     3.6250 2.0 3.141594
    endimproper
    improper
        atom0          C30
        atom1          H21
        atom2          C31
        atom3          H37
        initial_form   0.0000 2.0 3.141594
        final_form     3.6250 2.0 3.141594
    endimproper
    improper
        atom0          H21
        atom1          C34
        atom2          C33
        atom3          H38
        initial_form   0.0000 2.0 3.141594
        final_form     3.6250 2.0 3.141594
    endimproper

The ideal behaviour would be to have the barrier height of the initial form set to 0 if zero_dummy_impropers is set to True, otherwise initial form should be equal to final form.

Also i note the keyword argument zero_dummy_impropers is not used anywhere in def _toPertFile(

Input and code to reproduce the above behaviour 17to8.zip

~/sire.app/bin/python ~/software/devel/BioSimSpace/nodes/playground/prepareFEP.py --input1 tyk_lig17/vacuum.parm7 tyk_lig17/vacuum.rst7 --input2 tyk_lig8/vacuum.parm7 tyk_lig8/vacuum.rst7 --output TEST^

jmichel80 commented 5 years ago

A few more question marks about the impropers code

(here)[https://github.com/michellab/BioSimSpace/blob/devel/python/BioSimSpace/_SireWrappers/_molecule.py#L1346-L1357]

amber_dihedral = _SireMM.AmberDihedral(dihedral.function(), _SireCAS.Symbol("phi")) --> amber_dihedral = _SireMM.AmberDihedral(improper.function(), _SireCAS.Symbol("phi"))

?

Same on L1391, L1429, L1430.

jmichel80 commented 5 years ago

Definitely bugs. The above code changes now give

    improper
        atom0          C30
        atom1          H21
        atom2          C31
        atom3          H37
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper
    improper
        atom0          H21
        atom1          C34
        atom2          C33
        atom3          H38
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper
    improper
        atom0          C18
        atom1          C31
        atom2          H21
        atom3          C33
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

which has the correct final form for these GAFF impropers. Now we just need an option to setting initial form= final form

lohedges commented 5 years ago

Thanks, these should now be fixed in devel. Remarkably, all of these bugs were introduced in my attempts to implement the FESetup logic on Friday! So much for making things better. Beforehand, everything was okay, other than the handling of terms involving dummy atoms.

jmichel80 commented 5 years ago

Hi @lohedges

I still think the implementation is not doing what we said it should. If I set zero_dummy_impropers = False (default behaviour of _toPertFile)

I get this type of output on the17t8 example for H21 C34 C33 H38

    improper
        atom0          H21
        atom1          C34
        atom2          C33
        atom3          H38
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

Whereas I was expecting this for H21 C34 C33 H38

    improper
        atom0          H21
        atom1          C34
        atom2          C33
        atom3          H38
        initial_form   1.1000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

If I set zero_dummy_impropers = True

I get what is the expected output for H21 C34 C33 H38

    improper
        atom0          H21
        atom1          C34
        atom2          C33
        atom3          H38
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

I also get 3 new variable impropers in addition to H21 C34 C33 H38, C30 H21 C31 H37, C18 C31 H21 C33

    improper
        atom0          C29
        atom1          C33
        atom2          C34
        atom3          H39
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

    improper
        atom0          C29
        atom1          C31
        atom2          C30
        atom3          H36
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

    improper
        atom0          C30
        atom1          C34
        atom2          C29
        atom3          H35
        initial_form   0.0000 2.0 3.141594
        final_form     1.1000 2.0 3.141594
    endimproper

To recap zero_dummy_impropers = False

with initial_form 1.1000 2.0 3.141594 final_form 1.1000 2.0 3.141594

UNLESS the above improper parameters of the merged molecule are saved in the molecule returned by _toPertFile. However based on the API I don’t think this is the case.

*zero_dummy_impropers = True Should give 6 dummy entries in the pert file H21 C34 C33 H38 C30 H21 C31 H37 C18 C31 H21 C33 C29 C33 C34 H39 C29 C31 C30 H36 C30 C34 C29 H35

with initial_form 0.0 2.0 3.141594 final_form 1.1000 2.0 3.141594

I will run some tests on the behaviour of zero_dummy_dihedrals later

lohedges commented 5 years ago

It's because I was implementing the exact FESetup logic that I was describing earlier in this thread, i.e. the zero_dummy_dihedrals and zero_dummy_impropers options only take effect if all atoms involved in the dihedral/improper are dummies. If 3 or less atoms are dummies, then a null potential is always used. See below for the dihedral/improper code from FESetup.

        if idummy and fdummy:
            ipot = [ 0.0, 0.0, 0.0 ]
            fpot = [ 0.0, 0.0, 0.0 ]
        elif idummy:
            if allidummy and not zero_dih_dummies:
                ipot = fpot
            else:
                ipot = [0.0, 0.0, 0.0]
        elif fdummy:
            if allfdummy and not zero_dih_dummies:
                fpot = ipot
            else:
                fpot = [0.0, 0.0, 0.0]

Is this incorrect? I can implement the logic that you want, i.e. zero_dummy_dihedrals and zero_dummy_impropers applying to any number of dummy atoms. I'll post an FESetup issue if you think this is the desired behaviour.

jmichel80 commented 5 years ago

But then this is the output produced by FESetup for this system which is consistent with the code you described but not the output produced by BSS . In that case the 3 'all dummy' impropers have initial_form = final_form and those involving a mix of dummy and non dummy atoms have a null potential for initial_form.

        improper
                atom0   DU31
                atom1   DU34
                atom2   DU30
                atom3   DU35
                initial_form 1.1 2.0 3.141594
                final_form 1.1 2.0 3.141594
        endimproper
        improper
                atom0   DU30
                atom1   DU32
                atom2   DU31
                atom3   DU36
                initial_form 1.1 2.0 3.141594
                final_form 1.1 2.0 3.141594
        endimproper
        improper
                atom0   DU30
                atom1   DU33
                atom2   DU34
                atom3   DU39
                initial_form 1.1 2.0 3.141594
                final_form 1.1 2.0 3.141594
        endimproper
        improper
                atom0   H21
                atom1   DU34
                atom2   DU33
                atom3   DU38
                initial_form 0.0 0.0 0.0
                final_form 1.1 2.0 3.141594
        endimproper
        improper
                atom0   C18
                atom1   DU32
                atom2   H21
                atom3   DU33
                initial_form 0.0 0.0 0.0
                final_form 1.1 2.0 3.141594
        endimproper
        improper
                atom0   DU31
                atom1   H21
                atom2   DU32
                atom3   DU37
                initial_form 0.0 0.0 0.0
                final_form 1.1 2.0 3.141594
        endimproper

So part of the confusion is that I have written guidelines on what the code should do, but it turns out it is not what FESetup is doing...In light of this.

zero_dummy_impropers = True should give H21 C34 C33 H38 C30 H21 C31 H37 C18 C31 H21 C33 with initial_form 0.0 2.0 3.141594 final_form 1.1000 2.0 3.141594

and C29 C33 C34 H39 C29 C31 C30 H36 C30 C34 C29 H35 with initial_form 0.0 2.0 3.141594 final_form 1.1000 2.0 3.141594

zero_dummy_impropers = False should give H21 C34 C33 H38 C30 H21 C31 H37 C18 C31 H21 C33 with initial_form 0.0 2.0 3.141594 final_form 1.1000 2.0 3.141594

and C29 C33 C34 H39 C29 C31 C30 H36 C30 C34 C29 H35 with initial_form 1.1000 2.0 3.141594 final_form 1.1000 2.0 3.141594

Possibly we don't get this behaviour in BSS at the moment because of the following code here

                    # Only write record if the improper parameters change.
                    if zero_k or amber_dihedral0 != amber_dihedral1:

And to confirm...since _toPertFile returns the topology of molecule0 is it safe to assume that any atoms that are dummy in the initial state will not have a potential defined in the list of of bonds, angles, dihedrals and impropers written to the prm7 file ? And that the correct value of the potentials to use for these internal degrees of freedom will only come from the information written in the pert file ?

lohedges commented 5 years ago

Possibly we don't get this behaviour in BSS at the moment because of the following code here.

Yes, that could be why. I'll have a check.

And to confirm...since _toPertFile returns the topology of molecule0 is it safe to assume that any atoms that are dummy in the initial state will not have a potential defined in the list of of bonds, angles, dihedrals and impropers written to the prm7 file ? And that the correct value of the potentials to use for these internal degrees of freedom will only come from the information written in the pert file ?

No, toPertFile returns the topology of the merged molecule at lambda = 0, with the atom masses and elements of dummy atoms patched. There will be bonds, angles, dihedrals and impropers written to the prm7 file for any atom in the lambda = 0 state that is part of molecule1 (not part of the mapping, i.e. a dummy).

jmichel80 commented 5 years ago

And to confirm...since _toPertFile returns the topology of molecule0 is it safe to assume that any atoms that are dummy in the initial state will not have a potential defined in the list of of bonds, angles, dihedrals and impropers written to the prm7 file ? And that the correct value of the potentials to use for these internal degrees of freedom will only come from the information written in the pert file ?

(...) There will be bonds, angles, dihedrals and impropers written to the prm7 file for any atom in the lambda = 0 state that is part of molecule1.

Ok looks like I need to study def _merge() to work out the exact logic used to assign parameters to these degrees of freedom. What I want to check is that the pert file is not going to miss 'patching' some potentials defined in the prm7 file. The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

lohedges commented 5 years ago

During the merge, terms involving dummy atoms are set to the value of the opposite end state, i.e. a dihedral involving a dummy atom at lambda = 0 will be equal to the potential at lambda = 1, where the atom is not a dummy. The pert file should then patch these by matching against the atom names involved in the potential. (At least that's what I thought.)

lohedges commented 5 years ago

I can confirm that the improper terms involving 4 dummy atoms are absent from the pert file because of this line:

 # Only write record if the improper parameters change.
 if zero_k or amber_dihedral0 != amber_dihedral1:

However, as mentioned above, the correct potential from lambda = 1 will already be present in the prm7 file so this should be okay. I can easily add another flag that forces a write to the pert file if all atoms are dummies.

jmichel80 commented 5 years ago

However, as mentioned above, the correct potential from lambda = 1 will already be present in the prm7 file so this should be okay

Ok I understand how it works in the new implementation which will help debug.

I can easily add another flag that forces a write to the pert file if all atoms are dummies.

I think we should do that nonetheless for dihedrals and impropers because we already write bonds and angles that have same initial and final parameters in the pert file (because they involve at least one dummy atom). This makes it easier to troubleshoot issues as well by reading the pert file.