michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

Wrong nonbonded exceptions from a PartialMolecule #241

Closed msuruzhon closed 3 years ago

msuruzhon commented 3 years ago

Hello,

I when I create a PartialMolecule from a selection of atoms and write it out as a PRM7 file, it seems to remove all nonbonded exceptions in the process.

Here is a zip file that reproduces that behaviour: Archive.zip

Is this behaviour related to the way I am generating the PartialMolecule or is this a bug?

Many thanks.

lohedges commented 3 years ago

It looks like Sire.IO.PartialMolecule is literally extracting the corresponding terms from the intrascale matrix of the merged molecule, rather than adjusting them on the basis of the resulting molecule.

Original methanol topology:

%FLAG NUMBER_EXCLUDED_ATOMS
%FORMAT(10I8)
       5       4       3       2       1       1
%FLAG NONBONDED_PARM_INDEX
%FORMAT(10I8)
       1       2       4       7       2       3       5       8       4       5
       6       9       7       8       9      10
%FLAG EXCLUDED_ATOMS_LIST
%FORMAT(10I8)
       2       3       4       5       6       3       4       5       6       4
       5       6       5       6       6       0

Lambda=0 end state (methanol) including dummies:

%FLAG NUMBER_EXCLUDED_ATOMS
%FORMAT(10I8)
       1       1       1       1       1       1       1       1
%FLAG NONBONDED_PARM_INDEX
%FORMAT(10I8)
       1       2       4       7       2       3       5       8       4       5
       6       9       7       8       9      10
%FLAG EXCLUDED_ATOMS_LIST
%FORMAT(10I8)
       0       0       0       0       0       0       0       0

Partial molecule excluding dummies:

%FLAG NUMBER_EXCLUDED_ATOMS
%FORMAT(10I8)
       1       1       1       1       1       1
%FLAG NONBONDED_PARM_INDEX
%FORMAT(10I8)
       1       2       4       7       2       3       5       8       4       5
       6       9       7       8       9      10
%FLAG EXCLUDED_ATOMS_LIST
%FORMAT(10I8)
       0       0       0       0       0       0

I'll check that the lambda=0 end state intrascale matrix is correct. We have internal tests that check that the energies of the original molecules agree with those of partial molecules extracted from the merged molecule. This also validates that the intrascale matrix is self-consistent. (I've also checked these by using the molecules extracted with toRegularMolecule rather than using a property map and the energies also agree.)

lohedges commented 3 years ago

Perhaps the reason for this use case will become clear on Thursday, but I'm confused why you can't just use the original aligned molecule rather than extracting the end state from the merged molecule and removing the dummies. Obviously the atom indexing / numbering could be different in the merged molecule, but this wouldn't be preserved on write to AMBER format files anyway.

Just to note that the following in your script should be unnecessary:

# Get the coordinates.
coordinates = merged_mol._sire_object.property("coordinates0")

# Generate a merged_mol with all dummies present.
merged_mol = merged_mol.copy()._toRegularMolecule(is_lambda1=False)

# Set the coordinates to those at lambda = 0.
merged_mol._sire_object = merged_mol._sire_object.edit().setProperty("coordinates", coordinates).commit()

When you get the lambda=0 state from the merged molecule by using ._toRegularMolecule the coordinates0 property is already copied to coordinates. (The function essentially overwrites every prop with either prop0 or prop1 depending on which end state is requested.)

If you need to remove dummies to get things to work with AMBER, you can pass convert_amber_dummers=True to replace the ambertype and element properties of dummies with those of the opposite end state. This allows you to run regular simulations for a system containing a merged molecule, e.g. you could extract one end state and run an equilibration.

msuruzhon commented 3 years ago

Thanks for the feedback Lester, the reason I can't use the original aligned molecule is that the common core atoms must have the same ordering between both endstates, since AMBER assumes that. That's why the easiest solution (as far as I could see) is to just take the merged endstate and remove all the dummies. Removing all dummies is also needed, because as far as I know AMBER requires that you have no dummies if a soft-core potential is used.

In terms of extracting the coordinates, this code is part of a larger function which always copies the coordinates at lambda=0, no matter which endstate is extracted. For some reason I found that the same-atom coordinates between both endstates didn't always match so I opted to use the one at lambda=0 in all cases. I guess that's another point I should probably raise because I am not sure if this is intended behaviour or not.

lohedges commented 3 years ago

Thanks for clarifying. I guess we can discuss on Thursday. I'd also be happy to video chat afterwards once I've had a chance to think things over.

Regarding the end state coordinates: Atom properties should correspond to the those of the reference molecule in the end state, or the reference molecule from the opposite state in the case of a dummy atom. The exceptions are the charge and LJ properties, which are zeroed for dummy atoms. This means that the atom coordinates of the MCS section aren't the same at lambda=0 and lambda=1. (The should be close, though, because of the alignment step.)

lohedges commented 3 years ago

Hi again,

It looks like I've encountered this issue within BioSimSpace before and have used the Sire.IO.GroTop parser to reconstruct the instrascale matrix. (It does this when the property isn't present.) Adding the following (fomatted for readability) to your code should hopefully fix things, and means you can do the whole thing with Sire/BioSimSpace, i.e. not needing ParmEd, or any intermediate files.

# Create a partial merged_mol and extract the atoms.
partial_merged_mol = _SireMol.PartialMolecule(merged_mol._sire_object, selection) \
                    .extract()                                                    \
                    .molecule()

# Remove the incorrect intrascale property.
partial_merged_mol = partial_merged_mol.edit()    \
                    .removeProperty("intrascale") \
                    .molecule()                   \
                    .commit()

# Recreate a BioSimSpace molecule object..
merged_mol = _Molecule(partial_merged_mol)

# Parse the molecule as a GROMACS topology, which will recover the intrascale
# matrix.
gro_top = _SireIO.GroTop(merged_mol.toSystem()._sire_object)

# Convert back to a Sire system.
gro_sys = gro_top.toSystem()

# Add the intrascale property back into the merged molecule.
edit_mol = merged_mol._sire_object.edit()
edit_mol = edit_mol.setProperty("intrascale", gro_sys[_SireMol.MolIdx(0)].property("intrascale"))
merged_mol = _Molecule(edit_mol.commit())

BSS.IO.saveMolecules("merged_test", merged_mol, "prm7")

The non-bonded exceptions in the merged_test.prm7 file agree with those from the original input. I've only tested this using a few merges, so let me know if you still experience issues.

msuruzhon commented 3 years ago

Hi Lester,

Thanks for that, this fixes things for me too. I guess we can close this issue then?

lohedges commented 3 years ago

Great, pleased to hear. Yes, I'll close this for now. The issue is now documented so is searchable by others.

I've also added (not yet pushed) an extract method to BioSimSpace._SireWrappers.Molecule that does this for you, and also _toAmberMolecule that returns the dummy free end state along with the indices of the dummy atoms. (Analagous in functionality to _toRegularMolecule.) I'll let you know when I've pushed so you can test if out if needed. (These allowed me to write some unit tests to check that things were working as expected.)

lohedges commented 3 years ago

Okay, you can now do the following:

m0_no_dummies, m0_dummy_indices = merged_mol._toAmberMolecule()
m1_no_dummies, m1_dummy_indices = merged_mol._toAmberMolecule(is_lamba1=True)