michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

BioSimSpace.Process.Somd.GetSystem() Error #250

Closed msuruzhon closed 2 years ago

msuruzhon commented 2 years ago

Hi,

I have been running into issues when loading back a system run with SOMD. When I run these input files with the following script:

import BioSimSpace as BSS

water_mol = BSS.IO.readMolecules(["water.prm7", "water.rst7"])
mol0 = BSS.IO.readMolecules(["methane_vac.parm7", "methane_vac.rst7"])[0]
mol1 = BSS.IO.readMolecules(["methanol_vac.parm7", "methanol_vac.rst7"])[0]
merged_mol = BSS.Align.merge(mol1, mol0).toSystem()

# system = merged_mol + water_mol doesn't work either but system = merged_mol does
system = water_mol + merged_mol

protocol = BSS.Protocol.Minimisation(steps=1)
process = BSS.Process.Somd(
    system,
    protocol,
    name="minimise",
    work_dir="protein_test/minimise"
)
process.start()
process.wait()
protein = process.getSystem()

if protein is None:
    raise ValueError("process.getSystem() failed")

I run into an issue when reading back the system using BioSimSpace.Proess.SOMD.getSystem(). At least part of the bug sems to be the different molecule ordering in the old_system and the new_system, but the bug still persists even if I swap the two molecules (as remarked in the comments). I don't see a bug when I only use one molecule, even if it is perturbable.

This behaviour is observed on macOS but I imagine it is observable on Linux too.

Many thanks.

lohedges commented 2 years ago

Thanks for this. Unfortunately SOMD doesn't preserve molecular ordering on startup (apart from the perturbed ligand) so I've had to implement some tricks to ensure that coordinates are copied back correctly. This actually isn't documented anywhere in SOMD and it only became apparent when I was fixing another bug. I imagine that this isn't working correctly in all cases. I'll take a look when I can.

lohedges commented 2 years ago

Just to note that SOMD requires that the perturbable ligand is the first molecule (you can reindex it in the config, but it is easiest to swap it), hence the reordering that you tried not making any difference. We do this swap behind the scenes, but remap things when copying coordinates.

lohedges commented 2 years ago

The issue is that I match atoms between the original and loaded system by number, which is unique if you create a system directly with Sire. However, it you use BioSimSpace to add two molecules, then the atoms aren't renumbered when creating the system. I'll try to figure out a workaround, e.g. renumbering things if necessary. It would be much easier if SOMD actually preserved ordering, but that is probably more difficult to fix in the short term.

system = water_mol + merged_mol

Here the composite system would have duplicate atom (and residue) numbers.

lohedges commented 2 years ago

Okay, I had a few spare minutes and I think I've fixed it. Within BioSimSpace we now ensure that atom and residue numbers are unique when combining molecules in any way, i.e. creating a Molecules container, or adding molecules to an existing System object. This isn't strictly needed by Sire (only molecule numbers should be unique) but it makes working with SOMD a whole lot easier, since we can (hopefully) remap the atoms after a process.

I've pushed an update but will run a few more checks to make sure that things are working for more complex systems.

msuruzhon commented 2 years ago

This sounds great, many thanks for that Lester! I will test it as well and will report if there are any issues.

msuruzhon commented 2 years ago

I have found that if the perturbed molecule is not at the beginning of the System, this can result in problems while reloading back using BioSimSpace.Process.Somd.getSystem(), since the resulting PDB file has a different order to the cached system. However, if the re-ordered system is stored permanently within BioSimSpace.Process.Sire._setup():

def _setup():
        ...
        self._system = system
        return self._input_files

I don't get such problems. Would that change be viable or would it break something else in the code? As it stands now, only systems with the merged molecule in the beginning work properly and this seems to be unexpected behaviour. Otherwise I think it would make sense to just throw an error during __init__() if that's a hard requirement.

lohedges commented 2 years ago

Ah, bummer. You are right, since we re-order the system for SOMD (perturbable molecule first) the atom numbering will no longer match the passed system. Why is it always that our own program (SOMD) causes us the most issues :-)

I'll try to adjust things so that it indexes the perturbable molecule in the configuration file instead, although it would still need to be renamed. I'll test this here when I get a chance.

lohedges commented 2 years ago

Okay, I've adjusted the code so that we now use the perturbed residue number option to specify the ligand rather than re-ordering things so that it is the first molecule in the system. This means that the input to SOMD will match the orignal system passed to Somd.Process by the user, regardless of where the perturbable molecule is in the original system, and we should still be able to match the re-ordered atoms by number when calling getSystem().

Let me know if this still doesn't work and I'll have another think. As you say, having a hard constraint on the input system (single perturbable molecule that is placed first) is another option, but it would be nice to avoid this if possible.

lohedges commented 2 years ago

FYI: There was actually a bug in SOMD that meant that this approach (using the perturbed residue number wouldn't work properly). See here for details. The fix has now been merged.

msuruzhon commented 2 years ago

Thanks Lester, I am currently always making sure that the ligand is the first molecule in all my systems, so I always circumvent that problem. Since the feature-amber and devel branches have diverged, I haven't had the chance to properly merge them and test the fixes. Maybe we can leave this issue open at least until we merge the two branches and then I will be able to test the fixes more extensively.

msuruzhon commented 2 years ago

Thanks @lohedges this seems to be working now for me as well.