michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0
77 stars 19 forks source link

[FEATURE] Restraints #203

Closed lohedges closed 3 years ago

lohedges commented 3 years ago

This issue thread is intended to discuss improved restraint handling during equlibration in BioSimSpace. We currently only support backbone restraints using the restraint_backbone keyword, which relies on correct atom naming for formats supported by the various MD engines in order to work correctly. This makes it hard to take an AMBER format file and run a restrained equlibration in GROMACS, or vice-versa. It is clear that we need to support a wider range of restraint options and provide greater flexibility to the user.

A possible solution is to use a restraint option that could take the values None, "backbone", "heavy", "non-water", etc, as well as a list of atom indices if the user wants to restrain specific atoms, or a combination of the above. The latter would require the ability to get the required indices using our built in search functionality in a consistent and reliable way.

The following describes the way in which the supported engines implement restraints:

Suggestions for how to search for atoms matching a restraint:

lohedges commented 3 years ago

Looking at other tools, e.g. MDTraj, ParmEd, MDAnalysis, they all select backbone atoms by name, mapped to a particular set of residues, also matched by name.

jmichel80 commented 3 years ago

A few comments

I don't think it is possible to do ''backbone'' without using dictionaries of atom names.

lohedges commented 3 years ago

Thanks for the info. Sorry, what I mean about SOMD it wouldn't work easily for the options we propose, since it doesn't have per-atom flexibility. As you say, I don't think you could restraint backbone atoms so I disable restraints with SOMD when equilibrating at present. Since we now support OpenMM, I don't think there's a particular need to use SOMD for non-FEP simulations anyway so it's not too much of an issue.

I agree with your comments in the other thread and will focus on GROMACS and OpenMM as a start.

lohedges commented 3 years ago

OpenMM : setting masses to zero freezes atoms during the integration. This is algorithmically quite different from applying a position restraint, although it may work for the purposes of equilibration.

Yes, I agree it's a little different, but it is what was suggested by the OpenMM folks. It's, apparently, the only way to get the OpenMM restraint system to play nicely when working in the NPT ensemble. See here for some details.

lohedges commented 3 years ago

Ironically it sounds like backbone restraints are the hardest thing here. Perhaps we should support that option but warn the user that it might be problematic if we convert between formats?

jmichel80 commented 3 years ago

that sounds a reasonable compromise

Dr. Julien Michel, Senior Lecturer Room 263, School of Chemistry University of Edinburgh David Brewster road Edinburgh, EH9 3FJ United Kingdom phone: +44 (0)131 650 4797 http://www.julienmichel.net/

On Mon, Apr 19, 2021 at 4:08 PM Lester Hedges @.**@.>> wrote:

Ironically it sounds like backbone restraints are the hardest thing here. Perhaps we should support that option but warn the user that it might be problematic if we convert between formats?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/michellab/BioSimSpace/issues/203#issuecomment-822544505, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACZN3ZGP3KTUCLKQHGYABXTTJRBN5ANCNFSM43F6PTZQ.

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

lohedges commented 3 years ago

I just want to check that we shouldn't restrain ions (at least free ions) in any case. Perhaps backbone, heavy, and all are better names, with clear descriptions of what they refer to, i.e. non-water, non-ion, etc.

jmichel80 commented 3 years ago

i would treat ions similarly to water, so if restraining all protein atoms letting ions and water move.

lohedges commented 3 years ago

I've made good progress with this and I've now updated the code to handle the new restraint system for all of the MD engines that we support. The user can now choose from keyword restraint types: "backbone", "heavy", or "all", or explicitly pass a list of atom indices.

The most difficult part was making sure that we can correctly obtain absolute or relative atom indices (absolute to the system, or relative to a molecule) in an efficient way. (We already did some of this for generating PLUMED files, but needed more flexibility.) There were also complications for GROMACS, since you apply restraints for each GROMACS molecule type, so we needed a way of mapping molecules in the system to their respective types. This was possible using the Sire.IO.GroTop parser directly with a little massaging.

One limitation with GROMACS is that, since the restraints apply to molecule types, not molecules, there isn't a way to apply restraints to one molecule of a particular type, i.e. they are applied to all molecules of that type. I'm not sure that this will be a problem in practice since we're only really considering protein-ligand complexes, and solvent / ions aren't considered for the restraints.

As mentioned in the other thread, GROMACS also complains when you turn on restraints for a system that was loaded from non-GROMACS files, e.g.

'gmx grompp' reported the following warnings:
WARNING 1 [file gromacs.top, line 256]:
  1890 non-matching atom names
  atom names from /home/lester/Code/BioSimSpace/demo/tmp/gromacs.top will be
  used
  atom names from /home/lester/Code/BioSimSpace/demo/tmp/gromacs.gro will
  be ignored

Since this is just a warning (which are elevated to errors by default) I simply detect it, and re-run gmx grompp with --maxwarn 1.

Since SOMD doesn't fit with the new restraint system (at least without work), I just raise an exception if the user tries to run a restrained equilibration protocol which suggests that they try a different engine.

Unrelated, but also in this feature branch: I've added support for running metadynamics and steered molecular dynamics simulations using AMBER.

I'll run a some trial simulations tomorrow morning and will make a pull-request when I'm happy.

lohedges commented 3 years ago

Oh, and the restrain_backbone option is flagged as being deprecated. It still works for now, just being converted to restraint="backbone" behind the scenes. If both the restraint and restrain_backbone parameters are set, then restraint takes precedence and a warning is raised.

lohedges commented 3 years ago

Just to note that I've exposed some of the restraint functionality to the user to try to make things as flexible as possible. Hopefully this will facilitate things like:

equilibration with restraints on backbone atoms and ligand

This could be done with:

# Here system is a protein-ligand complex

# Get the indices of backbone atoms in the protein.
backbone = system.getRestraintAtoms("backbone")

# Now get the indices of all ligand indices. (Here the ligand is molecule index 1 in the system.)
ligand = system.getRestraintAtoms("all", 1)

# Create the equilbration protocol.
protocol = BSS.Protocol.Equilibration(restraint=backbone+ligand)
lohedges commented 3 years ago

Unfortunately it turns out that AMBER has a limit of 256 characters for the restraint mask, so we can't flexibly restrain atoms by index. (Even though I'd written some nice code to find contiguous blocks, etc.) This means that we can only really perform restraints for large proteins using a name mask. This isn't too much of an issue for backbone restraints, since those match a name template anyway, but it would mean that we can't reliably handle an arbitrary list of atom indices, or combine matches like shown above. It also means that the code won't work if converting from a different format, where the atoms don't follow the AMBER naming conventions.

lohedges commented 3 years ago

Some discussions and possible kludges in this thread.

kexul commented 3 years ago
# Here system is a protein-ligand complex
# Get the indices of backbone atoms in the protein.
backbone = system.getRestraintAtoms("backbone")
# Now get the indices of all ligand indices. (Here the ligand is molecule index 1 in the system.)
ligand = system.getRestraintAtoms("all", 1)
# Create the equilbration protocol.
protocol = BSS.Protocol.Equilibration(restraint=backbone+ligand)

Hi @lohedges , I've used some similar code with yours but an error occurred.

Here is the code:

import BioSimSpace as BSS
import logging 

def run_process(system, protocol, work_dir):
    """
    Run system with protocol, get the system afterwards
    Parameters
    ----------
    system: sire system
    protocol: BioSimSpace Protocol

    Returns
    -------
    sire system
    """
    process = BSS.Process.Gromacs(system, protocol, work_dir=work_dir, ignore_warnings=True)
    process.addArgs({'-ntmpi':1, '-ntomp':'30', '-gpu_id':0})

    process.start()
    process.wait()
    logging.info(f'Potential energy: {process.getCurrentPotentialEnergy()}')
    system = process.getSystem()
    return system

mols = []
for ligand in ['C1=CC=CC=C1',  'CC1=CC=CC=C1']:
    process = BSS.Parameters.gaff(ligand)
    parametrised_molecule = process.getMolecule()
    mols.append(parametrised_molecule)

mapping = BSS.Align.matchAtoms(mols[0], mols[1])
mols[0] = BSS.Align.rmsdAlign(mols[0], mols[1], mapping)
merged = BSS.Align.merge(mols[0], mols[1], mapping)

solvated_system = BSS.Solvent.tip3p(merged, box=3*[5*BSS.Units.Length.nanometer], work_dir='solvation')

minimise_protocol = BSS.Protocol.Minimisation(250)
minimised_sys = run_process(solvated_system, minimise_protocol, work_dir='minimize')

restraint_heavy = minimised_sys.getRestraintAtoms('heavy')
pert_mol = minimised_sys.getPerturbableMolecules()[0]
restraint_ligand = minimised_sys.getRestraintAtoms('all', minimised_sys.getIndex(pert_mol))

nvt_protocol = BSS.Protocol.Equilibration(runtime=5*BSS.Units.Time.picosecond, 
                                       temperature=300*BSS.Units.Temperature.kelvin, 
                                       restraint=restraint_heavy)

equilibrated_sys = run_process(minimised_sys, nvt_protocol, work_dir='nvt')

Output:

Traceback (most recent call last):
  File "/root/group_ceph/FEP/prepare/test_case/git/test.py", line 56, in <module>
    equilibrated_sys = run_process(minimised_sys, nvt_protocol, work_dir='nvt')
  File "/root/group_ceph/FEP/prepare/test_case/git/test.py", line 24, in run_process
    process = BSS.Process.Gromacs(system, protocol, work_dir=work_dir, ignore_warnings=True)
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_gromacs.py", line 156, in __init__
    self._setup()
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_gromacs.py", line 203, in _setup
    self._generate_config()
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_gromacs.py", line 375, in _generate_config
    self._add_position_restraints(config)
  File "/root/group_ceph/FEP/prepare/BioSimSpace/Process/_gromacs.py", line 2166, in _add_position_restraints
    raise ValueError(msg) from None
ValueError: Unable to find restrained atom in the system?
lohedges commented 3 years ago

Thanks, I'll reproduce here later. I imagine it's something to do with the system being perturbable and the naming for elements (which is used for the search) being different in perturbable and non-perturbable molecules. We used to extract the lambda = 0 state for GROMACS so all molecules would have a property called element. Now we can simulate the perturbable system directly, so the perturbable molecules have a property called element0 instead.

kexul commented 3 years ago

Thanks! 😄

I would like to point out that passing an str type of restraint worked perfectly for a perturbable system.

nvt_protocol = BSS.Protocol.Equilibration(runtime=5*BSS.Units.Time.picosecond, 
                                       temperature=300*BSS.Units.Temperature.kelvin, 
                                       restraint='heavy')
kexul commented 3 years ago

I found that invoke self._reset_mappings() in the beginning of _getRelativeIndices do the trick for me. https://github.com/michellab/BioSimSpace/blob/bb4d0faa989b347158b034e83e1bfdc26266d13e/python/BioSimSpace/_SireWrappers/_system.py#L1490

lohedges commented 3 years ago

Yes, that's the main issue. (It should be done in the constructor, but isn't for some reason. I must have deleted it by accident at some point.) However, it still doesn't work properly due to the presence of perturbable and non-perturbable molecules in the same system. I'm just trying to fix this now.

kexul commented 3 years ago

Yes, that's the main issue. However, it still doesn't work properly due to the presence of perturbable and non-perturbable molecules in the same system. I'm just trying to fix this now.

Get it√ Thanks for your efforts! Much appreciated!

lohedges commented 3 years ago

Okay, I think this is fixed. Let me know if you come across any more issues.

Cheers.

kexul commented 3 years ago

Okay, I think this is fixed. Let me know if you come across any more issues.

Thanks! I'll test it and report.

kexul commented 2 years ago

Hi @lohedges , would the dummy atoms be restrained when using position restraints?

In the following picture, the initial positions of the atoms outlined by two red boxes are near the real atoms, but they behave differently after equilibration with position restraints: 99b12d81d5043223ee8fe4fb3acc114

lohedges commented 2 years ago

Hmmm, good question. The restraint handling is by element, so dummies won't be restrained if you, say, restrain heavy atoms. They should be if done by index, though. It also depends on how the dummies are handled for non-FEP simulations by the various engines, since they aren't dummy atoms in the normal sense. I'll check during the week.

Cheers.

kexul commented 2 years ago

The restraint handling is by element, so dummies won't be restrained if you, say, restrain heavy atoms. They should be if done by index, though.

Thanks, I did use restraint=heavy in NPT equilibration, I'll add an index-based restraint and try again.

lohedges commented 2 years ago

No problem. You can combine restraints so should be able to do something like.

idxs = system.getRestraintAtoms("backbone") + [system.getIndex(atom) for atom in system.search("element XX", property_map={"element" : "element0"})]

(Here I'm assuming the lambda=0 state.)

kexul commented 2 years ago

OK, index-based restraints solved my problem, now the two molecules after equilibration overlap fairly well. c27ad69acd0c72f32d65aa9b324e3dd

idxs = system.getRestraintAtoms("backbone") + [system.getIndex(atom) for atom in system.search("element XX", property_map={"element" : "element0"})]

Thanks for the code, new trick learned 😄

I used the following code since BioSimSpace kindly removes duplicated indexes internally.

sys.getRestraintAtoms('all', sys.getIndex(sys.getPerturbableMolecules()[0])) + sys.getRestraintAtoms('heavy', allow_zero_matches=True)