openmm / openmm-torch

OpenMM plugin to define forces with neural networks
185 stars 24 forks source link

Protein unfolds during hybrid ML/MM simulations #34

Open victorprincipe opened 3 years ago

victorprincipe commented 3 years ago

Hi,

I'm trying to run hybrid ML/MM simulations of a protein-ligand system in which the ML model is used for the ligand and a handful of binding site residues. However, certain parts of the protein binding site unfold very quickly, whereas this did not happen when the ML model was used on only the ligand. I was just generally wondering if these ML models can be used for only certain atoms within the same macromolecule, or if they can (currently) only be used for the entirety of a molecule? Here is a part of the code for the production simulation:

from __future__ import print_function
from simtk.openmm import *
from simtk.openmm.app import *
from parmed.openmm import *
from simtk import unit
from sys import stdout
import sys
from openmmml import MLPotential

def production(prmtop_file, crd_file, positions, velocities):
    prmtop = AmberPrmtopFile(prmtop_file)
    inpcrd = AmberInpcrdFile(crd_file)
    #use ANI-2x potential for ML system
    potential = MLPotential('ani2x')
    #index ligand atoms and binding site residue atoms in the solvated Tyk2+ligand system
    ml_atoms=[atom.index for atom in prmtop.topology.atoms() if atom.residue.id== "289"
             or atom.residue.id== "14"
             or atom.residue.id== "95"
             or atom.residue.id== "138"
             or atom.residue.id== "151"
             or atom.residue.id== "92"
             or atom.residue.id== "94"
             or atom.residue.id== "152"
             or atom.residue.id== "139"
             or atom.residue.id== "22"
             or atom.residue.id== "89"
             or atom.residue.id== "12"
             or atom.residue.id== "24"
             or atom.residue.id== "91"
             or atom.residue.id== "93"
             or atom.residue.id== "15"
             or atom.residue.id== "141"
             or atom.residue.id== "90"
             or atom.residue.id== "71"
             or atom.residue.id== "16"
             or atom.residue.id== "17"
             or atom.residue.id== "39"
             or atom.residue.id== "96"]
    #create traditional MM system from topology file
    mm_system = prmtop.createSystem(nonbondedMethod=PME,
    nonbondedCutoff=1.0*unit.nanometers, constraints=HBonds, rigidWater=True,
    ewaldErrorTolerance=0.0005)
    #apply ANI-2x ML potential to the ligand atoms
    ml_system = potential.createMixedSystem(prmtop.topology, mm_system, ml_atoms)
    # NPT for production:
    ml_system.addForce(MonteCarloBarostat(1*unit.atmospheres, 300*unit.kelvin, 25))
peastman commented 3 years ago

In principle it should work for a subset of a molecule, though I haven't tried doing that. One thing to be aware of is that ANI only supports neutral molecules. If any of the residues you're using it for has a charged side chain, it won't work correctly.

victorprincipe commented 3 years ago

Ah that'll be it! I used ANI on a charged Arginine residue so that's probably where it went wrong. I'll try again without using ANI on that residue and see what happens. Thanks!

raimis commented 3 years ago

Currently createMixedSystem uses the mechanical embedding, where ML and MM regions interact only via non-bonded terms. This means that the peptide bonds of all the residues included into the ML region are cut (https://github.com/openmm/openmm-ml/blob/4f0546678eeb9b2703d28d3fce7f678802640701/openmmml/mlpotential.py#L236).

In principle, it would possible to adapt one of QM/MM schemes to handle the covalent bonds between the different regions, but it isn't trivial. Check these reviews for inspiration (https://link.springer.com/article/10.1007/s00214-006-0143-z, http://iopenshell.usc.edu/chem545/lectures2011/QMMM_Thiel_Review_2009.pdf).

raimis commented 3 years ago

@peastman it would make sense createMixedSystem to throw an exception if there is a bond between the ML and MM regions.

peastman commented 3 years ago

I don't think that's quite correct. A bond, angle, or torsion only gets removed if all atoms it involves are in the ML region. Bonds that connect an MM atom to an ML atom are preserved. That said, I don't think this will really give correct results unless your ML potential was specifically parametrized for it. Especially with the torsions, you'll get both ML and MM terms contributing to the same torsion which isn't correct.

For the moment, I think it's safest to stick to only using ML for entire molecules.

victorprincipe commented 3 years ago

Thanks for the suggestions. I've just finished running simulations using ANI on only uncharged binding site residues but the protein still unfolded throughout the simulation. As @peastman said, I guess ML can only be used for entire molecules at the moment.

peastman commented 3 years ago

Hopefully we'll eventually be able to go beyond that. But I think it will require potentials that are specifically designed for it.

raimis commented 2 years ago

This is more issues for https://github.com/openmm/openmm-ml. @peastman could you move it there?