
Metadynamics in OpenMM #2126

Closed peastman closed 5 years ago

peastman commented 6 years ago

I have an initial implementation of metadynamics that I'd love to get feedback on (especially from @msultan). It's still only very slightly tested, and there are more features I want to add, but it's basically functional. It requires the latest development code for OpenMM, since it uses a feature I just added this week.

https://web.stanford.edu/~peastman/metadynamics.py

Here's an example of a script that uses it to accelerate sampling.

from simtk.openmm import *
from simtk.openmm.app import *
from simtk.unit import *
from metadynamics import *
import numpy as np
import matplotlib.pyplot as plot

# Create a System for alanine dipeptide in water.

pdb = PDBFile('/Users/peastman/workspace/openmm/wrappers/python/tests/systems/alanine-dipeptide-explicit.pdb')
forcefield = ForceField('amber14-all.xml', 'amber14/spce.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, constraints=HBonds)

# Define collective variables for phi and psi.

cv1 = CustomTorsionForce('theta')
cv1.addTorsion(1, 6, 8, 14)
phi = BiasVariable(cv1, -np.pi, np.pi, 0.5, True)
cv2 = CustomTorsionForce('theta')
cv2.addTorsion(6, 8, 14, 16)
psi = BiasVariable(cv2, -np.pi, np.pi, 0.5, True)

# Set up the simulation.

meta = Metadynamics(system, [phi, psi], 300.0*kelvin, 1000.0*kelvin, 1.0*kilojoules_per_mole, 100)
integrator = LangevinIntegrator(300*kelvin, 1.0/picosecond, 0.002*picoseconds)
simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)

# Run the simulation and plot the free energy landscape.

meta.step(simulation, 50000)
plot.imshow(meta.getFreeEnergy())
plot.show()
jchodera commented 6 years ago

@peastman : Have you seen our cookiecutter for Python comp chem projects?

https://github.com/choderalab/cookiecutter-compchem

This makes it a breeze to set up a project for GitHub where your users can easily install the code and its dependencies, and it includes the infrastructure for Travis tests and documentation for free. Might be easier than passing around unversioned Python files via a webserver.

msultan commented 6 years ago

This looks pretty great, @peastman. I will try to test this today, after I manage to fix my OpenMM installation, which seems to have stopped working for some odd reason.

A few things:

1). I would recommend swapping out the \delta T parameter for the bias factor, which is related to the temperature and \delta T by a simple transform. This would make the Metadynamics module more compatible with other metadynamics implementations out there.

2). Having some kind of restart facility is likely going to be critical even for small systems. The easiest way would be to have a HILLS grid file that can store how much bias has been deposited at all seen values of the CV.

3). [Nice to have but not necessary] Multiple walkers that can asynchronously share biases. This would allow users to submit lots of low priority jobs to clusters and run multiple walkers.

Once I have my OpenMM working, I will test the actual script and report on other things. Thanks for doing this! I already have tICA coordinates and even simple NNs implemented as CustomCVs so it should be easy to extend this module very quickly.

peastman commented 6 years ago

I would recommend swapping out the \delta T parameter for the bias factor, which is related to the temperature and \delta T by a simple transform.

Ok, that's simple to do.
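
For reference, a minimal sketch of the transform, assuming the standard well-tempered convention used by PLUMED (biasFactor = (T + deltaT) / T); the variable names below are illustrative:

# Well-tempered convention: biasFactor = (T + deltaT) / T, so deltaT = (biasFactor - 1) * T
T = 300.0        # simulation temperature, in kelvin
deltaT = 1000.0  # the deltaT argument from the example script above, in kelvin
biasFactor = (T + deltaT) / T   # ~4.33, dimensionless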

Having some kind of restart facility is likely going to be critical for even small systems.

That just means writing out and reloading the _bias field. I can add convenience functions for doing that.
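
As a rough sketch of what such convenience functions might look like (assuming the _bias field is a plain NumPy array; save_bias and load_bias are hypothetical names, not part of metadynamics.py):

import numpy as np

def save_bias(meta, filename):
    # Write the accumulated bias grid to disk.
    np.save(filename, meta._bias)

def load_bias(meta, filename):
    # Reload a previously saved bias grid.  The tabulated function inside the
    # CustomCVForce would also need to be updated in the Context afterwards.
    meta._bias = np.load(filename)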

Multiple walkers that can asynchronously share biases. This would allow users to submit lots of low priority jobs to clusters and run multiple walkers.

Can you describe how that would work? Are you imagining something that would automatically integrate with the queuing system, manage MPI communication between nodes, etc.? If so, that's probably beyond the scope of this script, but a much more comprehensive application could do it.

msultan commented 6 years ago

Can you describe how that would work? Are you imagining something that would automatically integrate with the queuing system, manage MPI communication between nodes, etc.? If so, that's probably beyond the scope of this script, but a much more comprehensive application could do it.

I was thinking of something like a shared folder where each walker writes out its bias every N timesteps and reads the deposited bias written out by the other walkers. An outside application could write nice wrappers around it, but there needs to be core functionality for reading the biases deposited by other walkers without the need for MPI communication.

This would also help with getting cluster resources, since it's far easier to get 10 single-GPU jobs on most compute clusters than one 10-GPU job.

peastman commented 6 years ago

That would be pretty simple to implement. The idea is that you would start all the jobs independently, possibly at different times? Each job would periodically add a file to the directory with its latest additions to the bias, and load in any new files that have been created since the last time it checked?

msultan commented 6 years ago

Yep, exactly. There would be a new hyperparameter which controls how often you want the biases to be synced and how many walkers the current walker should expect to check.

The tricky bit is to make sure that we only load the delta biases between consecutive checks, but I am sure you can think of an efficient way to do that. I think PLUMED writes out the Unix timestamp and then reads only the biases added after the previous check.


peastman commented 6 years ago

Does it matter how many walkers there are? Or even if they're running at the same time? All it needs to know is that new files will occasionally appear in a particular directory. The bias it applies should be the sum of all files in that directory, plus any that it has added since it last checked.
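
A minimal sketch of that scheme (a hypothetical helper, not the eventual API), assuming each walker periodically writes its new additions to the bias as a .npy file in a shared directory:

import os
import numpy as np

def total_bias(bias_dir, grid_shape):
    # The bias to apply is simply the sum of every file currently in the
    # shared directory, regardless of how many walkers wrote them or when.
    bias = np.zeros(grid_shape)
    for name in os.listdir(bias_dir):
        if name.endswith('.npy'):
            bias += np.load(os.path.join(bias_dir, name))
    return bias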

ljmartin commented 6 years ago

Would it be possible to run multiple independent metadynamics biases within a single simulation? This is useful in proteins with rotational symmetry such as ion channels. Multiple metaD repeats (using identical binding sites around the protein) can be run in a single simulation this way.

Thanks for providing this!

peastman commented 6 years ago

So basically instead of having one tabulated function that records the bias as a function of all CVs, it would add several functions that each applied a bias as a function of just a few CVs? In principle that should be simple. I'd have to think about the interface, and how saving and loading biases would work.

ljmartin commented 6 years ago

That's correct. It's analogous to multiple walkers, but instead of having multiple simulations with a shared bias, you have one simulation with multiple biases. In PLUMED it's handled by writing to multiple, separate HILLS files that are all defined individually (along with the biases) in the input. Anyway, it's really useful already, so cheers. PLUMED file example with five biases:

COM ATOMS=27405,27415,27436,27455,27472,27486,27498,27510,27529 LABEL=aot
COM ATOMS=27541,27551,27572,27591,27608,27622,27634,27646,27665 LABEL=bot
COM ATOMS=27677,27687,27708,27727,27744,27758,27770,27782,27801 LABEL=cot
COM ATOMS=27813,27823,27844,27863,27880,27894,27906,27918,27937 LABEL=dot
COM ATOMS=27949,27959,27980,27999,28016,28030,28042,28054,28073 LABEL=eot
posa: POSITION ATOM=aot
posb: POSITION ATOM=bot 
posc: POSITION ATOM=cot
posd: POSITION ATOM=dot
pose: POSITION ATOM=eot
mda: METAD ARG=posa.z PACE=1000 HEIGHT=0.05 SIGMA=0.1 FILE=HILLS.a GRID_MIN=-4.5 GRID_MAX=7.5
mdb: METAD ARG=posb.z PACE=1000 HEIGHT=0.05 SIGMA=0.1 FILE=HILLS.b GRID_MIN=-4.5 GRID_MAX=7.5
mdc: METAD ARG=posc.z PACE=1000 HEIGHT=0.05 SIGMA=0.1 FILE=HILLS.c GRID_MIN=-4.5 GRID_MAX=7.5
mdd: METAD ARG=posd.z PACE=1000 HEIGHT=0.05 SIGMA=0.1 FILE=HILLS.d GRID_MIN=-4.5 GRID_MAX=7.5
mde: METAD ARG=pose.z PACE=1000 HEIGHT=0.05 SIGMA=0.1 FILE=HILLS.e GRID_MIN=-4.5 GRID_MAX=7.5
peastman commented 6 years ago

Here's an updated version:

https://simtk.org/svn/pyopenmm/trunk/simtk/pyopenmm/extras/metadynamics.py

This includes all the changes suggested in https://github.com/pandegroup/openmm/issues/2126#issuecomment-406885164.

hsidky commented 6 years ago

Hi! I just came across this.

I'm a postdoc in Andy Ferguson's group at UChicago and I actually was awarded a MolSSI software fellowship to do just this + some more stuff. I plan to implement an OpenMM plugin for adaptive biasing methods. It would include MetaD, umbrella sampling, ABF, and a number of other methods too, some of which I've developed in a package I worked on previously (https://aip.scitation.org/doi/abs/10.1063/1.5008853) . I also want to offer native support for TF computational graphs, custom functions with autodiff, and a few other things which would really make it easy for folks to rapidly prototype CVs and sample on machine-learned or on-the-fly CVs.

@peastman Would you be interested in/open to me integrating what you've already written into this plugin? If you're not itching to release this ASAP, I will begin work on the plugin next week, and I'm already supported for the next 6 months, at least. I'm currently at the MolSSI HQ for a bootcamp this week. I would definitely appreciate your support, because there are a few technical challenges that I anticipate.

@msultan It would be wonderful if you are interested in contributing as well since you have a lot of experience with this too!

If this all sounds good let me know your preferred communication channel, as I suspect this open issue isn't it.

Regards, Hythem

peastman commented 6 years ago

Awesome! I do plan to keep developing this code, and it likely will eventually be integrated into OpenMM. But you should feel free to make use of it in your own plugin. Especially if you're creating an actual plugin so it can be used from C++ as well as Python, that would be very useful. The code is MIT licensed.

I also want to offer native support for TF computational graphs, custom functions with autodiff, and a few other things which would really make it easy for folks to rapidly prototype CVs and sample on machine-learned or on-the-fly CVs.

Very neat! I created a very early proof of concept of a plugin for neural networks: https://github.com/pandegroup/openmm/issues/1995. I implemented it with Caffe2, but immediately started running into issues with the lack of maturity of the Caffe2/PyTorch codebase. I've thought about switching it to use Tensorflow instead, but haven't gotten around to actually doing it. Plus Tensorflow has its own problems that would make things more difficult than they ought to be.

hsidky commented 6 years ago

Great, and thanks for the info!

msultan commented 6 years ago

So I will be limited to working on this during the weekends, since I am currently busy with my job. However, I am more than happy to help in any small way whenever I have time.

ljmartin commented 5 years ago

Hi all, I tried using this and came across an error with the units module. Here's the code:

from simtk.openmm import *
from simtk.openmm.app import *
from simtk.unit import *
from metadynamics import *

n_steps = 2000
system = System()
system.addParticle(1)
force = CustomExternalForce('10*(x-1)^2*(x+1)^2 + y^2 + z^2')
force.addParticle(0, [])
system.addForce(force)

cv1 = CustomCentroidBondForce(1, 'x1')
cv1.addGroup([0])
cv1.addBond([0])
x = BiasVariable(cv1, -10, 10, 0.5, True)

meta = Metadynamics(system, [cv1], 300.0*kelvin, 1000*kelvin, 1.0*kilojoules_per_mole, 100)

which returns:

 File "run.py", line 22, in <module>
    meta = Metadynamics(system, [cv1], 300.0*kelvin, 1000*kelvin, 1.0*kilojoules_per_mole, 100)
  File "/Users/lmar3213/Documents/notebooks/metadynamics/metadynamics.py", line 76, in __init__
    if biasFactor < 1.0:
  File "/Users/lmar3213/miniconda3/lib/python3.6/site-packages/simtk/unit/quantity.py", line 288, in __lt__
    return self._value < (other.value_in_unit(self.unit))
AttributeError: 'float' object has no attribute 'value_in_unit'

This seemed pretty easy to fix, so I changed the line if biasFactor < 1.0: to if biasFactor/unit.kelvin < 1.0:. But then the same script above returns:

Traceback (most recent call last):
  File "run.py", line 22, in <module>
    meta = Metadynamics(system, [cv1], 300.0*kelvin, 1000*kelvin, 1.0*kilojoules_per_mole, 100)
  File "/Users/lmar3213/Documents/notebooks/metadynamics/metadynamics.py", line 91, in __init__
    self._selfBias = np.zeros(tuple(v.gridWidth for v in variables))
  File "/Users/lmar3213/Documents/notebooks/metadynamics/metadynamics.py", line 91, in <genexpr>
    self._selfBias = np.zeros(tuple(v.gridWidth for v in variables))
  File "/Users/lmar3213/miniconda3/lib/python3.6/site-packages/simtk/openmm/openmm.py", line 7308, in <lambda>
    __getattr__ = lambda self, name: _swig_getattr(self, CustomCentroidBondForce, name)
  File "/Users/lmar3213/miniconda3/lib/python3.6/site-packages/simtk/openmm/openmm.py", line 74, in _swig_getattr
    return _swig_getattr_nondynamic(self, class_type, name, 0)
  File "/Users/lmar3213/miniconda3/lib/python3.6/site-packages/simtk/openmm/openmm.py", line 69, in _swig_getattr_nondynamic
    return object.__getattr__(self, name)
AttributeError: type object 'object' has no attribute '__getattr__'

and so now I'm lost :( Any hints would be appreciated.

jchodera commented 5 years ago

It looks like @peastman updated the arguments to the Metadynamics function to read

def __init__(self, system, variables, temperature, biasFactor, height, frequency, saveFrequency=None, biasDir=None):
        """Create a Metadynamics object.

        Parameters
        ----------
        system: System
            the System to simulate.  A CustomCVForce implementing the bias is created and
            added to the System.
        variables: list of BiasVariables
            the collective variables to sample
        temperature: temperature
            the temperature at which the simulation is being run.  This is used in computing
            the free energy.
        biasFactor: float
            used in scaling the height of the Gaussians added to the bias.  The collective
            variables are sampled as if the effective temperature of the simulation were
            temperature*biasFactor.
        height: energy
            the initial height of the Gaussians to add
        frequency: int
            the interval in time steps at which Gaussians should be added to the bias potential
        saveFrequency: int (optional)
            the interval in time steps at which to write out the current biases to disk.  At
    the same time it writes biases, it also checks for updated biases written by other
            processes and loads them in.  This must be a multiple of frequency.
        biasDir: str (optional)
            the directory to which biases should be written, and from which biases written by
            other processes should be loaded
        """

so you probably want

meta = Metadynamics(system, [cv1], 300.0*kelvin, 1000./300., 1.0*kilojoules_per_mole, 100)
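
For reference, a sketch of a call using the full updated signature above, including the multiple-walker options (the phi and psi variables are from the first example script; the save frequency and bias directory are illustrative values):

meta = Metadynamics(system, [phi, psi], 300.0*kelvin,
                    biasFactor=1000.0/300.0,        # replaces the old deltaT argument
                    height=1.0*kilojoules_per_mole,
                    frequency=100,
                    saveFrequency=1000,             # must be a multiple of frequency
                    biasDir='bias')                 # shared directory for multiple walkers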

@peastman: How about we put this in some sort of toolkit that people can conda install along with some examples? Maybe openmmtools?

peastman commented 5 years ago

How about we put this in some sort of toolkit that people can conda install along with some examples?

I plan for this to eventually become part of OpenMM.

ljmartin commented 5 years ago

Thanks for that; my example code was a bit underdone.

So I tried to come up with a small test case, and I have one that works, but it seems to become really slow when adding a second biased variable (or super slow with a third):

from simtk.openmm import *
from simtk.openmm.app import *
from simtk.unit import *
from metadynamics import *
import matplotlib.pyplot as plot
import datetime

system = System()
system.addParticle(1)
force = CustomExternalForce('5*(x-1)^2*(x+1)^2 + y^2 + z^2')
force.addParticle(0, [])
system.addForce(force)

topology = Topology()
chain = topology.addChain()
residue = topology.addResidue('BAH', chain)
topology.addAtom('C', element.carbon, residue)

topology.setPeriodicBoxVectors(((100, 0, 0), (0, 100, 0), (0, 0, 100)))

cv1 = CustomExternalForce('x')
cv1.addParticle(0, [])
x = BiasVariable(cv1, -10, 10, 0.5, False)

cv2 = CustomExternalForce('y')
cv2.addParticle(0, [])
y = BiasVariable(cv2, -10, 10, 0.5, False)

#cv3 = CustomExternalForce('z')
#cv3.addParticle(0, [])
#z = BiasVariable(cv3, -10, 10, 0.5, False)

meta = Metadynamics(system, [x], 300.0*kelvin, 3, 1.0*kilojoules_per_mole, 10)

integrator = LangevinIntegrator(300, 1, 0.02)
coordlist = list()
simulation = Simulation(topology, system, integrator)
simulation.context.setPositions([[0, 0, 0]])

print('starting')
before = datetime.datetime.now()
for i in range(0,20):
    print(i, end='\r')
    coordlist.append(simulation.context.getState(getPositions=True).getPositions(asNumpy=True)._value)
    meta.step(simulation, 1)

print(datetime.datetime.now()-before)
print('done')

With just the [x] variable being biased, the output is:

starting
0:00:00.026917
done

but changing [x] to [x, y] in the meta line gives:

starting
0:00:04.144451
done

I tried three variables but it's been sitting for a few minutes with no response. Am I doing it wrong or is that normal behaviour?

peastman commented 5 years ago

Consider what you've asked it to do. You told it that each variable can be anywhere in a 20 nm range (from -10 to 10), and that the width of the Gaussians should be only 0.5 nm. Since you didn't tell it how many grid points to use, it defaults to making the grid spacing 1/5 of the Gaussian width. So you have 200 grid points along each axis. That's fine with only one variable, but with three variables you get a 200x200x200 table: 8 million total grid points. And then you tell it to update that table (which means recomputing the spline coefficients) every 10 steps!

If you want to have three bias variables, you need to reduce the size of the table along each axis. Use a smaller range, a larger Gaussian width, or both. And you probably also want to update the table less frequently. After it adds a Gaussian to the potential, you should give it enough time to diffuse away before you add the next one.
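
To make the arithmetic explicit, a quick check using the default rule just described (grid spacing equal to 1/5 of the Gaussian width):

import numpy as np

minValue, maxValue, biasWidth = -10.0, 10.0, 0.5
gridWidth = int(np.ceil(5 * (maxValue - minValue) / biasWidth))
print(gridWidth)        # 200 points along each axis
print(gridWidth ** 2)   # 40,000 grid points for two variables
print(gridWidth ** 3)   # 8,000,000 grid points for three variables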

ljmartin commented 5 years ago

Got it! Thanks. Is the entire table updated on every hill addition, or just the local area? I tried to create a grid size that is reasonably comparable to PLUMED2, which, from memory, doesn't show such a big difference when going from 1 to 2 to 3 variables. I'll try a direct comparison using one system now.

peastman commented 5 years ago

OpenMM uses natural cubic splines to interpolate tabulated functions, and those are nonlocal. So even if you just modify a local region, it still has to recalculate spline coefficients outside that region.
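
To illustrate the nonlocality, a small demonstration using SciPy (not OpenMM's internal spline code): perturbing a single data point changes the natural cubic spline coefficients in every interval, not just the nearby ones.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.arange(10.0)
y1 = np.zeros(10)
y2 = y1.copy()
y2[0] = 1.0   # perturb a single point at one end

s1 = CubicSpline(x, y1, bc_type='natural')
s2 = CubicSpline(x, y2, bc_type='natural')
# The per-interval change is nonzero even far from the perturbed point.
print(np.abs(s1.c - s2.c).max(axis=0))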

ljmartin commented 5 years ago

So I made a test system of a water box + ions in VMD and biased one ion across the x, y, and z dimensions using either metadynamics.py or openmm_plumed via PLUMED 2.4. Hopefully I haven't misunderstood the documentation in PLUMED or metadynamics.py, but I think they have the same grids (6 nm x 6 nm x 6 nm, with the grid spacing being 1/5 of sigma = 0.25 nm). The openmm_plumed version gets ~90 ns/day on my machine and the metadynamics.py version gets ~0.7 ns/day. It seems to be the self._force.updateParametersInContext(context) call within _addGaussian that's the bottleneck, taking ~50 seconds every 200 steps (which is the stride length).

There's certainly an argument to be made that this is a bad colvar setup, but there are situations where there is a small sample-able volume that requires a grid of a large size, and I guess it still works even as a comparative exercise.

The necessary files are attached; I used OpenMM 7.2.1 for openmm_plumed and 7.3 for metadynamics.py. metad_test.tar.gz

jchodera commented 5 years ago

We've also found poor performance with CustomCVForce in general, even when few atoms are involved in the collective variables, and are hoping this can be sped up somehow!

jchodera commented 5 years ago

I plan for this to eventually become part of OpenMM.

@peastman: But how do we suggest modifications, extensions, or changes to it in the interim? Mailing around code is not a good interim solution. What license does it have? People can't legally use, modify, or redistribute it without one. It should have a revision-controlled home and a clearly delineated license somewhere until it is mature enough to find its way into OpenMM proper.

peastman commented 5 years ago

It's in the PyOpenMM repository, which is version controlled, same as some of the other extensions. It's MIT licensed, like everything in that repository.

jchodera commented 5 years ago

Here's the trunk, which doesn't seem to contain an explicit license file. Also, maybe it's time to migrate that to github?

peastman commented 5 years ago

I don't think this problem has anything to do with CustomCVForce in general being slow. It's really the cost of rebuilding the splines. PLUMED probably uses a spline type with local support, so it can just recalculate coefficients in a small area.

jiayeguo commented 5 years ago

I have been playing with the updated metadynamics.py recently (which is quite helpful) and have two suggestions for potential improvement:

  1. groups={31} could be changed to groups=int --- according to #1408, the “groups” parameter in getState() accepts a 32-bit int (where the exponent of each set bit, i.e. 2**groupNumber, selects the group).

  2. dist = np.abs(np.arange(0, 1, 1.0/v.gridWidth) - x) assumes that the length of dist is equal to v.gridWidth, which is not always the case due to floating-point rounding error (I tested v.gridWidth in [1, 2, ..., 2000] and in 140 cases len(dist) = v.gridWidth + 1). I would recommend using dist = np.abs(np.linspace(0, 1.0, num = v.gridWidth) - x) instead for better stability.
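
A quick check of that point (just a verification script, not part of metadynamics.py):

import numpy as np

# Count grid widths for which np.arange yields one point too many
# because of floating-point rounding in the step size.
bad = [n for n in range(1, 2001) if len(np.arange(0, 1, 1.0 / n)) != n]
print(len(bad), "grid widths give the wrong length with arange")

# np.linspace always returns exactly the requested number of points.
assert all(len(np.linspace(0, 1.0, num=n)) == n for n in range(1, 2001))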

peastman commented 5 years ago

I would recommend using dist = np.abs(np.linspace(0, 1.0, num = v.gridWidth) - x) instead for better stability.

Good point, thanks! I'll fix that.

jchodera commented 5 years ago

@jiayeguo and @peastman : What would you think about us putting an implementation of metadynamics into our openmmtools toolkit?

molecularmachinist commented 5 years ago

Excuse me for speaking out of turn, but that would be awesome.


jiayeguo commented 5 years ago

@jiayeguo and @peastman : What would you think about us putting an implementation of metadynamics into our openmmtools toolkit?

Yes I think that's a good idea.

peastman commented 5 years ago

As noted above (https://github.com/pandegroup/openmm/issues/2126#issuecomment-431720989), my plan is to add this to OpenMM. I initially released it separately just to get feedback on the design and let it mature a bit before making it an official feature.

peastman commented 5 years ago

This is now merged into the main repository (#2352).

VysakhRamachandran commented 1 year ago

How do I restart a metadynamics simulation that was run using openmm.app.metadynamics.Metadynamics?

peastman commented 1 year ago

Please open a new issue rather than adding unrelated questions to old issues that have already been closed. Be sure you've read the API documentation, which describes how to use it.