michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0

[TUTORIAL] Funnel metadynamics #194

Open lohedges opened 3 years ago

lohedges commented 3 years ago

This is a thread to discuss the creation of a tutorial showing how to implement funnel metadynamics within BioSimSpace.

jmichel80 commented 3 years ago

Adding current diagram summarising the use case:

[image: funnel_md-white]

dlukauskis commented 3 years ago

I've run a fun-metaD sim that was set up for GROMACS and I got a few errors.

  1. Missing definition for the ligand center of mass, after the definition of the p1 and p2 points:

     lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id

  2. Missing sigma parameter for the extent CV. Instead of

     metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025 ...

     it should be

     metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
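Putting the two fixes together, a minimal sketch of the corrected PLUMED lines would look something like this (the atom range is a placeholder for the real ligand atom indices):

```
# Ligand center of mass (replace the range with real atom indices).
lig: COM ATOMS=first_ligand_atom_id-last_ligand_atom_id

# One SIGMA value per CV component (proj and ext).
metad: METAD ARG=pp.proj,pp.ext SIGMA=0.025,0.05 ...
```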

lohedges commented 3 years ago

Thanks for catching these. I'll be able to push a fix tomorrow.


dlukauskis commented 3 years ago

I've made some functions that draw the funnel in NGLview; here's the code. The idea behind using NGLview is to rapidly check that the funnel points the right way and includes/excludes the right bits of the protein. @lohedges, could you have a go at building my functions into the BSS code?

lohedges commented 3 years ago

Thanks, @dlukauskis. I've fixed the issues with the GROMACS PLUMED file and have added a check so that the ProjectionOnAxis.cpp file is only copied to the working directory when the PLUMED version is less than 2.7.

The plotting functionality looks great. I'll start implementing this now. Just to check... you define three vectors: origin, v1, and v2. Both origin and v1 are identical in this case. Which system did you use to define the values of v1 and v2? Are these just the COM of p0 and p1 used to define the funnel CV? (From reading the function doc strings, it would seem so.) I see you include a solvated.pdb file. Does this correspond to one of the systems from your original tutorial?

dlukauskis commented 3 years ago

Oh yeah, origin = v1 = p1 and v2 = p2. This was based on an older bit of code, so I forgot to remove redundancies like this. Solvated.pdb is 2WI3, which is almost identical to 2WI2 from the tutorial, except for a slightly different binding pose.

lohedges commented 3 years ago

I've now implemented the visualisation code natively in BioSimSpace. You can do the following to generate a BioSimSpace.Notebook.View object, e.g. using files from your tutorial:

import BioSimSpace as BSS

# Load the system.
system = BSS.IO.readMolecules(["input/2WI2.prmtop", "input/2WI2.rst7"])

# Create the funnel parameters.
p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(system)

# Define the collective variable.
cv = BSS.Metadynamics.CollectiveVariable.Funnel(p0, p1)

# Create a view.
view = BSS.Metadynamics.CollectiveVariable.viewFunnel(system, cv)

This can then be used to visualise the funnel.

View the entire solvated system and funnel:

view.system()

View the protein, ligand, and funnel (assuming the protein and ligand are the first two molecules):

view.molecules([0, 1, -1])

Just view the funnel:

view.molecule(-1)

Let me know if I've missed anything, or if you come across any issues.

dlukauskis commented 3 years ago

Yep, that visualisation code works really well! I noticed one remaining issue with the PLUMED file, where line 473 in _plumed.py

colvar_string = "lig: COM=%d-%d" % (idx+1, idx+num_atoms)

should be

colvar_string = "lig: COM ATOMS=%d-%d" % (idx+1, idx+num_atoms)

I'm also attaching a metadynamics.py for OpenMM 7.4.2 that allows hill heights to be printed. metadynamics.zip

lohedges commented 3 years ago

Thanks, I'll try to figure out how we can monkey-patch the version of OpenMM that we bundle (either in the binary or from conda). It looks like nothing has changed in the metadynamics code for OpenMM 7.5, so this should work for the conda-forge build of OpenMM too.

dlukauskis commented 3 years ago

This is a slightly better version of the metadynamics.py that I sent yesterday; I've removed some lines that were in the older version for testing and debugging. Also, check out the changes I've made to the typical OpenMM run.py that make it write a HILLS file. run.zip

lohedges commented 3 years ago

Fantastic, thanks for these. I'll try to incorporate the changes tomorrow morning and report on progress during the meeting. For simplicity I'll probably bundle your modified metadynamics script and copy it to the working directory at run-time, e.g. like we do for the additional PLUMED file.

lohedges commented 3 years ago

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED-compatible HILLS file. Could you take a look at the generated openmm.py script to see if it looks okay?

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers? Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

lohedges commented 3 years ago

We might also need to write the OpenMM colvar to a text file, but that should be easy enough to do. Alternatively, we could check for the COLVAR.npy file and convert it within the PLUMED code (or just use the NumPy binary directly).

dlukauskis commented 3 years ago

Okay, I think I've updated our OpenMM metadynamics driver to add the implementation for your PLUMED-compatible HILLS file. Could you take a look at the generated openmm.py script to see if it looks okay?

Sure.

When you get a chance, could you upload some output from a simulation (COLVAR and HILLS files) so that I can check that it works with our current PLUMED analysis wrappers? Once that works, I'll expose those to the OpenMM process object so the user can perform on-the-fly analysis of the funnel metadynamics simulation.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height, in rows 0, 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would. The HILLS file written by my code is formatted to be identical to HILLS files made by PLUMED. I've tested getting 2D and 1D FES from my HILLS file by feeding it into PLUMED, and it produces FES that are identical to the ones made by OpenMM. Here's the zipped HILLS file.
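To illustrate that layout, here's a minimal sketch (with made-up sample values) of how the saved array can be unpacked for analysis; the run script appends one such record per metadynamics cycle:

```python
import numpy as np

# Made-up sample records in the COLVAR.npy layout described above:
# one record per cycle, holding proj, ext and the hill height.
colvar = np.array([
    [0.50, 0.02, 1.2],
    [0.62, 0.03, 1.1],
    [0.75, 0.01, 1.0],
])

# Unpack the three quantities, e.g. for plotting time series.
proj = colvar[:, 0]
ext = colvar[:, 1]
height = colvar[:, 2]
```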

lohedges commented 3 years ago

Thanks, I'm glad that the FES agree with OpenMM :+1: I'll run the HILLS file through our own sum_hills wrapper when I get a chance to make sure all is okay.

The COLVAR.npy only contains the CV values, proj and ext, and the hill height, in rows 0, 1 and 2, respectively. It doesn't contain all the same info that a COLVAR file produced with PLUMED would.

That's okay. I only use the file for getting instantaneous and time-series values of the collective variables, with some tricks to make sure that I extract the correct variables based on their names in the original PLUMED file. I should be able to use the information in your file for the same purposes. I could just use the HILLS file too, although in general (using PLUMED) I write to the COLVAR file more frequently than to HILLS.

dlukauskis commented 3 years ago

I'm trying to use OpenMM to minimise and equilibrate as part of my fun-metaD tutorial. I've run into an issue with minimisation using OpenMM, which just gives me a generic simtk.openmm.OpenMMException: Particle coordinate is nan. I then checked the RST7 and PRM7 files and they look really off when I visualise them in VMD. The protein and ligand bonds are fine, but the solvent seems broken. This might be an issue with the read/write of the files. Here are the inputs.

lohedges commented 3 years ago

Was the system completely prepared in BioSimSpace, or was this a fully solvated system that you loaded in initially? If the latter, could you upload those files too? We do some internal conversions to make sure that waters are formatted correctly for different MD engines, so something might have gone wrong there. (I've tested that I get the same results doing round-trip conversions this way, though.)

dlukauskis commented 3 years ago

Here are all the inputs and the script that I used to set up and run the minimization. I started from a protein PDB with some water I discard and a ligand MOL2.

lohedges commented 3 years ago

Running the minimisation with GROMACS (using the solvated system directly) works, so something must be going wrong with the conversion to AMBER format, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

lohedges commented 3 years ago

Actually, we don't convert the water topology when running with OpenMM, only with AMBER or GROMACS. This means that OpenMM is using AMBER format files with a GROMACS water topology. If I explicitly convert the topology then it still fails, i.e. doing the following before creating the process:

solvated._set_water_topology("AMBER")

I'll try using the GROMACS files directly with OpenMM to see whether that works.

I'm not sure of the issue with this system, since this is the basic setup for pretty much all of our simulations and I haven't come across many blowups like this.

lohedges commented 3 years ago

I think it's an issue with the protein. If I just solvate the protein, I see the same blow-up. If I just solvate the ligand, then it works.

lohedges commented 3 years ago

For reference, the minimisation crashes if you just use the protein in vacuum, so I don't think it's anything specifically to do with the water molecules in the system.

lohedges commented 3 years ago

Running the minimisation with GROMACS (using the solvated system directly) works, so something must be going wrong with the conversion to AMBER format, which is what is used for OpenMM. I'll see if preventing the water model conversion makes any difference.

For reference, GROMACS doesn't immediately crash, but the energies that it reports do look somewhat suspect.

dlukauskis commented 3 years ago

I've abandoned the protein PDB that was causing the issues and chosen another one. Minimisation works fine now. I also caught a bug that occurs when you try to run equilibration with CUDA as the platform; see my pull request.

I now get a strange issue where process.getSystem() simply doesn't work. I'm attaching a zip with the input files and the script.

lohedges commented 3 years ago

I'll try to test this later. I see that there are multiple process.getSystem() calls in your script. Are they all failing, or just the final one for equilibration? I see that you have files corresponding to the minimised system in the directory, so I assume that the minimisation ran fine. It might not be returning anything because the equilibration failed, so you might want to wrap the calls to process.getSystem() inside a block that checks whether the process raised an error, e.g.:

if not process.isError():
    equilibrated = process.getSystem()
else:
    print(process.stdout(50))
    raise Exception("The process failed!")

(It should warn you if an error occurred when calling getSystem, though.)

Another possibility is that the trajectory frame indexing is incorrect, so it can't find the last frame when calling getSystem and returns nothing instead. I'll test this with a short simulation.

lohedges commented 3 years ago

Just tested a shorter CPU simulation of 0.01 nanoseconds and it works for me, i.e. I don't get any errors and there are equilibrated.* files in my working directory. Could you check the version that you are using? I believe I made modifications to the way that getSystem is called for the OpenMM driver: since it reads a trajectory to get the most recent frame, I've made tweaks so that it doesn't need to load the whole trajectory into memory.

I'll try a longer simulation in case MDTraj/MDAnalysis are simply failing when the trajectory is too large. If this is the problem, then you might want to adjust the report_interval and restart_interval that are passed to the protocol constructor. By default we report quite frequently, so you might want to adjust these for a longer simulation.

lohedges commented 3 years ago

I've abandoned the protein PDB that was causing the issues and chose another one.

Yes, this is rather weird. Trust us to choose a system that misbehaves!

I did a bit more testing and the protein in vacuum works with AMBER and GROMACS, but not OpenMM. If I actually take the files generated by tLEaP and use those with OpenMM directly, i.e. not loading via BioSimSpace and writing back out, then it fails with the same error. As such, I don't think it's an issue with the way that we're parsing the file, rather something peculiar about this particular system with OpenMM. Perhaps it is more sensitive to the accuracy of the initial coordinates for some reason, or its own parser is misinterpreting things?

The protein and the ligand bonds are fine, but the solvent seems broken.

I think this is because these are AMBER format files containing a GROMACS format water topology, i.e. naming conventions, etc. We don't care about the naming internally (neither does GROMACS or OpenMM) but some third-party tools do, and there is a difference in the way that fast three-point waters are represented. I imagine VMD expects AMBER format waters when it loads an AMBER topology file.

Since we're using AMBER format files with OpenMM, I've updated the code so that it converts the water topology before writing, to keep things consistent. (These edits are in a feature branch that I'll merge across on Monday. They shouldn't affect the current simulations, though.)

I've also tested that process.getSystem() returns correctly from the minimisation stage and all seems okay at my end. (The way getSystem works is slightly different depending on the protocol, owing to the different files that are written.)

dlukauskis commented 3 years ago

Minimisation process.getSystem() worked fine for me as well. I tried running the equilibration for 0.01 ns using the CPU and it did indeed work. Then I tried 0.01 ns with CUDA and that worked too. As soon as I switch to 0.1 ns with either CPU or CUDA, I get the getSystem error I mentioned. If I set report_interval=1000, restart_interval=1000, the problem persists despite the trajectory having only 50 frames.

lohedges commented 3 years ago

I wonder if there's a rounding issue with the way I compute the fraction of the trajectory that has completed in order to work out the current frame when you run getSystem. Just checking now. Will report back when I know more.

lohedges commented 3 years ago

Yes, I think that's the issue. It is caught as an exception in getFrame, but in getSystem I use a try/except block since you don't want it to fail when running interactively if nothing has been written to the trajectory file yet. I'll just cap the frac_complete at 1 in getSystem so that we never ask for a frame outside of the limits.

(OpenMM is reporting a time of slightly greater than num_steps * timestep for the last step.)
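A minimal pure-Python sketch of that capping logic (the function name and signature are illustrative, not the actual BioSimSpace code):

```python
def frame_index(current_time, run_time, num_frames):
    """Map the reported simulation time to a trajectory frame index.

    OpenMM can report a time slightly greater than num_steps * timestep
    for the final step, so the completed fraction is capped at 1 to
    avoid requesting a frame beyond the end of the trajectory.
    """
    frac_complete = min(current_time / run_time, 1.0)
    # Clamp to the last valid frame index.
    return min(int(frac_complete * num_frames), num_frames - 1)
```

For example, a reported time of 100.4 ps for a 100 ps run with 50 frames now maps to frame 49 rather than asking for a frame outside the limits.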

lohedges commented 3 years ago

I think this should now be fixed. I've pushed to the feature_steered_md branch, rather than devel. You can either pull from there, or wait until I merge across the changes once Adele has tested. (Probably Monday.)

dlukauskis commented 3 years ago

Can confirm that your changes to _openmm.py can extract the last frame now.

dlukauskis commented 3 years ago

I'm testing the fun-metaD part now and I got some bugs involving printing the HILLS file. Instead of

# Run the simulation.
steps = 500000
cycles = 500
steps_per_cycle = int(steps/cycles)
last_index = 0
for x in range(0, cycles):
    meta.step(simulation, steps_per_cycle)
    current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
    colvar_array = np.append(colvar_array, [current_cvs], axis=0)
    np.save('COLVAR.npy', colvar_array)
    for index in range(last_index, np.shape(colvar_array)[0]):
        line = colvar_array[index]
        time = int(record_colvar_every.value_in_unit(picoseconds) * index)
        write_line = f'{time:15} {line[0]:20.16f} {line[1]:20.16f}          {sigma_proj}           {sigma_ext} {line[2]:20.16f}            {bias}\n'
        file.write(write_line)
    last_index = index

It should be

# Run the simulation.
steps = 500000
cycles = 500
steps_per_cycle = int(steps/cycles)
for x in range(0, cycles):
    meta.step(simulation, steps_per_cycle)
    current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
    colvar_array = np.append(colvar_array, [current_cvs], axis=0)
    np.save('COLVAR.npy', colvar_array)
    line = colvar_array[x]
    time = x*2
    write_line = f'{time:15} {line[0]:20.16f} {line[1]:20.16f}          {sigma_proj}           {sigma_ext} {line[2]:20.16f}            {bias}\n'
    file.write(write_line)

Obviously, the time = x*2 needs to be smart about the timestep; here it's just the default 2 fs.
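As a hedged sketch, a timestep-aware version could compute the deposition time from the cycle length and the timestep directly (names are illustrative, not the actual run.py code):

```python
def hill_time_ps(cycle, steps_per_cycle, timestep_fs=2.0):
    """Time in picoseconds at which the hill for a given cycle is
    deposited, derived from the integration timestep rather than a
    hard-coded 2 ps per cycle. (PLUMED HILLS times are in ps.)"""
    return cycle * steps_per_cycle * timestep_fs / 1000.0
```

With the values from the snippet above (1000 steps per cycle at 2 fs), hill_time_ps(x, 1000) reproduces time = x*2.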

lohedges commented 3 years ago

Thanks, I'll fix this after lunch.

dlukauskis commented 3 years ago

I noticed the HILLS file is missing the last line, e.g. the final line is at 998 ps for a 1000 ps simulation. I checked, and this bit of code fixes the issue:

# Create PLUMED compatible HILLS file.
file = open('HILLS','w')
file.write('#! FIELDS time pp.proj pp.ext sigma_pp.proj sigma_pp.ext height biasf\n')
file.write('#! SET multivariate false\n')
file.write('#! SET kerneltype gaussian\n')

# Initialise the collective variable array.
current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
colvar_array = np.array([current_cvs])

line = colvar_array[0]
time = 0
write_line = f'{time:15} {line[0]:20.16f} {line[1]:20.16f}          {sigma_proj}           {sigma_ext} {line[2]:20.16f}            {bias}\n'
file.write(write_line)

# Run the simulation.
steps = 500000
cycles = 500
steps_per_cycle = int(steps/cycles)
for x in range(0, cycles):
    meta.step(simulation, steps_per_cycle)
    current_cvs = np.array(list(meta.getCollectiveVariables(simulation)) + [meta.getHillHeight(simulation)])
    colvar_array = np.append(colvar_array, [current_cvs], axis=0)
    np.save('COLVAR.npy', colvar_array)
    line = colvar_array[x+1]
    time = (x+1)*2
    write_line = f'{time:15} {line[0]:20.16f} {line[1]:20.16f}          {sigma_proj}           {sigma_ext} {line[2]:20.16f}            {bias}\n'
    file.write(write_line)

lohedges commented 3 years ago

Just to check:

time = x*2

Obviously, the time = x*2 needs to be smart about the timestep, here it's just default 2 fs.

The time unit should be picoseconds, right?

dlukauskis commented 3 years ago

Yes, PLUMED uses picoseconds by default.

lohedges commented 3 years ago

I think I've implemented the changes that you suggested. I'm still working on the feature branch at the moment.

dlukauskis commented 3 years ago

Thanks for that, also have a look at my comment.

lohedges commented 3 years ago

I've now merged across into devel and will make further fixes there.

dlukauskis commented 3 years ago

I'm testing the entire pipeline right now and things seem to be running smoothly. I noticed that the fun-metaD sim crashes with the default funnel parameters and, looking at the COLVAR, it has to do with the length of the funnel (upper_bound). When the funnel is too long with respect to the box, the ligand can diffuse across the periodic boundary and jump from, say, proj = 3.9 to -0.2. This triggers the restraints, suddenly spiking the energy and crashing the whole simulation. I've seen this before as well.

Fix: I've already tested making the funnel shorter than half of the box length, and that fixes the problem. We need a safety check that compares the upper_bound value with the shortest box length. If upper_bound > box_length/2, raise an error asking the user to shorten the funnel or choose a bigger box.
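A minimal sketch of the proposed safety check (pure Python; the function name and units are illustrative, not the actual BioSimSpace implementation):

```python
def check_funnel_length(upper_bound, box_lengths):
    """Ensure the funnel fits within half of the shortest box edge,
    so the ligand can't cross a periodic boundary while still inside
    the funnel (lengths in consistent units, e.g. nm)."""
    half_shortest = min(box_lengths) / 2
    if upper_bound > half_shortest:
        raise ValueError(
            "Funnel upper_bound exceeds half the shortest box length: "
            "shorten the funnel or choose a bigger box."
        )
```

Since the system is passed in when setting up the CV, the box dimensions are already available at that point, so the check can run before any simulation starts.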

lohedges commented 3 years ago

Great, I think your fix will be easy to implement since we need to pass the system to set up the CV so the box information will already be present at that stage. I'll add this in as soon as possible.

dlukauskis commented 3 years ago

Here are the analysis plots I'll need for my tutorial. Lester, could you wrap these within BSS?

lohedges commented 3 years ago

I think all of those plots are already wrapped within our PLUMED driver. However, they are currently bound to a process object so that you can dynamically generate plots while the simulation is running, e.g. to monitor the time evolution of the collective variables, or check the free energy for convergence. (See the metadynamics example notebook in the demo directory of the notebook server.)

I'll look at making the functions standalone, so that you can call them by just passing in the path to a directory containing the simulation output.

lohedges commented 3 years ago

To test the analysis, could you post the original files you used to set up the process, as well as the script used to set things up (in case you tweaked any defaults)? To perform the analysis through our PLUMED driver I'll need to reconstruct the CV, since the analysis needs it to know which CV names to look for and what units to use.

lohedges commented 3 years ago

Essentially you can do something like:

# Create the PLUMED wrapper.
plumed = BSS.Process._plumed.Plumed("path/to/your/process/work_dir")

# Create the PLUMED config for this protocol. This maps CV names to their indices and units.
plumed.createConfig(system, protocol)

# Get the time-series of CV index 0. If you have a multi-component CV, then this should
# return the first component.
cv0 = plumed.getCollectiveVariable(0, time_series=True)

# Can also get other CVs, free-energies, etc. These can then be passed to BSS.Notebook.plot.

Once I've made sure this works (it's normally bound to a process, rather than being standalone) I can expose it in the BioSimSpace.Process.__init__.py file.

dlukauskis commented 3 years ago

To test the analysis, could you post the original files you used to set up the process, as well as the script used to set things up?

Check my previous message, it's got all the input files along with it.

Essentially you can do something like: ...

I'll have a look at how it works with the test sims I have already.

lohedges commented 3 years ago

I tried that system, but I can't generate the CV parameters since the GROMACS naming conventions don't allow us to detect alpha carbons in the usual way. (I could pass a name mapping if I knew what it was.) Did you originally use rst7/prm7 files and convert to GROMACS for the purposes of running the simulation? The OpenMM driver uses AMBER by default:

import BioSimSpace as BSS

s = BSS.IO.readMolecules("steepwall_run/system.*")

p0, p1 = BSS.Metadynamics.CollectiveVariable.makeFunnel(s)

Gives...

ValueError: No alpha carbon atoms found within 10 Angstrom of the binding pocket center. Try explicitly setting the center using a 'BioSimSpace.Types.Coordinate' or using a different option for 'alpha_carbon_name'.

Perhaps you used a specific coordinate for the binding pocket?

It doesn't matter too much, since I can set up any funnel CV that has the same units; it's just good to keep things consistent.

dlukauskis commented 3 years ago

Sorry, I forgot to mention it was a guest-host system with no CA atoms :) I'm attaching a HILLS file and openmm.py from a proper protein-ligand system.

lohedges commented 3 years ago

Thanks. There will be a few wrinkles with the current analysis, since I need to adapt it to handle a NumPy COLVAR file. Shouldn't be too hard, though. I think I can sort this out early next week.