openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

first protein ligand simulation set ups using openforcefield-rdkit #107

Open pschmidtke opened 6 years ago

pschmidtke commented 6 years ago

Hi,

as requested by @davidlmobley here (https://github.com/openforcefield/openforcefield/issues/28#issuecomment-382541238), I'll stop hijacking other issues and create a dedicated one to outline what I'm trying to achieve using openforcefield.

As a start I want to set up a proper ligand protein simulation using only the rdkit integration started by @hjuinj . So the current short term things I'm testing are:

In the mid term, if this first test integration works, I'll also integrate protein chunk creation + harmonic restraints into the simulation as described here: https://www.nature.com/articles/nchem.2660. Most of this is already done but not yet tested with parmed/openforcefield prepared systems.

On the long run, I'd be happy to simulate systems only prepared via openforcefield (also the protein)...so this would require a translation of the protein force fields to offxml.

I'll push my work for now here (https://github.com/pschmidtke/openforcefield) into the pschmidtke_rdk branch, as I don't have access to this repo here. Note that I also do not have an openeye license and that the whole point of this integration is to provide a fully free version of our dynamic undocking approach (I currently rely on MOE to parametrize the ligands using.....parm@frosst ;) ).

Also, I'm a noob in openmm, am still discovering a lot (and I like it a lot) and will probably sometimes ask noob questions, like where is what or how to do the most basic thing in the world, sorry for that :)

I'll post issues related to the offxml or other openforcefield related things here.

Thanks in advance for your help!

hjuinj commented 6 years ago

Morning Peter,

I think I know why you hit an exception with this SMILES, CCCCn1c(Cc2cc(OC)c(OC)c(OC)c2Cl)nc2c(N)ncnc12, and the new offxml file. I think it actually has to do with the aromaticity model.

quoting the problematic region you saw:

Topological atom sets not assigned parameters:
(4, 5) : 0 mymol 4 0 mymol 5
(4, 27) : 0 mymol 4 0 mymol 27

and this figure: [image: screenshot from 2018-04-19 07-41-30]

This is the hetero-ring which the OpenEye MDL aromaticity model does not recognize as aromatic, while RDKit (I guess you are using 2017.03 or earlier?) does think it is aromatic. Sifting through the new offxml file, there do not seem to be bond SMIRKS for tertiary aromatic nitrogens, hence the error.
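To see at a glance which terms went unparameterized, the report quoted above can be parsed into atom-index tuples. This is only a minimal sketch: the report layout is assumed from the single quoted message, `parse_unassigned` is a hypothetical helper (not part of the toolkit), and the mapping from tuple length to bond/angle/torsion is an assumption.

```python
import re

# Matches index tuples such as "(4, 5)" that are followed by a colon.
_TUPLE_RE = re.compile(r"\((\d+(?:\s*,\s*\d+)*)\)\s*:")

# Assumed mapping from tuple length to the kind of term left unparameterized.
_TERM_BY_SIZE = {2: "bond", 3: "angle", 4: "torsion"}


def parse_unassigned(report):
    """Return a list of (term_kind, atom_indices) pairs found in the report text."""
    found = []
    for match in _TUPLE_RE.finditer(report):
        indices = tuple(int(tok) for tok in match.group(1).split(","))
        kind = _TERM_BY_SIZE.get(len(indices), "unknown")
        found.append((kind, indices))
    return found


report = (
    "Topological atom sets not assigned parameters: "
    "(4, 5) : 0 mymol 4 0 mymol 5 (4, 27) : 0 mymol 4 0 mymol 27"
)
for kind, indices in parse_unassigned(report):
    print(kind, indices)
```

Both entries here share atom 4, which fits the reading that a single atom (the ring nitrogen) is perceived differently by the two aromaticity models.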

I have been looking at the newest rdkit version, which implements the MDL aromaticity model, but I had some issues which I reported here, and hopefully these have been resolved in the most recent beta version of rdkit 2018.03 (a conda-installable build came out last week if you wish to give it a try: conda install -c rdkit/label/beta rdkit, although I don't think it will directly solve the problem). I have a WIP version on my laptop which I can try to work on a bit more. Unfortunately the timing is bad for me, as I am set to go on holiday later today for two weeks. I can work on it between travels but I cannot promise anything, sorry.

@davidlmobley

pschmidtke commented 6 years ago

Thanks @hjuinj. Seems logical, as I observed the weird geometries (with the previous ffxml) on tertiary aromatic nitrogens, and also on a few other atoms (but usually nitrogens). No hurry, I also have real work ;) I can continue to set up my things on systems without these types of atoms.

davidlmobley commented 6 years ago

Thanks for this, @pschmidtke. I'll revisit in more detail soon, but just to respond on the SMILES/aromaticity model: if you're using the older version of RDKit, the aromaticity model definitely means something very different and you'll get a lot of discrepancies in substructure matches. Shuzhe (@hjuinj) put a great deal of work into (a) tracking these down, and (b) working with the RDKit developers to get a comparable model put into place for the 2018.03 beta. I think ultimately he was able to get all the energies (cross-comparing between OE and RDKit implementations) to agree, but that presumably requires the WIP stuff he has on his laptop. :)

We're glad it looks like this will be able to be useful to you.

One thing which might or might not be useful to you is that we've actually done all the hydration free energy calculations in the FreeSolv database again with SMIRNOFF, using water, obviously (see our preprint on biorxiv), and scripts for this are available online. You may find this one useful: https://github.com/MobleyLab/SMIRNOFF_paper_code/blob/master/FreeSolv/scripts/create_input_files.py -- at least, it shows one particular workflow that will successfully get you OpenMM input files for a set of "solute in water" systems. It DOES rely on (open source) packmol for adding water molecules, which may not be as good as adding things to a pre-equilibrated box. But it does also work reliably/robustly (642 hydration free energy calculations run, a couple of times!). :)
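As a rough illustration of the packmol step mentioned above, the following sketch writes a Packmol input that places one solute in a cubic box and fills the box with water at roughly bulk density. It is not taken from create_input_files.py; the function name `packmol_input`, the file names, and the box size are all hypothetical, and the water count ignores the volume taken up by the solute.

```python
# Approximate number density of liquid water at room temperature
# (molecules per cubic angstrom), from 1 g/cm^3 and 18.015 g/mol.
WATER_PER_A3 = 0.0334


def packmol_input(solute_pdb, water_pdb, box_edge, output_pdb="solvated.pdb", tolerance=2.0):
    """Build Packmol input text for one solute plus water in a cubic box (edge in angstrom)."""
    # Assumption: fill the whole box at bulk density, ignoring the solute volume.
    n_water = int(round(WATER_PER_A3 * box_edge ** 3))
    half = box_edge / 2.0
    lines = [
        f"tolerance {tolerance:.1f}",
        "filetype pdb",
        f"output {output_pdb}",
        "",
        f"structure {solute_pdb}",
        "  number 1",
        "  center",  # place the solute's center of mass at the fixed position
        f"  fixed {half:.1f} {half:.1f} {half:.1f} 0. 0. 0.",
        "end structure",
        "",
        f"structure {water_pdb}",
        f"  number {n_water}",
        f"  inside box 0. 0. 0. {box_edge:.1f} {box_edge:.1f} {box_edge:.1f}",
        "end structure",
    ]
    return "\n".join(lines) + "\n"


text = packmol_input("solute.pdb", "water.pdb", 30.0)
print(text)
```

Saving the printed text to a file such as solvate.inp and running `packmol < solvate.inp` should produce the solvated PDB. A box built this way still needs minimization and equilibration before production runs, which is the trade-off against a pre-equilibrated water box noted above.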

pschmidtke commented 6 years ago

Hey, thanks for the info. As I said, no pressure @hjuinj ;) For the time being I can continue my integration work with the ligands I can parametrize, so that should be good enough for a proof of concept. The solvation example will definitely be of interest; I'll check that out and ping you if I run into issues with it. Thanks again for your guidance.

jchodera commented 5 years ago

@pschmidtke : Is this issue still relevant now that we have the toolkit 0.2 release featuring RDKit support?