openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
313 stars 92 forks source link

Allow registration of paths from other force field packages #28

Closed davidlmobley closed 5 years ago

davidlmobley commented 7 years ago

Per https://github.com/open-forcefield-group/openforcefield/issues/22#issuecomment-298111887

OpenMM recently added a way in which other conda packages could register their install paths with ForceField so that these directories would be searched too. We should probably add this capability so that, for example, the forcefield packages we deliver can be separate packages from the openforcefield toolkit.

pandegroup/openmm#1789

davidlmobley commented 7 years ago

@bannanc - note that this would finally provide a way to reduce the number of places we have to have smirnoff99Frosst.

jchodera commented 7 years ago

I propose we name the entry point group something clear like openforcefield.forcefield_installation_directory.

pschmidtke commented 6 years ago

Should I in theory be able to load all other xml files (for instance amber99sbildn.xml) using the Forcefield class from openforcefield (OE, or RDkit)?

davidlmobley commented 6 years ago

@pschmidtke : Note that not all XML files use the openforcefield XML format; in fact, at this point, the only openforcefield XML files we are aware of are those in our repositories. While someone could port amber99sbildn.xml into openforcefield XML format, I do not believe such a port yet exists.

We actually just renamed our file extension to .offxml to alleviate this point of confusion. Normal OpenMM XML files are NOT openforcefield XML files.

jchodera commented 6 years ago

@davidlmobley : In principle, we could have openforcefield ForceField detect whether these are OpenMM or SMIRNOFF XML files and behave appropriately, but I think we're moving toward a significantly different API for the openforcefield createSystem() call since all of the options affecting force/energy computation will be encoded in the XML. I think our plan to make these two routes fundamentally different---and highlighting that difference with .xml vs .offxml extensions is the right way to go.

pschmidtke commented 6 years ago

Any hints on how to do such a port @davidlmobley, documentation available somewhere ?

davidlmobley commented 6 years ago

@pschmidtke - we've been thinking a bit about porting protein FFs into the format, too. If you're interested in it perhaps we can do a call to discuss.

Basically, it would involve coming up with SMIRKS patterns that correspond to the atom types used in the existing amino acids; to make sure these don't get used anywhere else, one would probably want them to match the individual amino acids and ONLY individual the amino acids. Still, this should not be hard for someone who is good with SMARTS/SMIRKS; we could probably help. Once this is done, one would then translating the existing protein force fields into SMIRNOFF format basically by taking an AMBER format parameter file, replacing the atom types with the SMIRKS patterns generated above, and then using our converter tool to convert these into SMIRNOFF XML format. The result would then require some human curation as there would be some redundancy that would need to be removed, I believe. I could probably write instructions on how to do most of this. Upshot: It would be relatively human intensive but straightforward.

HOWEVER there is one blockage/bottleneck: Currently we don't have support for LibraryCharges in place (normal AMBER family protein force fields assign charges based on look-ups given the environment). We've laid out how this will work in the format (https://github.com/openforcefield/openforcefield/blob/topology/The-SMIRNOFF-force-field-format.md) and #86 should implement it, but we've been short on developer time. @jchodera , any ETAs?

Anyway, if this is something you're interested in helping with in any way, let me know.

pschmidtke commented 6 years ago

Yes i found the smirnoffish parm 99.dat. I guess more or less the same is required for amino acids etc. How would loading a protein from pdb and then parametrization in open forcefield look like afterwards? Everything will be seamless once the offxml available? As for charges an option as a start would be also to allow reading in prmtop files with parmed and then translate to openforcefield, no?

davidlmobley commented 6 years ago

Yes i found the smirnoffish parm 99.dat.

Yes, basically we invented an intermediate format that just brings in the parameters from AMBER and we just have to recode where they get applied with a SMIRKS.

How would loading a protein from pdb and then parametrization in open forcefield look like afterwards? Everything will be seamless once the offxml available?

Should be, yes, assuming the SMIRKS patterns are correct and no one botches the hierarchy. But this should be very doable and I think in the long-term is a much better option than currently how protein force fields are handled. (Big picture: Imagine you could have a biopolymer that can have standard residues -- in which case your normal protein parameters get applied -- as well as nonstandard amino acids/covalent modifications/etc. that just get smoothly applied (being treated by the "general" force field) without having to do any special work to make the two aspects connect up. It would be pretty great and SHOULD be possible in this world once we (a) port in protein FFs, and (b) come up with a fragmentation scheme for charging chunks of polymers).

As for charges an option as a start would be also to allow reading in prmtop files with parmed and then translate to openforcefield, no?

If you just want to use our small molecule FF with an existing protein FF in AMBER (without bringing the protein FF into this format) that's absolutely possible; see https://github.com/openforcefield/openforcefield/blob/master/examples/mixedFF_structure/generate_mixedFF_complex.py for an example doing exactly that (setting up a small molecule with SMIRNOFF and a protein with AMBER and combining the two via ParmEd).

But perhaps that's not quite what you're asking; if it's not, you'll have to explain a little more for me. What exactly would you translate after reading from parmed?

pschmidtke commented 6 years ago

I found the merge structure and ported it to rdkit already but am not sure whether a createSystem should work on such a parmed structure? It doesn't with my current rdkit implementation.

davidlmobley commented 6 years ago

I found the merge structure and ported it to rdkit already but am not sure whether a createSystem should work on such a parmed structure? It doesn't with my current rdkit implementation.

Can you point to code or give a little more detail about what you're doing? Which createSystem? IIRC there are actually three -- ParmEd's, OpenMM's, and openforcefield's (admittedly this is confusing).

We've used the parmed system creation to create systems after merging structures in the way shown in the code I linked to above.

pschmidtke commented 6 years ago

I tried all of them actually ;) Right now I'm insisting on parmed's createSystem. I forked openforcefield (sorry only way to pass you code), created a pschmidtke branch (starting from you swang branch) on my fork and uploaded the example : https://github.com/pschmidtke/openforcefield/tree/pschmidtke/examples/mixedFF_structure. Use the _rdk.py version.

When using this I'm running into : File "/Users/peter/miniconda3/lib/python3.6/site-packages/parmed/structure.py", line 2288, in omm_bond_force raise ParameterError('Cannot find necessary parameters') parmed.exceptions.ParameterError: Cannot find necessary parameter.

We can continue this discussion elsewhere, don't want to hijack the feed here ;)

hjuinj commented 6 years ago

Hi Peter,

This could possibly due to the water parameters in your pdb file not being registered when the openmm object is transferred into parmed in the utils.generateProteinStructure part of the code (e.g. see here for some more detail https://github.com/ParmEd/ParmEd/issues/868). I think in the example David provided, the pdb file has no water and that's why it works.

I have kind of a dirty workaround at the moment. I shall provide a sketch here and David can maybe comment on how appropriate it is before I go into more detail. To start, I think you have parameterised parmed objects of your ligand, your protein but not your water. So I just created a prmtop file containing a single tip3p water parameters. I can then create a parmed object of water via parmed.load_file('water.prmtop', xyz='water.inpcrd'). Then I can combine the water, the ligand and the protein parmed objects into one.

Shuzhe

davidlmobley commented 6 years ago

@pschmidtke - can you try what happens without water present? If indeed the water is the issue then I have some ideas, but if the water is not the issue we'll need to look more closely.

pschmidtke commented 6 years ago

Ok, indeed that was the case. Sorry forgot to drop the water molecules from the pdb file. What are the limitations using parmed right now then vs openforcefield? Some constraints aren't available? I'll try to run some tests already with that for now.

pschmidtke commented 6 years ago

@hjuinj @davidlmobley ok I managed to run a few things. However, I noticed wrong parameters are assigned to a few nitrogens (for instance in PU8 ( CCCCn1c(Cc2cc(OC)c(OC)c(OC)c2Cl)nc2c(N)ncnc12 ) or the ligand of PDB structure 5dls.

I also tried simpler molecules (hydroxytamoxifen and that worked fine). Is this linked to the aromaticity model used in rdkit? or how can I track down / correct the issue?

The attached image is the result of a tiny bit of minimisation.

screen shot 2018-04-18 at 22 12 45

bannanc commented 6 years ago

@pschmidtke I think it is unlikely that this is an aromaticity issue as those problems were primarily related to 5-membered rings. (As in I believe that ring should be aromatic in our desired aromaticity model and most/all defaults). Which version of smirnoff99Frosst.offxml are you using?

pschmidtke commented 6 years ago

I guess an old one, as it's still ffxml ;) basically the one in the swang branch. I'll try an update of the file and keep you posted

pschmidtke commented 6 years ago

Ok I am just loading the offxml from the current master. This results in an error like this : File "/Users/peter/miniconda3/lib/python3.6/site-packages/openforcefield-0.0.3-py3.6.egg/openforcefield/typing/engines/smirnoff/forcefield_rdk.py", line 1544, in createForce _check_for_missing_valence_terms('HarmonicBondForce', topology, bonds.keys(), topology.bonds()) File "/Users/peter/miniconda3/lib/python3.6/site-packages/openforcefield-0.0.3-py3.6.egg/openforcefield/typing/engines/smirnoff/forcefield_rdk.py", line 1265, in _check_for_missing_valence_terms raise Exception(msg) Exception: HarmonicBondForce: Mismatch between valence terms added and topological terms expected. Topological atom sets not assigned parameters: (4, 5) : 0 mymol 4 0 mymol 5 (4, 27) : 0 mymol 4 0 mymol 27 topology_set: {(10, 11), (22, 23), (3, 35), (5, 6), (11, 40), (25, 51), (4, 27), (8, 9), (15, 16), (18, 19), (2, 34), (17, 46), (23, 50), (0, 28), (1, 2), (16, 17), (6, 7), (21, 27), (12, 13), (20, 21), (25, 26), (15, 18), (3, 4), (1, 32), (0, 29), (6, 37), (14, 43), (11, 42), (22, 24), (23, 49), (4, 5), (7, 18), (9, 10), (14, 44), (2, 3), (0, 30), (3, 36), (1, 31), (0, 1), (12, 15), (26, 27), (6, 38), (13, 14), (21, 22), (11, 41), (2, 33), (14, 45), (24, 25), (5, 20), (8, 39), (17, 48), (7, 8), (9, 12), (17, 47)} assigned_set: {(22, 23), (10, 11), (3, 35), (5, 6), (11, 40), (25, 51), (8, 9), (15, 16), (18, 19), (2, 34), (17, 46), (23, 50), (0, 28), (1, 2), (16, 17), (6, 7), (21, 27), (20, 21), (12, 13), (25, 26), (15, 18), (1, 32), (0, 29), (6, 37), (14, 43), (11, 42), (22, 24), (23, 49), (7, 18), (9, 10), (14, 44), (2, 3), (0, 30), (3, 36), (1, 31), (0, 1), (7, 8), (12, 15), (26, 27), (6, 38), (21, 22), (13, 14), (11, 41), (2, 33), (14, 45), (24, 25), (5, 20), (8, 39), (17, 48), (3, 4), (9, 12), (17, 47)}

davidlmobley commented 6 years ago

OK, so I think what you're seeing in this particular case is that: a) We used to take an approach where all molecules would get parameters assigned (we provided dummy generics in such cases), though they would be assigned "obviously wrong" parameters in the case of chemistry we didn't understand, but b) we recently shifted to a model (offxml file) which omits the dummy generics, so you now get an exception

So previously you got bad geometries and now you just get an exception.

HOWEVER, this doesn't answer the question of WHY you're missing parameters for this one. @hjuinj are you in a position to cross-check this case between the RDKit and OE implementations? Specifically, molecule CCCCn1c(Cc2cc(OC)c(OC)c(OC)c2Cl)nc2c(N)ncnc12? I haven't checked our OE implementation yet but looking at the molecule it doesn't look like it has any chemistry we shouldn't cover, so I'm suspecting some kind of issue with the RDKit version.

davidlmobley commented 6 years ago

@pschmidtke - to answer your other question:

What are the limitations using parmed right now then vs openforcefield? Some constraints aren't available?

In principle one can do "almost everything" but there are a bunch of edge cases (in part because Jason Swails, the ParmEd guy, has a "real job" right now and isn't doing as much to support ParmEd as he used to) where things don't behave as they ought unless you do things just right. So if we know exactly what you're trying to do we may be able to give a more useful answer to this question than answering it in general.

To some extent the answer depends on what file formats you are using as well -- for example, I CAN set up mixed solvent systems with water and other things using SMIRNOFF and then simulate with OpenMM or GROMACS. Handling of water currently poses some problems if exporting to AMBER format (parameter files become too large for reasons I could explain). But if you avoid AMBER format there is no problem.

One other issue in general is that the issue of "what molecule are you starting with?" is much more important now than you will be used to with "normal" MD codes, so starting from a PDB file for a small molecule can be a lot more problematic than starting from a SMILES or mol2 because it can result in a step where you have to infer the chemistry.

We have plans to resolve most of these issues in the relatively near future, but we can probably also help you solve your specific problems more immediately if we can talk through exactly what use cases you have/what pipeline you want to apply. Perhaps that's something to lay out in a separate issue.

Thanks.

jchodera commented 5 years ago

Supplanted by #117.