openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

Implement Library Charges #25

Closed: davidlmobley closed this issue 4 years ago

davidlmobley commented 7 years ago

Major design decision:

Currently, the charge method is handled on a per-system basis, that is, as an argument to createSystem. All components of a System are therefore charged via the same mechanism, such as AM1-BCC, and we have no way to flag specific components of a system for special treatment. Often this is what we want, but water poses challenges (see #24): for existing water models, we typically want to assign library charges and charge only our solute (or non-water solvent) atoms via normal charging schemes like AM1-BCC.

How shall we achieve this? @jchodera, @bannanc, @andrizzi, thoughts? Some ideas:

Other possibilities?

davidlmobley commented 7 years ago

See https://github.com/open-forcefield-group/openforcefield/issues/24#issuecomment-298092869 for possible solutions to this. Probably the best is modifying the nonbonded force section to allow charge overrides, I think.

jchodera commented 7 years ago

I think allowing an additional charge attribute that overrides the atomic charge is the simplest approach for handling special water models and enabling comparisons with existing models.

The real question here is what are the cases outside of special water models in which we would want to override charge handling? We already have a very flexible approach to molecule-based library charges (by providing OEMol molecules with the desired partial charges) as well as an approach to have createSystem compute and apply AM1-BCC-like charges to all molecules. The gaping hole in this capability is charging of polymeric systems or very large small molecules where AM1-BCC-like methods would be impractical, but are there really use cases beyond special water models where the ability to do something more than what we have now would be useful?

davidlmobley commented 7 years ago

I can't think of that many cases offhand aside from the issues you mention.

I have done work where I had sugars in a protein where I wanted to use GLYCAM parameters for the sugars (with their own charges) and "normal" parameters for the protein and ligand and co-solvents. But in that case, to avoid having to create my own GLYCAM parameterization engine, I presumably would have just created a ParmEd system for the GLYCAM components and merged them with the remainder of the system. So I don't necessarily see this as a use case.

But, there are other co-solvents one might sometimes want to use which are somewhat analogous to waters, I believe. For example, presumably someone has custom TRIS, or ... We probably want a mechanism for people to be able to say, "OK, do your normal thing on all of the rest of the stuff, but treat these components in this special way..." (Or, is that opening too big a can of worms?)

Another case is probably ions, but the issue there is less about charges and more about what other parameters one wants to use.

jchodera commented 7 years ago

> But, there are other co-solvents one might sometimes want to use which are somewhat analogous to waters, I believe. For example, presumably someone has custom TRIS, or ... We probably want a mechanism for people to be able to say, "OK, do your normal thing on all of the rest of the stuff, but treat these components in this special way..." (Or, is that opening too big a can of worms?)

Specific force fields for complex buffer molecules like TRIS are harder to specify in ffxml format because the SMARTS strings have to be ultra-specific to match individual atoms, bonds, angles, and torsions. It can still work, but generating these manually will become a huge pain at some point. I'm not sure how to fix that in a SMIRNOFF-friendly way, however. We could extend the syntax significantly to let us specify what amounts to the specific force terms applied to a specific molecule, but I'm not sure the use case is compelling enough to ever be worth the effort.

> Another case is probably ions, but the issue there is less about charges and more about what other parameters one wants to use.

Supporting ion models is going to be important in both the short term (borrowing ions from elsewhere) and long term (allowing flexibility in parameterization and use of ion models). We should make sure that SMIRNOFF is able to easily let us bring in ion models from AMBER right now, and hopefully also support multisite ion models.

jchodera commented 6 years ago

Following #86, here's how library charges might work:

<LibraryCharges charge_unit="elementary_charge">
   <!-- match an alanine residue -->
   <Residue name="ALA" smirks="[$([NX3H:1](C)(C))][CX4H:2]([CH3X4:3])[CX3:4](=[OX1:5])([N])" charge1="0.2" charge2="-0.2" charge3="0.4" charge4="-0.1" charge5="-0.3"/>
   ...
</LibraryCharges>

This is just an example (and I neglected to use explicit hydrogens, though we would want to use them), but it illustrates how we might be able to specify a number of SMARTS/SMIRKS matches that could be applied to provide library charges for biopolymers (or other small molecules) prior to subjecting the rest of the uncharged molecules (or molecule fragments) in the system to an AM1-BCC-like scheme.
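A minimal sketch of how a section like the one proposed above could be read, assuming the attribute values are quoted so that the XML is well-formed, and assuming that `chargeN` refers to the atom tagged `:N` in the SMIRKS. The function name `parse_library_charges` is hypothetical and not part of the toolkit; the sketch uses only the Python standard library and performs no SMIRKS matching.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input that follows the proposed layout, with quoted attribute values.
EXAMPLE_XML = """
<LibraryCharges charge_unit="elementary_charge">
   <Residue name="ALA" smirks="[$([NX3H:1](C)(C))][CX4H:2]([CH3X4:3])[CX3:4](=[OX1:5])([N])" charge1="0.2" charge2="-0.2" charge3="0.4" charge4="-0.1" charge5="-0.3"/>
</LibraryCharges>
"""


def parse_library_charges(xml_text):
    """Return a list of (name, smirks, {tag_index: charge}) tuples."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for element in root:
        smirks = element.get("smirks", "")
        # Collect every attribute named charge1, charge2, ... as a float.
        charges = {}
        for key, value in element.attrib.items():
            match = re.fullmatch(r"charge(\d+)", key)
            if match:
                charges[int(match.group(1))] = float(value)
        # Assumption: each chargeN must pair with an atom tagged :N in the SMIRKS.
        tags = {int(tag) for tag in re.findall(r":(\d+)\]", smirks)}
        if tags != set(charges):
            raise ValueError(
                f"tagged atoms {sorted(tags)} do not match charges {sorted(charges)}"
            )
        entries.append((element.get("name"), smirks, charges))
    return entries


for name, smirks, charges in parse_library_charges(EXAMPLE_XML):
    print(name, charges)
```

Checking that the tagged atoms and the `chargeN` attributes agree catches a mistyped entry at load time rather than during parameterization.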

davidlmobley commented 6 years ago

That looks good to me, @jchodera, and is basically what I'd been thinking in the interim.

jchodera commented 6 years ago

Great. It should be very easy to add support for this.

j-wags commented 5 years ago

I'm going to treat this issue as "Implement Library Charges", unless there are any objections. This will allow us to keep discussion in one place and have a concrete goal to allow us to close this.

The other proposal in this issue was to allow different semiempirical treatments for different components of a system. To the best of my understanding, that is no longer a feature we prioritize. Is this correct?

jchodera commented 5 years ago

Agreed!

davidlmobley commented 5 years ago

@j-wags treating this as "implement library charges" works fine as long as the API is designed in such a way that library charges can be applied to COMPONENTS of a system, not just the whole system. Specifically, common workflows will want, simultaneously:

Implementation of library charges needs to be done in a way that allows this to happen. If using "library charges" meant we also had to have library charges for a ligand, that would be painful.

jchodera commented 5 years ago

Yes, we designed the spec with that in mind!

jchodera commented 5 years ago

See https://open-forcefield-toolkit.readthedocs.io/en/latest/smirnoff.html#partial-charge-and-electrostatics-models

If that doesn't work for your use cases or anything isn't specified in sufficient detail, please let us know!

davidlmobley commented 5 years ago

Right now that section of the docs (a) doesn't mention using AM1-BCC or similar, except as an aside in the "ChargeIncrementModel" discussion, and (b) doesn't discuss precedence of the different sections.

I THINK what it's imagining is that the library charges would overrule anything else that could be applied to charges, for components of the system for which there are library charges, but that's not specified in the documentation.

davidlmobley commented 5 years ago

(Maybe this means we need to spin off a separate issue to clean up that portion of the docs, @j-wags.)

j-wags commented 5 years ago

> Implementation of library charges needs to be done in a way that allows this to happen.

Understood. I think we're on the same page about this. I'll think about a way to structure it so that charge sources are prioritized as follows:

1) Take charges from the molecule, if charge_from_molecule is specified during system creation; otherwise
2) Search for library charges for the molecule; otherwise
3) Use a semiempirical method to calculate charges.
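The precedence described here can be sketched in a few lines. This is only an illustration under stated assumptions: `choose_charges` is a hypothetical helper, the library is a plain dictionary keyed by molecule name (a stand-in for SMIRKS matching), and `semiempirical` is any callable standing in for an AM1-BCC-like method. It also assumes that if charge_from_molecule is set but the molecule carries no charges, the search falls through to the next source; the thread does not specify that case.

```python
from typing import Callable, Dict, List, Optional, Tuple


def choose_charges(
    name: str,
    molecule_charges: Optional[List[float]],
    charge_from_molecule: bool,
    library: Dict[str, List[float]],
    semiempirical: Callable[[str], List[float]],
) -> Tuple[str, List[float]]:
    """Return (source, charges) following the three-step precedence."""
    # 1) Charges carried by the molecule win, but only when explicitly requested.
    #    Assumption: a molecule without charges falls through to the next step.
    if charge_from_molecule and molecule_charges is not None:
        return "molecule", molecule_charges
    # 2) Otherwise look for library charges (a dict lookup stands in for matching).
    if name in library:
        return "library", library[name]
    # 3) Otherwise fall back to a computed, AM1-BCC-like method.
    return "semiempirical", semiempirical(name)


# Illustrative values only: TIP3P-like charges for water, a dummy fallback.
library = {"water": [-0.834, 0.417, 0.417]}


def fake_am1bcc(name: str) -> List[float]:
    return [0.0]


print(choose_charges("water", None, False, library, fake_am1bcc))
print(choose_charges("ligand", None, False, library, fake_am1bcc))
```

With this ordering a ligand never needs a library entry: it simply reaches step 3, while water is picked up at step 2.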

Note: as we move toward an implementation, we should keep in mind that this will need to handle protein residues in the future, where we want to apply library charges piecewise to a large molecule.

jchodera commented 5 years ago

It does specify precedence:

"Note that atoms for which library charges have already been applied are excluded from charging via ."

But we could make that clearer.
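One way to picture the atom-level exclusion rule quoted above is the following sketch. It is not the toolkit's implementation: `split_by_library_coverage` is a hypothetical name, each library match is represented as a plain mapping from atom index to charge, and the choice that later matches override earlier ones is an assumption made for the example.

```python
from typing import Dict, List, Tuple


def split_by_library_coverage(
    n_atoms: int, library_matches: List[Dict[int, float]]
) -> Tuple[Dict[int, float], List[int]]:
    """Return (library-charged atoms, atom indices left for other charging)."""
    assigned: Dict[int, float] = {}
    for match in library_matches:
        # Assumption: when two matches cover the same atom, the later one wins.
        assigned.update(match)
    # Atoms that already carry a library charge are excluded from later steps.
    remaining = [index for index in range(n_atoms) if index not in assigned]
    return assigned, remaining


# Illustrative system: atoms 0-2 are a water covered by a library match,
# atoms 3-5 belong to something with no library entry.
assigned, remaining = split_by_library_coverage(
    6, [{0: -0.834, 1: 0.417, 2: 0.417}]
)
print(assigned)
print(remaining)
```

Working per atom rather than per molecule is what would allow library charges to be applied piecewise, for example to residues within a larger molecule, while the uncovered atoms go on to another charging method.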

I can't find mention of the ToolkitCharges. The ChargeIncrementModel was supposed to be our AM1-BCC, but we added the toolkit charge as a temporary workaround and may have neglected to add it to the spec. So yes, let's amend the spec in a bugfix release (spinning off a separate issue).

SimonBoothroyd commented 5 years ago

Library charges would definitely be useful to have, especially when dealing with hydrated systems! Is there a tentative milestone target for this?

j-wags commented 5 years ago

Not yet, though I've gotten inquiries from @bannanc and @proteneer about when LibraryCharges will be ready. This is probably something we can aim for in 0.5.0 (ETA 4-5 weeks). I'll bring this up at the next milestone meeting.

jchodera commented 5 years ago

We should be careful to pair this with optimization improvements since it is currently slow to type many molecules such as waters (which is the primary use case).