Address open questions on SMIRNOFF format spec revamp

davidlmobley commented 5 years ago

https://open-forcefield-toolkit.readthedocs.io/en/topology/smirnoff.html lists a number of open questions raised by @jchodera (and relevant to @j-wags ) on the SMIRNOFF format spec. I'll give draft answers to some of them here:

"What should we name the parameter sections?" I'd call them "parameter sections", since the other suggested terms ("force field terms", "force classes", "interaction types" all seem already loaded with meaning or like they are overly narrow)
"Should we have a separate metadata or provenance section that users can add whatever they want to?" Seems like a good idea to me. Probably "metadata".
Do we want to allow users to specify atomic or particle masses (e.g. heavy hydrogens)? I am OK with this but also ambivalent.
"Should individual force classes [parameter sections] be versioned, rather than having a global SMIRNOFF version?" I am not certain. I think initially I'd go for the simplest to implement in order to finish sooner. I can see the use case for having versions of specific parameter sections, because if you have a force field which uses only unchanged versions of the spec you don't have to upgrade it for it to work with the new one. But that does add more work to implementing and more work checking for updates, etc.
"Should we expand fractional bond orders beyond <Bonds>?" Yes! That is supposed to be implemented! We must have lost the issue for finishing that implementation when we migrated repos. @j-wags take note; we need to apply the same infrastructure that we use for Bonds for angles and torsions.
XML schema: I don't know, but since we're heading towards various serialization options that probably means no?
"Handling extra XML tag attributes": Agree that having tags there which the parser does not understand should raise an error or warning (perhaps an option as to which). Things like id are very useful though. Maybe we should have some optional tags (like id) which won't raise warnings/exceptions if present, but if anything other than required/optional tags is present it will raise an exception?
"Should we integrate ParmEd and InterMol functionality by adding create() methods for other simulation packages?" Not at present, as this makes us responsible for them NOW. Better not to be responsible for them yet.
"Add link to complete open specification of OEAroModel_MDL aromaticity model" Caitlin can pull this from the SMIRNOFF paper/her notes and provide when someone is ready for it.
"Are there ways we can simplify the integration of legacy biopolymer force fields?" Let's just provide some good examples to start with.
" How will we ensure the SMIRNOFF force field is correctly implemented by molecular simulation packages where nonbonded treatments are encoded by auxiliary input files?" I don't know, let's make a separate issue for that and deal with it separately.
"How should we use multiple conformations in charging?" We should be doing ELF at first I think; this can warrant separate science. Create an issue?
"Should we support RESP charging?" For now, via user charges, otherwise this makes us responsible for RESP at present and Lee-Ping's code is not ready for this.
"How should we fragment larger small molecules and polymers for charging?" Separate issue, deal with later. We and OpenEye have some leads on this but it will take a person working on it for a while.

j-wags commented 5 years ago

It looks like, using OE tools, I can calculate Wiberg bond orders using https://docs.eyesopen.com/toolkits/python/quacpactk/bondordertheory.html . Is there an equivalent method we have in mind using the RDKit/AmberTools toolkits?

hjuinj commented 5 years ago

@j-wags from what I looked into in the past, I don't think there is the equivalent in RDKit.

davidlmobley commented 5 years ago

@j-wags Right, there is nothing in AmberTools for this yet. You may have noticed this was also something which came up in the Torsions call and we reminded @dgasmith about the need for it.

For now the RDKit implementation would presumably have to raise an exception if it is asked to do something with bond orders.

(We don't yet have a force field which requires bond orders, so these are an optional -- but important for the future -- feature.)

j-wags commented 5 years ago

Commit 246101b moves remaining spec questions from The-SMIRNOFF-force-field-format.md to this issue.

Should we have a separate `<Metadata>` or `<Provenance>` section that users can add whatever they want to?

This would minimize the potential for accidentally colliding with other tags we add in the future.

Do we want to allow users to specify atomic or particle masses? This could allow, for example, heavy hydrogens to be specified easily.

<!-- Use average atomic masses (averaged over isotopes) except for heavy hydrogens -->
<Masses default="average-atomic-mass" mass_unit="amu">
   <!-- Make hydrogens heavy -->
   <Mass smirks="[#1:1]" mass="3"/>
</Masses>

While this won't affect thermodynamic properties, it could affect kinetic properties, which may be something our force fields optimize for in the future.

Should individual force classes be versioned, rather than having a global SMIRNOFF version?

Should we expand fractional bond orders beyond `<Bonds>`?

Should we have an XML Schema?

An XML Schema would make it easier to validate XML representations of SMIRNOFF to make sure they are compliant and detect errors.

Should we integrate ParmEd and InterMol functionality by adding `create()` methods for other simulation packages?

We could integrate ParmEd and InterMol as dependencies in our tooolkit and add methods like ForceField.create_amber_system() or ForceField.create_charmm_system() that generate input files for other packages without the need to convert. While perhaps not immediately useful for combining biopolymers parameterized with traditional force fields with SMIRNOFF-parameterized small molecules, once these legacy force fields are available in SMIRNOFF format or we have new biopolymer SMIRNOFF force fields, this drastically simplifies workflows.

Add link to complete open specification of `OEAroModel_MDL` aromaticity model

Are there ways we can simplify the integration of legacy biopolymer force fields?

Are there ways we can make it easy to integrate pre-parameterized systems describing part of the topology (e.g. protein)?

How will we ensure the SMIRNOFF force field is correctly implemented by molecular simulation packages where nonbonded treatments are encoded by auxiliary input files?

For gromacs, AMBER, and CHARMM, the nonbonded treatments (which are integrally specified by a SMIRNOFF force field) are instead specified by an auxiliary input file. Should we generate this auxiliary input file to, or part of it? How can we insist that the desired settings be used?

We should include some missing references

Supported aromaticity models
Supported fractional bond order models
Quantum chemical methods (e.g. AM1) and charge partitioning schemes (e.g. CM2)
AM1-BCC

Should we support RESP charging?

How should we use multiple conformations in charging?

Should we follow the RESP approach, where the charges or averaged? Or the ELF approach, where we use some kind of energy function to evaluate which single conformer is used to compute which conformer is to be used for quantum chemical calculations? What scheme should we use to generate the conformers in a deterministic manner?

How should we fragment larger small molecules and polymers for charging?

Larger small molecules may not benefit from quantum chemical calculations on the whole molecule. Instead, it might be more robust (and faster) to break the molecule into smaller capped fragments, compute charges separately for these fragments, and combine the charges according to some algorithms. This approach would also work for polymers, allowing us to parameterize arbitrary polymeric residues and covalent modifications of them. What is the best way to do this?

j-wags commented 5 years ago

Per https://github.com/openforcefield/openforcefield/pull/233#pullrequestreview-223544516

Should the SMIRNOFF spec live in its own repo, instead of being bundled with the toolkit documentation?

davidlmobley commented 5 years ago

Hmm, the spec in its own repo? That's an interesting idea. Pros and cons?

jchodera commented 5 years ago

Examples:

The QC JSON spec
autoprotocol

j-wags commented 5 years ago

Copying relevant text from https://github.com/openforcefield/openforcefield/pull/341#discussion_r291431410

Here are some unresolved questions we may need to solve

What is the relationship between a parameter section and the SMIRNOFF tag?

I actually don't know, so I'm going to "think out loud" here.

The SMIRNOFF tag currently encodes the aromaticity model. Could we achieve the same behavior by having each parameter section have an aromaticity model? Yes, we could.

So maybe, the idea of an enclosing tag in the SMIRNOFF spec indicates "a value that would otherwise need to be set repeatedly in the enclosed tags". In that sense, the top-level aromaticity_model attribute is implicitly a value that affects, for example, the BondHandler's behavior.

As a thought experiment, I could make a new ParameterHandler that applies parameter based on a different aromaticity model, and have it override the higher level aromaticity model. This is a weird example, but I couldn't come up with a better one -- The point is that it would seem to be arbitrary to tell a developer they can't do that, just because we say so.

Note that the above "sections should be enclosed in other sections if the enclosing section affects the behavior of the enclosee" logic would have a consequence for @andrrizzi's question in https://github.com/openforcefield/openforcefield/issues/310#issuecomment-498221565

First thing that comes to mind is that we might want to have the charge model tags being children tags of Electrostatics, which will essentially have 3 levels of indentation instead of 2.

It implies that the sections that describe atomic charges should be enclosed by the Electrostatics tag (since attributes like the cutoff influence the functional form). It also opens the question of "what if a person wanted both the Electrostatics and GBSA tags? What should enclose the charge-determining sections?". I don't know what we'd do in that situation.

And, in a similar vein, this will make things like VirtualSites more difficult, since they currently are in their own section, but will experience electrostatics and vdW forces. Should they be doubly-nested in Electrostatics and vdW tags? Or should the VirtualSite definitions be duplicated in the enclosing Electrostatics and vdW sections?

So maybe this isn't the right view of section enclosure.

Maybe the SMIRNOFF version encodes something about the underlying section structure

For example, in the 0.2 -> 0.3 transition, we switch from units-in-the-header to units-in-the-quantity. Well, I'd argue that even that change could have been handled at the parameter section level. So one could argue that changing to the "SMIRNOFF 0.3 spec" doesn't actually mean anything -- it's the parameter sections that need to change their behavior! But, I guess the SMIRNOFF spec change here records a "rule change", because the SMIRNOFF 0.3 spec allows ParameterHandlers with their own versions.

Is a parameter section's version truly just the `version=Y` attribute in its defining tag? Or is it really a tuple of `(SMIRNOFF_version=X, section_version=Y)`?

I think it has to be the tuple, because if a parameter section only works with SMIRKS1, but the SMIRNOFF tag sets SMIRKS2... then the parameter section needs to know about this higher-level setting that didn't even exist when it was written. So I guess each parameter section will need to have a MAX_SMIRNOFF_VERSION tag. And probably a MIN_SMIRNOFF_VERSION too.

mattwthompson commented 1 year ago

This is no longer strictly in the domain of the toolkit so I'm transferring it to the standards tracker. I don't know what is currently actionable, but if anything is actionable I recommend triaging this into smaller discrete action items in their own issues or tickets.

openforcefield / standards