SMIRNOFF: PME electrostatics cannot be applied to non-periodic systems

mattwthompson commented 2 years ago

In https://github.com/openforcefield/openff-toolkit/issues/1084 I did a poor job of keeping things in any sort of scope. This issue splits out one issue into something discrete and actionable and focused on just one topic. To resolve this, it will also be helpful to keep the discussion in scope and split out other issues unless they are strictly blocking progress here. I've also attempted to push as much into footnotes as possible.

The problem: PME electrostatics cannot be applied to non-periodic systems

The default and most commonly-used electrostatics method (PME) is fundamentally not compatible with non-periodic systems [1]. This means a force field with <Electrostatics method="PME" .../> cannot be used with gas-phase systems, which is a common use case.

The history: OpenFF has always modified PME to "no-cutoff" for gas-phase systems

The OpenFF Toolkit, if provided a force field with PME electrostatics and a non-periodic topology, changes the electrostatics method to OpenMM's Nocutoff. This is sensible for historical and practical reasons, but modifying the contents of a force field is not desirable and similar implementation in other simulation engines are difficult [2] or not tractable [3].

This code is from the early days of OpenFF and as a result all force fields have been trained using this behavior [4]. This should be of no consequence to the vdW refits, but may have an impact on the valence terms for large/charged molecules with relevant non-bonded intramolecular interactions [5].

This has all led to a situation in which the software and force fields agree what <Electrostatics method="PME" .../> means for non-periodic systems, but it's not what the spec says.

The question: Should the SMIRNOFF specification be updated to convey this meaning?

I'd argue the answer is "yes," since I don't want implementations to choose between implementing the text of the specification and having

The idea here would be to, in version 0.4 of the Electrostatics section, either

Update the defintion of method="pme" with a clause along the lines of "by the way, if non-periodic, use full electrostatics with no or a very long cutoff"
Add a new method flag that conveys the meaning of "PME if periodic, full no-cutoff electrostatics if non-periodic" and make this the default moving forward

If the answer is "no," the above discrepancy will continue to be baked into our code base and force fields. It will also make it difficult for other implementations to decide, particularly when exporting to non-OpenMM engines, if the specification or implementation should be followed.

There are some unresolved questions [6] that are out of scope here but I want to gather feedback on this specific question before moving forward. If any of these ideas is workable I will move it/them into a proper OFF-EP for consideration.

There is another available method in the specification (Coulomb - direct Coulomb interactions (with no reaction-field attenuation) should be used) but this leads to a mirror of the same problem - full electrostatics for a condensed-phase system is not tractable and a user will simply switch it to PME.
In general, molecular dynamics engines tend to consider non-periodic simulations an afterthought, i.e. always requiring box vectors to be known. GROMACS for example does not support vacuum simulations in the past 3-4 years of releases: https://gitlab.com/gromacs/gromacs/-/issues/3526. There are workable alternatives for gas-phase systems that don't introduce PME errors, such as using cutoff electrostatics with long (really long) cutoffs.
Taking only GROMACS for example, I don't think the corresponding "no-cutoff" concept even exists in the supported electrostatics methods.
I'm about 90% sure this is true; I couldn't find anywhere in the current Forcebalance code that obviously sets box vectors to arbitrarily large values, and most single-molecule/gas-phase pipelines I see go pretty quickly from an SDF file to an OpenFF Topology without setting box vectors. Somebody from the fitting team surely has a better understanding of the code and history here.
I don't have a strong intuition here - I figure no-cutoff vs. PME shouldn't have much of an impact in most situation. But I also don't know if some large, floppy zwitterions has snuck into datasets or if molecular geometries might differ slightly with different electrostatics treatments. Any of these could have an impact on i.e. torsions - or none, I don't know.
Mainly - should Parsley and Sage be up-converted by the toolkit to either of these 0.4 ideas?

j-wags commented 2 years ago

The idea here would be to, in version 0.4 of the Electrostatics section, either

Update the defintion of method="pme" with a clause along the lines of "by the way, if non-periodic, use full electrostatics with no or a very long cutoff"
Add a new method flag that conveys the meaning of "PME if periodic, full no-cutoff electrostatics if non-periodic" and make this the default moving forward

I agree that the spec should be updated (to at least record the behavior that users have been getting using our FFs), and would be happy with either of these courses of action. If needed I can generate a preference, but I'd approve an EP that adds either change.

One alternative that came to mind is that we could expand the spec to have periodic_method and nonperiodic_method, thereby codifying in a readily-understandable way exactly what is intended.

j-wags commented 2 years ago

@jchodera @davidlmobley @SimonBoothroyd Resolving this question is high-priority.

Wearing my "Toolkit developer" hat: This issue is effectively blocking to biopolymer support. The chain of logic is:

The biopoymer release of the toolkit will use Interchange by default when exporting to OpenMM
@mattwthompson is building Interchange to adhere to the current SMIRNOFF spec. In his and @davidlmobley's opinion, the correct course of action according to the current spec is to raise an exception if the electrostatics method is PME and no periodic box is provided.
Raising an exception in this case would break a large fraction of user code that goes through create_openmm_system so, as the main toolkit maintainer, I'm not willing to break this much user code unless the spec unambiguously leaves no other option.

Now, putting on my "SMIRNOFF committee member" hat: Per my last message, I think that @mattwthompson's proposed solutions would satisfy everyone. Before he takes the time to draft a formal EP to change the spec, could the other committee members weigh in?

davidlmobley commented 2 years ago

To take the last first, I think it's important that doing what I had suggested would break a lot of existing code, as you point out, @j-wags , and as @jchodera pointed out elsewhere. That's enough to make me back off somewhat from my earlier position.

So, I agree with @mattwthompson 's proposal that we update the SMIRNOFF spec to convey that the behavior users have been getting is appropriate for gas phase systems. So, I like this:

The idea here would be to, in version 0.4 of the Electrostatics section, either

Update the defintion of method="pme" with a clause along the lines of "by the way, if non-periodic, use full electrostatics with no or a very long cutoff"

Add a new method flag that conveys the meaning of "PME if periodic, full no-cutoff electrostatics if non-periodic" and make this the default moving forward

Both of these are fine for me.

@mattwthompson :

most single-molecule/gas-phase pipelines I see go pretty quickly from an SDF file to an OpenFF Topology without setting box vectors. Somebody from the fitting team surely has a better understanding of the code and history here.

I think this is accurate. We kind of assume that the default behavior is to get non-periodic systems with no cutoff, and if we want a cutoff and PME we have to explicitly enable. So it probably makes sense to update the spec to reflect this, even though GROMACS has moved away from it.

I don't have a strong intuition here - I figure no-cutoff vs. PME shouldn't have much of an impact in most situation. But I also don't know if some large, floppy zwitterions has snuck into datasets or if molecular geometries might differ slightly with different electrostatics treatments. Any of these could have an impact on i.e. torsions - or none, I don't know.

Right, so in the gas phase with no periodicity or with a very large box there should not be much of an impact on dynamics, conformational preferences, etc. There COULD be dramatic impacts if doing a free energy calculation, such as for the free energy of turning off a molecule's electrostatics in the gas phase (which can be part of a solvation free energy calculation). But for most other cases no. And, this should be something users could avoid -- I mean, in GROMACS (before their updates the last several years) one could deliberately choose to simulate a system in the gas phase with or without periodicity and with or without PME.

The main argument for making electrostatics (PME) part of the force field is, I think, for condensed phase systems/periodic systems. But if we can require it there, while not requiring it when a periodic box is not set, that seems like a perfect solution.

jchodera commented 2 years ago

Thanks for tagging me into this issue, and for clearly laying out the problem and potential solutions that attempt to eliminate ambiguity from the specification!

There are a few key principles we need to keep in mind as we determine the best way to clarify the spec and ensure all implementations are conformant:

The intention of the specification is to make it clear and unambiguous what the correct definition of the potential function should be for systems of interest---both condensed-phase (periodic) and vacuum (non-periodic). Different packages are left to implement optimizations as long as they control deviations from this correct behavior.
For electrostatics, the goal was to specify which electrostatics model is appropriate for periodic systems, since reaction field implementations and Ewald-based electrostatics have different potential functions. For non-periodic systems, the intention was always to use unmodified Coulomb to represent the gas phase.
The choice whether PME, SPME, GSPME, etc. or other variants are used to implement Ewald is up to the package, provided they control accuracy of the deviation from the true Ewald result in some way. For reaction field, the functional form must be specified since different forms lead to different "correct" electrostatics reference models.

In light of all of this, we should develop a strategy that addresses the following:

We should clarify that non-periodic systems always use Coulomb and the specified method applies only to periodic systems. This might involve a keyword change, from method to periodic_potential or something that conveys that this method only specifies the periodic functional form. We might leave open a way to specify alternative correct functional forms for non-periodic electrostatics in the future.
We should clarify that PME is an implementation detail and not the actual functional form for the true periodic electrostatics potential. Something like Ewald is likely the appropriate keyword value here, since other methods like PME attempt to approximate it in a controlled manner.
Ewald and PME methods do have a choice of boundary conditions---specifically, the dielectric constant of the material that surrounds the infinite periodic cell. Most codes implement only conducting boundary conditions (epsilon=1), but some codes permit other values. @BillSwope may help us navigate whether we need additional information here.
There is some ambiguity about how exceptions are handled for PME (see this thread) that could be clarified in the spec

My suggestions (only suggestions!) would be to

Rename the method attribute to periodic_potential in analogy to other components that use the potential term, and list Ewald3D-ConductingBoundary as a valid keyword that states that the Ewald periodic sum should be the true potential for periodic systems. For reaction field methods, the periodic_potential would specify the exact functional form used for the periodic potential, or a keyword denoting a common choice we choose to include.
To minimize pain in implementations, we can have existing implementations continue to accept method="PME" as a synonym for periodic_potential="Ewald3D-ConductingBoundary" for a few releases, issuing a warning that the spec has changed and this behavior may be removed in a future release.
The spec should state that a nonperiodic system should always use the Coulomb potential. We can reserve the possibility of adding a nonperiodic_potential keyword at a future date, or we could add this now and only define Coulomb as the allowed choice.
We can clarify how exceptions should always use Coulomb and the specified scaling.

Regarding gromacs not supporting cutoff: Presumably, specifying Cut-off with a cut-off that is something large like FLT_MAX would be numerically essentially equivalent to no cutoff. Again, these are package-specific details that are selected to approximate the true unambiguous potential specified by the force field.

jchodera commented 2 years ago

@mkgilson points out we should also take this opportunity to clarify how net charged periodic systems are treated. Perhaps we want Ewald3D-ConductingBoundary-UniformNeutralizingCharge to mean the combination of 3D Ewald with conducting (epsilon=1) boundary conditions where systems with a net charge receive a uniform charge to ensure they are net neutral.

mattwthompson commented 2 years ago

Thanks all for your thoughtful replies - the discussion here was integral in stewing on the problem and deciding which path to propose as a solution.

Earlier I suggested two ideas:

The idea here would be to, in version 0.4 of the Electrostatics section, either

Update the defintion of method="pme" with a clause along the lines of "by the way, if non-periodic, use full electrostatics with no or a very long cutoff"

Add a new method flag that conveys the meaning of "PME if periodic, full no-cutoff electrostatics if non-periodic" and make this the default moving forward

and I decided to move forward with a revised version of #2, more explicitly @j-wags's suggestion of method_periodic and method_nonperiodic (which holds strong similarities to part of @jchodera's suggestion). I admit I wasn't originally wasn't a fan of it - adding more attributes isn't an option that should be taken lightly. But I think squishing a ton of meaning into method is bound to get us in trouble long-term since electrostatics methods can get complicated. Now, I think having separate methods for periodic and non-periodic electrostatics treatment is appropriate level of detail.

I wrote up an EP that codifies this change and would like to continue the discussion there: #30

jchodera commented 2 years ago

What should we do about the fact that "PME" is not an accurate description of the electrostatics model we intend (where the true model is Ewald, and methods like PME that perform controlled numerical approximations are acceptable implementations)?

mattwthompson commented 2 years ago

It's important but I believe the scope and complexity warrants moving it to a separate EP - one that may be as simple as renaming/clarifying what "PME" means, or more prescriptive with respect to the relevant options and implementation details. I've started discussion here: #15

jchodera commented 2 years ago

It's important but I believe the scope and complexity warrants moving it to a separate EP

I'm not sure that makes sense. If the issue is that the current definition is ambiguous, shouldn't the crisp definition of how nonbonded electrostatics should be handled the purvey of a single EP to address that issue?

j-wags commented 2 years ago

shouldn't the crisp definition of how nonbonded electrostatics should be handled the purvey of a single EP to address that issue?

Hm, I think this is letting perfect be the enemy of good. Neither I nor @mattwthompson have a detailed understanding of long-range electrostatics, we largely see this as the minimal change needed to unblock the integration of Interchange into the Toolkit, and make the biopolymer release.

I'd be happy to review an alternative EP from @jchodera that handles this in a more complete way. I just don't think that @mattwthompson or I have the background to craft changes that cleanly handle the mathematical/implementation/historical scope of this area.

davidlmobley commented 2 years ago

I agree with Jeff here. I don't know how to write what's suggested either.

BillSwope commented 2 years ago

I know I am late into the discussion, and it looks like it is being resolved adequately. For those interested, here is just a point of clarification on the "boundary conditions" associated with the Ewald. The Ewald formulation has the concept that the infinite periodic system is embedded in a dielectric at infinity. (Strange concept, right?) The "dielectric at infinity" is usually assumed to be conducting, sometimes also called tinfoil boundary conditions. This corresponds to epsilon=infinity. If the boundary is conducting, charges organize in the conductor to produce a field that cancels any total system dipole that might form. Other choices, however, include vacuum, epsilon=1, and I have seen epsilon=80. For finite values of epsilon, there is an extra term in the electrostatic energy expression that must be included. It is always positive and has the square of the total dipole moment of the system. Including this results in a supression of the transient system dipole moments that form during the simulation. Although all of these values (80, 1 and infinity) are, in some sense, reasonable, the choice can have profound effects on the structure and dynamics of the system. It affects the apparent dielectric constant of water, for instance. In biomolecular codes, I have only ever heard of tinfoil boundary conditions. I would vote for supporting just these and to state so in the documentation.

davidlmobley commented 2 years ago

I am happy with the EP in #30 and I think it resolves this. @jchodera @SimonBoothroyd there are six days left in the feedback window there, so please review so we can get this wrapped up.

openforcefield / standards