Round partial charges to enforce correct integral charge on molecule

j-wags commented 4 years ago

Is your feature request related to a problem? Please describe.

Some partial charge assignment methods only report three or four decimal places of precision
Some System/Topology file formats only store three or four decimals of precision

Both of these issues can lead to rounding errors that become significant in larger molecules.

Describe the solution you'd like ParmEd implemented this approach, which rounds existing charges to the desired precision, and then adds the sum of the rounding errors to the first particle to achieve the correct net charge on each residue. This may be tricky for OFFTK, since it has no concept of "Residues", thus a large biopolymer could require a substantial correction, which should not be dumped on a first atom (for example, if 1000 residues each contributed 0.001e of rounding error, then the first atom might end up getting +1 added to its charge).

It seems likely that we could apply the same solution, but just with a slightly more complex algorithm to spread out the final charge correction over several atoms. It's not clear to me that this is a question with an "automagic" solution -- pathological cases could lead us to introducing errors with this functionality (for example, accumulating a large error given a biopolymer, or adding a dipole to a symmetric system).

We'd want to figure out how this would interface with our current functionality. Basically, we should allow

Users disabling the rounding altogether
Users rounding to a custom precision
Users applying this rounding only to certain molecules

Further, we should figure out where this would live in our current API. Some possibilities include

kwarg to create_openmm_system: do_not_round_charges=[molecules] or custom_charge_rounding={molecule:precision}
Tag-level element in the ForceField itself -- likely in ElectrostaticsHandler (but this prevents us from handling different molecules individually). So, for example, <Electrostatics version="0.3" method="PME" scale12="0.0" scale13="0.0" scale14="0.833333" cutoff="9.0 * angstrom" rounding_precision="4"/>
In our eventual OFF System class, which would know which simulation format it is exporting to, and thus which precision to use.

I'm most in favor of the first and third options, since the second would require a change to the SMIRNOFF spec and disallow treating molecules differently. The first option is the only one that is immediately available.

Additional context In the LibraryCharges PR that was already merged (#433), we added a check that the sum of a molecule's partial charges must be within 0.01 of the molecule's formal charge. This can be overridden by a kwarg (allow_nonintegral_charges=True).

davidlmobley commented 4 years ago

I like the way you're thinking here. I have no other insights at this point though.

j-wags commented 3 years ago

Linking https://github.com/openforcefield/openff-forcefields/issues/34 -- Basically, rounding errors are being noticed in production workflows. I suggest we prioritize resolving this.

aniketsh commented 3 years ago

following the issue, and looking forward to a solution! I am getting some non-integer charges with am1bcc, for instance, 0.003.

j-wags commented 3 years ago

@jchodera @mattwthompson @davidlmobley @SimonBoothroyd Any objection to implementing something like this option:

kwarg to create_openmm_system: do_not_round_charges=[molecules] or custom_charge_rounding={molecule:precision}

The one downside I see here is that it would put some aspect of FF application outside of the force field (instead, the use of non-default rounding precisions would only be stored in the parameterization code).

Would this raise a blocking objection for any of you?

davidlmobley commented 3 years ago

ParmEd implemented the approach I suggested (in other words, I'm partly responsible for what ParmEd does) but it was suggested with only small molecules in mind.

Still, it's clear to me that we want systems with integer charges for essentially all applications, so we need something at least SIMILAR to what ParmEd does. Specifically, if we do NOT have integer charges, the consequences for free energy calculations are potentially dire -- e.g. one perturbs a system with a zero formal charge (but "real" formal charge of +0.003) to one that also has zero charge (but real charge of -0.002) and picks up a very significant free energy change from the associated artifact.

Big picture: We really, really want systems which are neutral for good performance and ought to do this well by default (by distributing charge a bit ourselves) and if users need a rope to hang themsleves with we can provide one but only with ample warnings.

mattwthompson commented 3 years ago

No blocking objection from me! My main question is if we'd want this to be on or off by default. I have no strong feeling on the trade-offs.

I agree that the <Electrostatics> tag is not the place for this, but on the basis that it is purely an implementation detail borne by the behavior of external libraries, not something that a scientist would use to design a force field.

I can't see any way to avoid implementing this in create_openmm_system in order to resolve this issue this issue in a timely basis. That doesn't prevent something like fix_charges from happening downstream, and it may be useful in some cases to allow it to happen later instead of or addition to inside of ElectrostaticsHandler. I had some scattered thoughts about which function it should live inside of - create_openmm_system or each handler?) but I'll leave that to the implementation.

As a half-measure, prior to biopolymer support, this implementation could set an upper limit on how much residual charge can be chucked onto an arbitrary atom, i.e. error out if it's trying to modify the first atom's charge by > 0.05. This would allow the 0.003 case to be rounded away but should safeguard against most invalid physics.

Overcharging one atom on a biopolymer sounds like a good "yeah, let's make sure we support this case" for designing the biopolymer infrastructure. Are there other similar cases to consider?

Picky: I would prefer something like round_charges= ... over custom_charge_rounding= ... since (correct me if I'm wrong) there is not a default per-molecule rounding that is applied if the argument is not specified.

The one downside I see here is that it would put some aspect of FF application outside of the force field (instead, the use of non-default rounding precisions would only be stored in the parameterization code).

I think that train has already left the station - charge_from_molecules and partial_bond_orders_from_molecules exist and are currently in conflict with this idea. I do think we should try as much as possible to keep the details in the force field where possible, but I don't think that's a good solution for this problem. Would the design here would have charge rounding off by default?

j-wags commented 3 years ago

Specifically, if we do NOT have integer charges, the consequences for free energy calculations are potentially dire

I hadn't thought about that -- Great point.

My main question is if we'd want this to be on or off by default

Given Mobley's answer above, I think the default should be "on".

I can't see any way to avoid implementing this in create_openmm_system

I'm not sure I'd like it in create_openmm_system, because then the logic couldn't discriminate between charge sources. I kinda think that a more natural place for the implementation would seem to be in either ToolkitAM1BCCHandler or ElectrostaticsHandler. Both of those will have access to the context for where the numbers are coming from. I'd approve a PR with it in either of those places.

Generally, I don't like the "put all the leftover charge on the first atom" scheme. I'd prefer to spread it out over all atoms at some high precision. That way, people don't hit the rounded-charge-accumulation problem if they go straight to OpenMM, and if folks export to another format, the exporting library can do its own rounding then.

I do think we should try as much as possible to keep the details in the force field where possible, but I don't think that's a good solution for this problem.

One other variable is that we can change the meaning of ToolkitAM1BCC in the SMIRNOFF spec to record the rounding method. We'll probably need to add some new fields to the tag to specify how we handle ELF settings anyway for that series of new features (eg #793). So we could add that on to the changes for a 0.4 ToolkitAM1BCC tag.

mattwthompson commented 3 years ago

That way, people don't hit the rounded-charge-accumulation problem if they go straight to OpenMM, and if folks export to another format, the exporting library can do its own rounding then.

Perfect - this sounds like the best way to keep the handlers close to ground truth while maintaining compatibility with engines where necessary.

j-wags commented 3 years ago

This was noted by a user again today. This should be prioritized for the next release.

openforcefield / openff-toolkit

Round partial charges to enforce correct integral charge on molecule #456