openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
71 stars 23 forks source link

Partial charge assignment using `charge_from_molecules` on virtual sites / charge increments can be unexpected. #1050

Closed IAlibay closed 2 weeks ago

IAlibay commented 2 months ago

Description

Related to #1048

When building a system with OPC water, if a user passed along a pre-charged water, they might end up with an unexpected set of partial charges in the resultant 4 point molecule due to charge increments.

The behaviour isn't incorrect according to the spec, but you then also have to expect users to a) know the charge assignment order in the spec, b) remember that there are charge increments affecting the molecule they passed charges for, c) do the charge increment math ahead of time to make sure they passed along the right charges.

I'm not 100% sure, but I think the ask would be something along the lines of:

  1. Some kind of tooling to inspect the partial charges you are likely to get after all the increments, etc.. are applied for each molecule (rather than having to manually inspect the OpenMM system to see what happened). Is there a better way to assign partial charges for virtual site containing molecules that I'm missing?
  2. Some kind of warning that further processing of the partial charges will happen.
  3. Documentation that a) yes charge_from_molecules will ignore library charges, but b) charge increments will be applied (this is not clear from the existing docs).

Reproduction

from openff.toolkit import Molecule, Topology, ForceField
from openff.interchange.components._packmol import solvate_topology
from openff.interchange import Interchange
from openmm import NonbondedForce

water = Molecule.from_smiles('O')
water.assign_partial_charges(partial_charge_method='gasteiger')
print(water.partial_charges)

ligand = Molecule.from_smiles('CCCCC')
ligand.generate_conformers()

off_top = Topology.from_molecules(ligand)
solvated_off_top = solvate_topology(topology=off_top)

ff = ForceField('openff-2.2.0.offxml', 'opc.offxml')
inter = Interchange.from_smirnoff(topology=solvated_off_top, force_field=ff, charge_from_molecules=[water])
omm_system = inter.to_openmm_system()

nonbond = [f for f in system.getForces() if isinstance(f, NonbondedForce)][0]

# Get the parameters for the first water oxygen
nonbond.getParticleParameters(17)

Output

partial charges

[-0.4115095219374226 0.2057547609687113 0.2057547609687113] elementary_charge

particle parameters for oxygen (in OPC this is a zero charged atom since all the charge is on the virtual site)

[Quantity(value=-0.4115095219374226, unit=elementary charge),
 Quantity(value=0.31506999999999996, unit=nanometer),
 Quantity(value=0.6363864000000001, unit=kilojoule/mole)]
mattwthompson commented 2 months ago

Pre-charged waters sounds like a strange use case; I don't have grounds to forbid it, but it'll really stress what's documented and easy to work with. Just documenting the quirks is a last resort, I'd like things to work more smoothly and in line with SMIRNOFF and user expectations if possible.

Before getting to that, though, is your reproduction missing a charge_from_molecules somewhere? The charges on the Molecule object shouldn't be processed by default. Without a ligand I'm getting

from openff.toolkit import Molecule, ForceField
from openmm import NonbondedForce

water = Molecule.from_smiles("O")
water.assign_partial_charges(partial_charge_method="gasteiger")
print(
    f"Partial charges on molecule: {[round(charge, 3) for charge in water.partial_charges]}"
)
# Partial charges on molecule: [<Quantity(-0.41, 'elementary_charge')>, <Quantity(0.205, 'elementary_charge')>, <Quantity(0.205, 'elementary_charge')>]

ff = ForceField("openff-2.2.0.offxml", "opc.offxml")
system = ff.create_interchange(water.to_topology()).to_openmm_system()

nonbond = [f for f in system.getForces() if isinstance(f, NonbondedForce)][0]

# Get the parameters for the first water oxygen
print(
    f"Partial charge in system: {round(nonbond.getParticleParameters(0)[0]._value, 3)}"
)
# Partial charge in system: 0.0
IAlibay commented 2 months ago

Before getting to that, though, is your reproduction missing a charge_from_molecules somewhere? The charges on the Molecule object shouldn't be processed by default. Without a ligand I'm getting

Ah yeah sorry, I was playing around with and without and copied the wrong thing 😅 - I've edited the originall comment!

IAlibay commented 2 months ago

Pre-charged waters sounds like a strange use case

The use case I have here is that I'm writing this thing where users can optionally define their solvent molecules.. and well in some cases the solvent can be water 😅 (see: https://github.com/IAlibay/pontibus/blob/develop/src/pontibus/utils/system_creation.py). You could say this is "intentionally" pushing things, because the intended users are.. well you folks.

A question - is this really much different than how ligands with vsites will behave? I.e. would you expect users to more easily know how the offsite charge is being incremented?

mattwthompson commented 2 months ago

Upon some reflection ... this is quite a mess and I'm not totally sure how to proceed. At best, Interchange handles this poorly. At worst, SMIRNOFF doesn't handle it well. I'm looking to get #1048 done before this, which may help!

Two things are referred to as "charge increments":

Looping back to the original case (preset water charges passed along with a 4-site water model), I'm not totally sure what should happen here. The current implementation sees a match in charge_from_molecules and short-circuits, I guess? Bits of the spec are

(preset charges)

In our reference implementation of SMIRNOFF in the OpenFF Toolkit, we also provide a method for specifying user-defined partial charges during system creation. This functionality is accessed by using the charge_from_molecules optional argument during system creation, such as in ForceField.create_openmm_system(topology, charge_from_molecules=molecule_list). When this optional keyword is provided, all matching molecules will have their charges set by the entries in molecule_list

(virtual sites)

Each virtual site receives charge which is transferred from the desired atoms specified in the SMIRKS pattern via a charge_increment# parameter, e.g., if charge_increment1=0.1*elementary_charge then the virtual site will receive a charge of -0.1 and the atom labeled 1 will have its charge adjusted upwards by 0.1. N may index any indexed atom. Additionally, each virtual site can bear Lennard-Jones parameters, specified by sigma and epsilon or rmin_half and epsilon. If unspecified these default to zero.

Snipping out just the relevant bits of opc.offxml:

    <LibraryCharges version="0.3">
        <LibraryCharge smirks="[#1]-[#8X2H2+0:1]-[#1]" charge1="0.0 * elementary_charge ** 1" id="q-opc-O"></LibraryCharge>
        <LibraryCharge smirks="[#1:1]-[#8X2H2+0]-[#1]" charge1="0.0 * elementary_charge ** 1" id="q-opc-H"></LibraryCharge>
    </LibraryCharges>
    <VirtualSites version="0.3" exclusion_policy="parents">
        <VirtualSite smirks="[#1:2]-[#8X2H2+0:1]-[#1:3]" epsilon="0.0 * kilocalorie_per_mole ** 1" type="DivalentLonePair" match="once" distance="-0.15939833 * angstrom ** 1" outOfPlaneAngle="0.0 * degree ** 1" inPlaneAngle="None" charge_increment1="0.0 * elementary_charge ** 1" charge_increment2="0.679142 * elementary_charge ** 1" charge_increment3="0.679142 * elementary_charge ** 1" rmin_half="1.0 * angstrom ** 1" name="EP"></VirtualSite>
    </VirtualSites>

I can think of only two possible expectations here:

  1. charge_from_molecules wins and stop looking. So the charges should be (O H H VS) with poor rounding -0.4, +0.2, +0.2, and 0.0.
  2. charge_from_molecules only specifies the atomic partial charges, and virtual site charge increments still apply. So the resulting charges would be (O H H VS) again with poor rounding -0.4, +0.9, +0.9, -1.3

(1) seems bad but is also confusing since the interaction between preset charges and virtual sites seems ambiguous/under-defined. I think you're expecting (2) here?

IAlibay commented 2 months ago

I think you're expecting (2) here?

From my likely poor understanding of the spec, yes. At least as a user I would expect the same behaviour you would expect with ligands.

As far as I'm aware this is the behaviour that you get in practice, it just isn't immediately clear as a user that this is happening (or even that you have virtual sites for that unique molecule).

mattwthompson commented 2 months ago

Unfortunately, I'm in a position where I think this is underspecified and I need updates to the SMIRNOFF spec to proceed: https://github.com/openforcefield/standards/issues/69

I originally hoped this was limited to your (IMO) esoteric case of passing water through to charge_from_molecules, but I now realize this applies the same to ligands' preset charges interacting with virtual sites. (It's just that, right now, you're probably only adding virtual sites to the water.)

cc: @lilyminium @j-wags

lilyminium commented 2 months ago

I commented on the SMIRNOFF spec with my opinion that I would also expect 2) in this scenario and that I think that's the best option.

Some kind of tooling to inspect the partial charges you are likely to get after all the increments, etc.. are applied for each molecule (rather than having to manually inspect the OpenMM system to see what happened).

Does the .charges property on the SMIRNOFFElectrostaticsCollection work for you? (possibly complicated by #1052 at the moment).

Something like:

>>> from openff.toolkit import ForceField, Molecule, Topology
>>> water = Molecule.from_smiles('O')
>>> water.assign_partial_charges(partial_charge_method='gasteiger')
>>>
>>> ligand = Molecule.from_smiles('CCCCC')
>>> ligand.generate_conformers()
>>>
>>> off_top = Topology.from_molecules(ligand, water)
>>> ff = ForceField('openff-2.2.0.offxml', 'opc.offxml')
>>> inter = ff.create_interchange(off_top, charge_from_molecules=[water])
>>> inter.collections["Electrostatics"].charges
IAlibay commented 1 month ago

Does the .charges property on the SMIRNOFFElectrostaticsCollection work for you? (possibly complicated by https://github.com/openforcefield/openff-interchange/issues/1052 at the moment).

Sure, it's sorta what I've been doing-ish, but from the docs it's really unclear that this is how you're meant to do things. I would suggest doing some kind of cookbook or userguide example maybe? (unless I'm being oblivious and missing one).

mattwthompson commented 1 month ago

Modifying charges after the fact is something that should be straightforward but probably only I am familiar with the sharp edges and we haven't really documented it well. I think it's a fair use case, but it's partially in tension with the design that a force field includes instructions on how to assign partial charges (and, by deduction, not whatever else is brought to the table).

Tracked so I don't forget: https://github.com/openforcefield/openff-interchange/issues/1071