openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
311 stars 90 forks source link

AM1BCC charges are sensitive to ordering? #983

Open lilyminium opened 3 years ago

lilyminium commented 3 years ago

Describe the bug

This is odd, hopefully it's not because I've misunderstood how the charge assignment works. I seem to be getting different charges from create_openmm_system depending on how the atoms in my molecule are ordered. The difference in the particle I check is about 0.04 (OpenEye) or 0.03 (AmberTools), which is nothing to sneeze at. This is using both the OpenEye and AmberTools toolkits.

To Reproduce Steps to reproduce the behavior. A minimal reproducing set of python commands is ideal.

If the problem involves a specific molecule or file, please upload that as well.

This may be reproducible with a smaller molecule, but I was looking at Ala5.

Part 1, checking my molecules are the same thing with different orders:

import re
from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField

smi1 = ('[H:50][C@:2]([C:3](=[O:4])[N:5]([H:46])[C@@:6]([H:45])'
        '([C:11](=[O:12])[N:13]([H:44])[C@@:14]([H:43])([C:19]'
        '(=[O:20])[N:21]([H:42])[C@@:22]([H:41])([C:27](=[O:28])'
        '[N:29]([H:40])[C@@:30]([H:39])([C:35](=[O:36])[O:37][H:38])'
        '[C:31]([H:32])([H:33])[H:34])[C:23]([H:24])([H:25])[H:26])'
        '[C:15]([H:16])([H:17])[H:18])[C:7]([H:8])([H:9])[H:10])'
        '([C:1]([H:51])([H:52])[H:53])[N:47]([H:48])[H:49]')

smi2 = ('[H:48][C@:44]([C:27](=[O:45])[N:25]([H:26])[C@@:23]([H:24])'
        '([C:21](=[O:22])[N:19]([H:20])[C@@:17]([H:18])([C:15]'
        '(=[O:16])[N:13]([H:14])[C@@:11]([H:12])([C:9](=[O:10])'
        '[N:7]([H:8])[C@@:5]([H:6])([C:3](=[O:4])[O:2][H:1])'
        '[C:40]([H:41])([H:42])[H:43])[C:36]([H:37])([H:38])[H:39])'
        '[C:32]([H:33])([H:34])[H:35])[C:28]([H:29])([H:30])[H:31])'
        '([C:46]([H:49])([H:50])[H:51])[N:47]([H:52])[H:53]')

assert re.sub(":[0-9]+", "", smi1) == re.sub(":[0-9]+", "", smi2)

mol1_order = Molecule.from_mapped_smiles(smi1)
mol1_unorder = Molecule.from_smiles(smi1)
mol2_order = Molecule.from_mapped_smiles(smi2)

iso, amap1 = Molecule.are_isomorphic(mol1_order, mol2_order, return_atom_map=True)
assert iso
assert amap1[0] != 0
iso, amap2 = Molecule.are_isomorphic(mol1_order, mol1_unorder, return_atom_map=True)
assert iso
assert amap2[0] != 0

Part 2, OpenEye charges:

def get_charge(offmol, particle_index):
    from simtk.openmm.openmm import NonbondedForce
    forcefield = ForceField('openff_unconstrained-1.3.0.offxml')
    sys = forcefield.create_openmm_system(offmol.to_topology())
    nb = [f for f in sys.getForces() if isinstance(f, NonbondedForce)][0]
    return nb.getParticleParameters(particle_index)[0]

get_charge(mol1_order, 0)

Quantity(value=-0.13401000201702118, unit=elementary charge)

get_charge(mol2_order, amap1[0])

Quantity(value=-0.09686999768018723, unit=elementary charge)

get_charge(mol1_unorder, amap2[0])

Quantity(value=-0.0931600034236908, unit=elementary charge)

Part 3, AmberTools charges:

get_charge(mol1_order, 0)

Quantity(value=-0.1241, unit=elementary charge)

get_charge(mol2_order, amap1[0])

Quantity(value=-0.0941, unit=elementary charge)

get_charge(mol1_unorder, amap2[0])

Quantity(value=-0.0951, unit=elementary charge)

Output The full error message (may be large, that's fine. Better to paste too much than too little.)

Computing environment (please complete the following information):

That's very long, I am using b2b76bb457c2a8235 of the toolkit and version 2020.2.2 of openeye-toolkits.

Additional context Add any other context about the problem here.

The fragmenter package canonically orders its atoms before generating conformers. Could that help?

lilyminium commented 3 years ago

This does not appear to be solely a result of different conformer generation. Instead, the order of the atoms even in the same conformer affects the charges, e.g. to 0.015 on this small molecule.

smiles_a = '[H:7][C:1]([H:8])([H:9])[C:6](=[O:3])[O:4][C:5]([H:10])([H:11])[N:2]([H:12])[H:13]'
smiles_b = '[H:7][C:2]([H:8])([H:9])[C:4](=[O:5])[O:6][C:3]([H:10])([H:11])[N:1]([H:12])[H:13]'

mol_a = Molecule.from_mapped_smiles(smiles_a)
mol_b = Molecule.from_mapped_smiles(smiles_b)

mol_a.generate_conformers(n_conformers=1)
mol_b.generate_conformers(n_conformers=1)

_, remapping = Molecule.are_isomorphic(mol_a, mol_b, return_atom_map=True)
mol_c = mol_a.remap(remapping)
# ====== positions are the same =======
assert (mol_c.conformers[0] == mol_b.conformers[0]).all()

mol_a.assign_partial_charges(partial_charge_method="am1bcc",
                             use_conformers=mol_a.conformers)
mol_b.assign_partial_charges(partial_charge_method="am1bcc",
                             use_conformers=mol_b.conformers)
mol_c = mol_a.remap(remapping)

diff = np.abs(mol_c.partial_charges - mol_b.partial_charges)
print(max(diff))

0.01530998945236206

Colour me confused.

davidlmobley commented 3 years ago

Perhaps worth checking if this is a behavior you get if you invoke OpenEye for charging directly; would isolate whether it's due to charging itself or due to something prior to charging, I think.

lilyminium commented 3 years ago

Thanks for the suggestion @davidlmobley. It does seem to occur from the OpenEye side. Not sure how best to contact them about it.

import numpy as np
from openeye import oechem, oequacpac, oeomega
from rdkit import Chem
from openff.toolkit.topology.molecule import Molecule, unit

smiles1 = '[H:7][C:1]([H:8])([H:9])[C:6](=[O:3])[O:4][C:5]([H:10])([H:11])[N:2]([H:12])[H:13]'
smiles2 = '[H:7][C:2]([H:8])([H:9])[C:4](=[O:5])[O:6][C:3]([H:10])([H:11])[N:1]([H:12])[H:13]'

mol1 = Molecule.from_mapped_smiles(smiles1)
mol2 = Molecule.from_mapped_smiles(smiles2)
_, remapping = Molecule.are_isomorphic(mol1, mol2, return_atom_map=True)
_, remap_list = zip(*sorted(remapping.items()))

# generate 1 conformer
mol1.generate_conformers(n_conformers=1)
# use RDKit to save conformer in different orders in PDB
rdmol1 = Chem.RWMol(mol1.to_rdkit())
rdmol2 = Chem.RenumberAtoms(rdmol1, remap_list)
Chem.MolToMolFile(rdmol1, "mol1.mol")
Chem.MolToMolFile(rdmol2, "mol2.mol")

# === OpenEye only from hereon ===
# read mol from file
oemol1 = oechem.OEMol()
ifs = oechem.oemolistream("mol1.mol")
oechem.OEReadMDLFile(ifs, oemol1)

oemol2 = oechem.OEMol()
ifs = oechem.oemolistream("mol2.mol")
oechem.OEReadMDLFile(ifs, oemol2)

# assign partial charges
oe_charge_method = oequacpac.OEAM1BCCCharges()
oequacpac.OEAssignCharges(oemol1, oe_charge_method)
oe_charge_method = oequacpac.OEAM1BCCCharges()
oequacpac.OEAssignCharges(oemol2, oe_charge_method)

charges1 = np.empty(oemol1.NumAtoms())
charges2 = np.empty(oemol2.NumAtoms())

for i, atom in enumerate(oemol1.GetAtoms()):
    charges1[remapping[i]] = atom.GetPartialCharge()
for i, atom in enumerate(oemol2.GetAtoms()):
    charges2[i] = atom.GetPartialCharge()

print(np.abs(charges1 - charges2).max())

0.015299975872039795

lilyminium commented 3 years ago

Update: and as implied, not an issue with AmberTools (for this molecule at least). The charge difference I found when I opened the issue might be from conformer generation then.

smiles1 = '[H:7][C:1]([H:8])([H:9])[C:6](=[O:3])[O:4][C:5]([H:10])([H:11])[N:2]([H:12])[H:13]'
smiles2 = '[H:7][C:2]([H:8])([H:9])[C:4](=[O:5])[O:6][C:3]([H:10])([H:11])[N:1]([H:12])[H:13]'

mol1 = Molecule.from_mapped_smiles(smiles1)
mol2 = Molecule.from_mapped_smiles(smiles2)
_, remapping = Molecule.are_isomorphic(mol1, mol2, return_atom_map=True)
_, remap_list = zip(*sorted(remapping.items()))

# generate 1 conformer
mol1.generate_conformers(n_conformers=1)
mol2 = mol1.remap(remapping)

mol1.assign_partial_charges("am1bcc", toolkit_registry=AmberToolsToolkitWrapper())
mol2.assign_partial_charges("am1bcc", toolkit_registry=AmberToolsToolkitWrapper())

charges1 = np.empty(mol1.n_atoms)
charges2 = np.empty(mol2.n_atoms)

for i in range(mol1.n_atoms):
    charges1[remapping[i]] = mol1.atoms[i].partial_charge._value
    charges2[i] = mol2.atoms[i].partial_charge._value

print(np.abs(charges1 - charges2).max())

0.0

davidlmobley commented 3 years ago

Send an example to reproduce to @., copying Christopher Bayly and me. :) On Jun 18, 2021, 5:57 PM -0700, Lily Wang @.>, wrote:

Thanks for the suggestion @davidlmobley. It does seem to occur from the OpenEye side. Not sure how to contact them about it. import numpy as np from openeye import oechem, oequacpac, oeomega from rdkit import Chem from openff.toolkit.topology.molecule import Molecule, unit

smiles1 = '[H:7]C:1([H:9])C:6[O:4]C:5([H:11])N:2[H:13]' smiles2 = '[H:7]C:2([H:9])C:4[O:6]C:3([H:11])N:1[H:13]'

mol1 = Molecule.from_mapped_smiles(smiles1) mol2 = Molecule.from_mappedsmiles(smiles2) , remapping = Molecule.are_isomorphic(mol1, mol2, return_atommap=True) , remap_list = zip(*sorted(remapping.items()))

generate 1 conformer

mol1.generate_conformers(n_conformers=1)

use RDKit to save conformer in different orders in PDB

rdmol1 = Chem.RWMol(mol1.to_rdkit()) rdmol2 = Chem.RenumberAtoms(rdmol1, remap_list) Chem.MolToMolFile(rdmol1, "mol1.mol") Chem.MolToMolFile(rdmol2, "mol2.mol")

=== OpenEye only from hereon ===

read mol from file

oemol1 = oechem.OEMol() ifs = oechem.oemolistream("mol1.mol") oechem.OEReadMDLFile(ifs, oemol1)

oemol2 = oechem.OEMol() ifs = oechem.oemolistream("mol2.mol") oechem.OEReadMDLFile(ifs, oemol2)

read Molecule from file again for precision errors

mol1 = Molecule.from_openeye(oemol1) mol2 = Molecule.from_openeye(oemol2) mol1.assign_partial_charges(partial_charge_method="am1bcc", use_conformers=mol1.conformers)

assign partial charges

oe_charge_method = oequacpac.OEAM1BCCCharges() oequacpac.OEAssignCharges(oemol1, oe_charge_method) oe_charge_method = oequacpac.OEAM1BCCCharges() oequacpac.OEAssignCharges(oemol2, oe_charge_method)

charges1 = np.empty(oemol1.NumAtoms()) charges2 = np.empty(oemol2.NumAtoms())

for i, atom in enumerate(oemol1.GetAtoms()): charges1[remapping[i]] = atom.GetPartialCharge() for i, atom in enumerate(oemol2.GetAtoms()): charges2[i] = atom.GetPartialCharge()

print(np.abs(charges1 - charges2).max()) 0.015299975872039795 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

mattwthompson commented 3 years ago

@lilyminium was there an update on this, or do we still need to handle atom ordering in conformer generation? (I'm assuming that atom ordering in conformer generation would be enough to make oequacpac reproducible)

lilyminium commented 3 years ago

@mattwthompson Atom ordering still affects, separately, both conformer generation and AM1 optimization. Below is OpenEye's recommendation to minimize effects on AM1 optimization, but I think that conformer generation is an independent issue (#926).

I wanted to give you an update on this ticket. After some digging by our developers, it appears as if we have found an atom ordering-dependence in the AM1 optimizer. That is, the ordering of the atoms dictates the ordering of the coordinates that need to be optimized and this is likely causing a slightly different output structure and hence, slightly different partial charges. At the moment, the developer has identified the problematic part of the code and is working to fix it for the next release. However, in the meantime, we can make a recommendation to try and work around the problem. The recommendation is to perform a minimization of the input structure prior to calculating partial charges such that by putting it in a better, lower energy state, the AM1 optimizer doesn't need to alter the structure as much. In some small scale testing, it appears as if this does help reduce the frequency of partial charge mismatches above a certain threshold.

mattwthompson commented 3 years ago

Ah, I forgot we had another issue open for that, thanks!

j-wags commented 2 years ago

cc #924

(I'm just putting together a spec for a next-gen AM1 interface/behavior set so I want to ensure all of the underlying issues are linked)