openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
305 stars 90 forks source link

Deduplicate redundant matches for vsites caused by symmetries in non-capturing atoms #1739

Closed SimonBoothroyd closed 8 months ago

SimonBoothroyd commented 10 months ago

Description

An incorrect topology idx is assigned to v-sites in certain cases.

Reproduction

import openff.interchange
import openff.toolkit
import openff.units

force_field = openff.toolkit.ForceField()
force_field.get_parameter_handler("Electrostatics")
force_field.get_parameter_handler("vdW")

charge_handler = force_field.get_parameter_handler("LibraryCharges")
charge_handler.add_parameter(
    {"smirks": "[*:1]", "charge1": 0.0 * openff.units.unit.e}
)

vsite_handler = force_field.get_parameter_handler("VirtualSites")
vsite_handler.add_parameter(
    parameter_kwargs={
        "smirks": "[H][#6:2]([H])=[#8:1]",
        "name": "EP",
        "type": "BondCharge",
        "distance": 7.0 * openff.units.unit.angstrom,
        "match": "all_permutations",
        "charge_increment1": 0.2 * openff.units.unit.elementary_charge,
        "charge_increment2": 0.1 * openff.units.unit.elementary_charge,
        "sigma": 1.0 * openff.units.unit.angstrom,
        "epsilon": 2.0 * openff.units.unit.kilocalorie_per_mole,
    }
)

molecule = openff.toolkit.Molecule.from_mapped_smiles("[H:3][C:2]([H:4])=[O:1]")

interchange = openff.interchange.Interchange.from_smirnoff(
    force_field, molecule.to_topology()
)
print(interchange.collections["VirtualSites"].virtual_site_key_topology_index_map)

Output

{VirtualSiteKey with atom indices None: 5}

where I'd expect instead

{VirtualSiteKey with atom indices None: 4}

If you swap the SMIRKS to be "[#6:2]=[#8:1]" then the correct output is printed.

Software versions

mattwthompson commented 10 months ago

I think the output should include both 4 and 5? IIUC the toolkit assigns two parameters here, so two particles should be created:

In [8]: len(force_field.label_molecules(molecule.to_topology())[0]["VirtualSites"][(0,)])
Out[8]: 2

swapping the SMIRKS as suggested changes it to be length 1

mattwthompson commented 9 months ago

@j-wags what's the correct behavior here?

SimonBoothroyd commented 9 months ago

Good question... I think you may be right but this is probably a case where the spec is under-defined?

I think when considering matches as 'different' i'd only class them so if their orientation or parent atoms were different, and ignore any other atoms in the SMIRKS. But I bet there's cases where this could also be undesirable...

j-wags commented 9 months ago

Yeah, my first-pass thought on this is that only one vsite should be getting created. Two are getting created because the match is symmetric about the Hs (if I make the smirks be "[#6:2]=[#8:1]" then only one vsite is made). The root cause is probably that naive SMARTS matching returns two instances of the same tuple due to the H symmetry, which we should be deduplicating. IIRC we should be deduplicating on tuples like (name, type, parent_atoms).

mattwthompson commented 9 months ago

Interchange is naive and trusting of what it gets from the toolkit, and I'd like to keep it that way if possible - I think changing this behavior would happen in the toolkit?

SimonBoothroyd commented 9 months ago

we should be deduplicating on tuples like (name, type, parent_atoms)

+1 (assuming parent_atoms are all tagged atoms)

j-wags commented 9 months ago

Thanks - I agree with moving this to the Toolkit!