openforcefield / openff-fragmenter

Fragment molecules for quantum mechanics torsion scans
https://fragmenter.readthedocs.io/en/latest/
MIT License
42 stars 13 forks source link

Provide a way to map from `WBOFragmenter` input molecule to parent molecule #129

Open j-wags opened 3 years ago

j-wags commented 3 years ago

Summary

Many users probably want to follow specific atoms/bonds through the fragmentation process. However, as a result of performing canonicalization of input molecules, there's no way to map from an atom in the input molecule of a Fragmenter job to an atom in the result's parent_molecule or any of the output fragments.

Reproducing example

from openff.toolkit.topology import Molecule
from openff.fragmenter.fragment import WBOFragmenter
mol1 = Molecule.from_smiles('ClCCCF')
frag_engine = WBOFragmenter()
result = frag_engine.fragment(mol1)

def draw_mol_and_label_atom_index(mol):
    rdmol = mol.to_rdkit()
    for atom in rdmol.GetAtoms():
        atom.SetAtomMapNum(atom.GetIdx())
    return rdmol

mol1 has the following atom map, and the drawing below shows the atom indices (NOT map indices)

print(mol1.properties)
draw_mol_and_label_atom_index(mol1)

{'atom_map': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11}}

image

The result's parent_molecule has the following atom map, and the drawing below shows the atom indices (NOT map indices)

result.parent_molecule.properties
draw_mol_and_label_atom_index(result.parent_molecule)

{'atom_map': {0: 1, 1: 3, 2: 5, 3: 4, 4: 2, 5: 8, 6: 9, 7: 10, 8: 11, 9: 6, 10: 7}}

image

We see that the Cl switched from being atom index 0 in the input, to atom index 4 in the parent. There's no direct way to map from 0 to 4 given the information in the result object.

Solutions

I think the root cause of this issue is that the OpenFF toolkit performs canonicalization, but it doesn't have a way to return the atom mapping that got applied. I'll open an issue on the Toolkit repo to have that mapping get returned, and once that's available we can provide it to the user here.

SimonBoothroyd commented 3 years ago

I think the root cause of this issue is that the OpenFF toolkit performs canonicalization, but it doesn't have a way to return the atom mapping that got applied. I'll open an issue on the Toolkit repo to have that mapping get returned, and once that's available we can provide it to the user here.

This was the main reason. And as a slight tangent, there is additional code scattered throughout the codebase that basically reapplies the atom map after things like conformer generation or WBO generation because the TK will discard this info.

Currently, as @jthorton mentioned #126, we recommend retrieving the parent molecule from the results object.

There's no direct way to map from 0 to 4 given the information in the result object.

That's not strictly true. @ldamore , If you really do want to re-map the old map indices to the new without waiting for a new TK release, just do an isomorphism check of the old with the new parent and set return atom maps = True and use that to port the indices over. However note that the indices in the fragment correspond to the map indices in the parent of the results object