lukasturcani / stk

A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.
https://stk.readthedocs.io/
MIT License
251 stars 47 forks source link

Avoid rdkit sanitization error #531

Closed TimoSommer closed 1 month ago

TimoSommer commented 1 month ago

Hey, first of all thanks for this amazing package!

I am using it for transition metal complexes, and I have noticed one re-occuring problem where rdkit throws an error because it cannot sanitize a molecule due to issues with the transition metal, which happens with rdkit all the time. For me, outcommenting this line has so far shown to work well, but it would be nice to implement this as option into the code, since otherwise I would have to monkeypatch stk all the time, which is dangerous if versions change.

The issue is in the function get_atom_ids() in src/stk/_internal/functional_group_factories/utilities.py. In this file, it would be great if the line rdkit.SanitizeMol(rdkit_mol) could optionally be skipped and this could be exposed in the class SmartsFunctionalGroupFactory() so that a user can turn this off if necessary.

I wonder, has this issue been seen before? I cannot believe that I am the first to discover this, since stk is used so much for metals.

lukasturcani commented 1 month ago

Hi there! thanks for using the package! You can try creating your own FunctionalGroupFactory that you use in your own code by making a subclass of stk.FunctionalGroupFactory. Just copy paste this in your code and there should be no need to monkeypatch stk. Lemme know if this works (I've not run it but am reasonably confident)!

fwiw, do you have a minimum reproducible example for this error? I think @andrewtarzia works a lot with transition metals has not run into the issue, could be useful for our test suite if nothing else.


import stk

class TimoFunctionalGroupFactory(stk.FunctionalGroupFactory):
    def __init__(self, smarts, bonders, deleters, placers=None):
        self._smarts = smarts
        self._bonders = bonders
        self._deleters = deleters
        self._placers = bonders if placers is None else placers

    def get_functional_groups(self, molecule):
        for atom_ids in _get_atom_ids(self._smarts, molecule):
            atoms = tuple(molecule.get_atoms(atom_ids))
            yield stk.GenericFunctionalGroup(
                atoms=atoms,
                bonders=tuple(atoms[i] for i in self._bonders),
                deleters=tuple(atoms[i] for i in self._deleters),
                placers=tuple(atoms[i] for i in self._placers),
            )

def _get_atom_ids(query, molecule):
    rdkit_mol = molecule.to_rdkit_mol()
    yield from rdkit_mol.GetSubstructMatches(
        query=rdkit.MolFromSmarts(query),
    )
lukasturcani commented 1 month ago

I'm close this as solved, lemme know if I should re-open.

TimoSommer commented 2 weeks ago

Sorry for the delay, I just came back to this. I have prepared a test .mol file of a complex in which this issue arises. The issue can be replicated with

import rdkit
import stk
testmol = stk.BuildingBlock.init_from_file('testmol.mol')
rdkit_mol = testmol.to_rdkit_mol()
rdkit.Chem.AllChem.SanitizeMol(rdkit_mol)

The complex fails because rdkit doesn't like Ga atoms with a valence of 4 and fails with an rdkit.Chem.rdchem.AtomValenceException. These kind of Valence Exceptions happen with rdkit all the time when dealing with metal atoms.

About the stk code, I do see the necessity of running an rdkit sanitization before the SMARTS query. Our SMARTS querys are quite trivial though from from what I know, so I might also try to refactor it in a way in which we don't need this anymore. Thank you!

andrewtarzia commented 2 weeks ago

Hey @TimoSommer , I hope the fix lukas mentioned works.

I will note that for handling metals in rdkit/stk, the trick (I use trick because the "chemistry" becomes questionable here) is to treat the metal-organic bonds as "dative" bonds (bond order 9 in stk). So when I change the Ga-included bonds from 1 atom1id atom2id to 9 atom1id atom2id in the mol file you sent, your example code ran fine. The trick is that when dative bonds are used (in the right direction), they do not add to the number of atoms in the valence checker.

See the example Metal-Organic Cage Construction in https://stk.readthedocs.io/en/stable/_autosummary/stk.cage.Cage.html

If you use a metal complex topology graph to build a molecule, stk handles this for free, but for cages and other topology graphs you must use the DativeReactionFactory.

CCEMGroupTCD commented 2 weeks ago

Oh great, that trick seems to work well! Thanks for your help!