rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License
2.61k stars 871 forks source link

Using SDF molecules without sanitization to match substructures give RuntimeError #1596

Open hjuinj opened 7 years ago

hjuinj commented 7 years ago

I am using rdkit 2017.03.3

When I am reading in molecules from sdf stream via SDMolSupplier without sanitization, and then trying to match the molecules to SMARTS pattern which includes the total connectivity atomic property requirement, I get RuntimeError regarding implicit hydrogens.

Here I use cyclohexane as an example:

from rdkit import Chem

# the sdf of cyclohexane is provided at the bottom
suppl = Chem.SDMolSupplier('./cyclohexane.sdf',   removeHs = False, sanitize = False)
rdkmol = suppl[0]
rdkmol.GetSubstructMatches(Chem.MolFromSmarts("[#6X4]-[#6X4]"))

I get:

RuntimeError: Pre-condition Violation
    getNumImplicitHs() called without preceding call to calcImplicitValence()
    Violation occurred on line 153 in file Code/GraphMol/Atom.cpp
    Failed Expression: d_implicitValence > -1

Am I using the SDMolSupplier correctly? If I am, it seems some information is lost when parsing the sdf molecule. It I set sanitize to True this issue disappears.

sdf of cyclohexane:

CT1001745819

 18 18  0  1  0               999 V2000
   -0.0187    1.5258    0.0104 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0021   -0.0041    0.0020 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4167    2.0553   -0.0004 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7182   -0.4975   -1.2568 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1536    0.0321   -1.2676 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1328    1.5619   -1.2592 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5459    1.8868   -0.8726 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5289    1.8773    0.9072 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.5293   -0.3651    0.8851 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0205   -0.3814    0.0098 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4019    3.1452    0.0056 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9439    1.6943    0.8826 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.7330   -1.5874   -1.2628 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.1910   -0.1364   -2.1398 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6808   -0.3290   -0.3846 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6638   -0.3194   -2.1644 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6057    1.9230   -2.1423 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.1554    1.9392   -1.2670 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  1  0  0
  1  3  1  0  1  0  0
  1  7  1  0  0  0  0
  1  8  1  0  0  0  0
  2  4  1  0  1  0  0
  2  9  1  0  0  0  0
  2 10  1  0  0  0  0
  3  6  1  0  1  0  0
  3 11  1  0  0  0  0
  3 12  1  0  0  0  0
  4  5  1  0  1  0  0
  4 13  1  0  0  0  0
  4 14  1  0  0  0  0
  5  6  1  0  1  0  0
  5 15  1  0  0  0  0
  5 16  1  0  0  0  0
  6 17  1  0  0  0  0
  6 18  1  0  0  0  0
M  END
$$$$
greglandrum commented 7 years ago

If you turn off sanitization (is there a reason you are doing this?) then things like valence are not calculated. If you need to read without sanitization, then be sure to call 'mol.UpdatePropertyCache()' before you do the substructure match

hjuinj commented 7 years ago

Hi Greg,

Thank you for your help. I am working on the open force field project that John showed at the UGM. The reason for no sanitization is because the philosophy that the input molecule should already be in the user-intended form to be simulated. Thus the molecule is left untouched before matching against SMIRKS patterns.

May I ask why when I read molecules in mol2 files I do not need to perform 'mol.UpdatePropertyCache()' ? Additionally UpdatePropertyCache() will not alter the molecular topology right?

greglandrum commented 7 years ago

It's safe to call mol.updatePropertyCache(); it just calculates valence states of the atoms. There are no topology changes.

hjuinj commented 7 years ago

Thanks again Greg :) Have you had the chance to take a look at the other issue I seem to encounter (#1590) regarding difference in interpretation between mol2 and sdf molecules?

hjuinj commented 6 years ago

Hi Greg, I am having problems using updatePropertyCache(), regardless of whether the molecule is read from sdf or mol2 or from smiles, I get AttributeError: 'Mol' object has no attribute 'updatePropertyCache'

e.g. : m1 = Chem.MolFromSmiles('Cc1ccccc1') print(m1) <rdkit.Chem.rdchem.Mol object at 0x7f3bb90e1c60> m1.updatePropertyCache()

AttributeError: 'Mol' object has no attribute 'updatePropertyCache'

greglandrum commented 6 years ago

You’ve got the capitalization wrong: The Python call is UpdatePropertyCache()