rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

Different Bond Types between Similar Molecules #2808

Open vsomnath opened 4 years ago

vsomnath commented 4 years ago
**Configuration:**

Description:

I'm working on graph translation from fragment molecules to reactant molecules. The atom-mapped SMILES below come from the USPTO reaction dataset. In the code below, the bond between the atoms with atom map numbers 45 and 46 gets a different bond type in each molecule (DOUBLE in the fragment, AROMATIC in the reactant), even though I would expect it to be the same in both, whether aromatic or double. Images of the fragment and the reactant are attached below. The two molecules look the same except for the -OH group with atom map number 47.

Fragment image

Reactant image

from rdkit import Chem

frag_smi = '[CH3:34][O:35][C:36]([CH2:37][c:38]1[cH:39][n:40]([CH3:48])[c:41]2[cH:42][c+:43][cH:44][cH:45][c:46]12)=[O:49]'

reac_smi = '[CH3:34][O:35][C:36]([CH2:37][c:38]1[cH:39][n:40]([CH3:48])[c:41]2[cH:42][c:43]([OH:47])[cH:44][cH:45][c:46]12)=[O:49]'

frag_mol = Chem.MolFromSmiles(frag_smi)
reac_mol = Chem.MolFromSmiles(reac_smi)

print(frag_mol.GetAtomWithIdx(12).GetAtomMapNum(), frag_mol.GetAtomWithIdx(13).GetAtomMapNum())
print(frag_mol.GetBondBetweenAtoms(12, 13).GetBondType())
print()

print(reac_mol.GetAtomWithIdx(13).GetAtomMapNum(), reac_mol.GetAtomWithIdx(14).GetAtomMapNum())
print(reac_mol.GetBondBetweenAtoms(13, 14).GetBondType())

Output:

45 46
DOUBLE

45 46
AROMATIC

I'd like to understand why this is happening, and whether something on my end, such as the positive charge on the fragment, is causing it.

vsomnath commented 4 years ago

When I remove the positive charge, both bonds are marked as AROMATIC, so the charge seems to be the cause. However, the positive charge is important for my task, because I use atom.GetFormalCharge() as a feature in the graph model.
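For anyone reproducing this, here is a minimal diagnostic sketch that compares bond types by atom map number rather than by hard-coded atom index, since the extra -OH atom shifts the indices between the two molecules. The helper name `bond_types_by_map` is hypothetical and not part of RDKit; it only uses `GetBonds()`, `GetAtomMapNum()` and `GetBondType()`. Which pairs get printed may depend on the RDKit version in use.

```python
from rdkit import Chem


def bond_types_by_map(mol):
    """Hypothetical helper: map each sorted (map_a, map_b) pair to its bond type."""
    types = {}
    for bond in mol.GetBonds():
        a = bond.GetBeginAtom().GetAtomMapNum()
        b = bond.GetEndAtom().GetAtomMapNum()
        if a and b:  # a map number of 0 means the atom is unmapped
            types[tuple(sorted((a, b)))] = bond.GetBondType()
    return types


frag_smi = '[CH3:34][O:35][C:36]([CH2:37][c:38]1[cH:39][n:40]([CH3:48])[c:41]2[cH:42][c+:43][cH:44][cH:45][c:46]12)=[O:49]'
reac_smi = '[CH3:34][O:35][C:36]([CH2:37][c:38]1[cH:39][n:40]([CH3:48])[c:41]2[cH:42][c:43]([OH:47])[cH:44][cH:45][c:46]12)=[O:49]'

frag_types = bond_types_by_map(Chem.MolFromSmiles(frag_smi))
reac_types = bond_types_by_map(Chem.MolFromSmiles(reac_smi))

# Report every bond present in both molecules whose type differs.
for pair in sorted(frag_types.keys() & reac_types.keys()):
    if frag_types[pair] != reac_types[pair]:
        print(pair, frag_types[pair], reac_types[pair])
```

Keying on map numbers also makes it easy to see whether the mismatch is limited to the 45-46 bond or affects the whole six-membered ring.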

ptosco commented 4 years ago

It looks like setting the phenyl ring to be a carbocation removes the aromatic flag from the ring atoms and bonds, which does not seem right.

from rdkit import Chem

frag_noh = "COC(=O)Cc1cn(C)c2ccccc12"
frag_noh_mol = Chem.MolFromSmiles(frag_noh)
Chem.MolToSmiles(frag_noh_mol)
# returns 'COC(=O)Cc1cn(C)c2ccccc12'

# Turn ring atom 11 into a carbocation: no hydrogens, formal charge +1
frag_noh_mol.GetAtomWithIdx(11).SetNumExplicitHs(0)
frag_noh_mol.GetAtomWithIdx(11).SetNoImplicit(True)
frag_noh_mol.GetAtomWithIdx(11).SetFormalCharge(1)
frag_noh_mol.GetAtomWithIdx(11).UpdatePropertyCache()

Chem.SanitizeMol(frag_noh_mol)
# returns rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

# After sanitization the six-membered ring is written in Kekulé form
Chem.MolToSmiles(frag_noh_mol)
# returns 'COC(=O)Cc1cn(C)c2c1=CC=[C+]C=2'

greglandrum commented 4 years ago

I think there's a good argument that this is a bug. The [C+] is iso-electronic to a [B] and should be happily aromatic in this situation. Here's a simpler demonstration:

In [10]: m = Chem.MolFromSmiles('C1=CC=CC=[C+]1')                                                                                                                          

In [11]: m.Debug()                                                                                                                                                         
Atoms:
    0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
    1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
    2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
    3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
    4 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
    5 6 C chg: 1  deg: 2 exp: 3 imp: 0 hyb: 2 arom?: 0 chi: 0
Bonds:
    0 0->1 order: 2 conj?: 1 aromatic?: 0
    1 1->2 order: 1 conj?: 1 aromatic?: 0
    2 2->3 order: 2 conj?: 1 aromatic?: 0
    3 3->4 order: 1 conj?: 1 aromatic?: 0
    4 4->5 order: 2 conj?: 1 aromatic?: 0
    5 5->0 order: 1 conj?: 1 aromatic?: 0

In [12]: m = Chem.MolFromSmiles('C1=CC=CC=B1')                                                                                                                             

In [13]: m.Debug()                                                                                                                                                         
Atoms:
    0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
    1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
    2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
    3 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
    4 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 1 chi: 0
    5 5 B chg: 0  deg: 2 exp: 3 imp: 0 hyb: 2 arom?: 1 chi: 0
Bonds:
    0 0->1 order: 12 conj?: 1 aromatic?: 1
    1 1->2 order: 12 conj?: 1 aromatic?: 1
    2 2->3 order: 12 conj?: 1 aromatic?: 1
    3 3->4 order: 12 conj?: 1 aromatic?: 1
    4 4->5 order: 12 conj?: 1 aromatic?: 1
    5 5->0 order: 12 conj?: 1 aromatic?: 1
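
The isoelectronic argument can be checked with some quick arithmetic. The sketch below is a simplified Hückel-style count in plain Python, not RDKit's actual aromaticity model; the function names are hypothetical and the bond orders are the Kekulé orders from the [C+] Debug output above.

```python
def electron_count(atomic_num, formal_charge):
    """Total electrons on an atom: atomic number minus formal charge."""
    return atomic_num - formal_charge


def pi_electrons(ring_bond_orders):
    """Simplified count: each ring double bond contributes two pi electrons."""
    return 2 * sum(1 for order in ring_bond_orders if order == 2)


def satisfies_huckel(n_pi):
    """True if n_pi has the form 4n + 2 for some integer n >= 0."""
    return n_pi >= 2 and (n_pi - 2) % 4 == 0


# [C+] (atomic number 6, charge +1) and neutral B (atomic number 5)
# carry the same number of electrons.
print(electron_count(6, 1) == electron_count(5, 0))

# Kekulé bond orders around the ring in 'C1=CC=CC=[C+]1';
# 'C1=CC=CC=B1' is written with the same alternating pattern.
ring_orders = [2, 1, 2, 1, 2, 1]
n_pi = pi_electrons(ring_orders)
print(n_pi, satisfies_huckel(n_pi))
```

Under this simplified count both rings have the same pi-electron total, which is why one would expect them to be treated the same way.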
