rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License
2.53k stars 854 forks source link

Converting smiles to smarts using MolToSmarts flips stereocenters #1918

Open kovasap opened 6 years ago

kovasap commented 6 years ago

When i try to convert a molecular smiles string to a smarts string, some of the stereocenters in my molecule change. Example code:

import rdkit
from rdkit import Chem

s = '[NH2:1]-[C:2](=[O:3])-[c:4]1:[cH:5]:[cH:6]:[cH:7]:[n+:8](-[C@@H:10]2-[O:11]-[C@H:12](-[CH2:13]-[O:14]-[P:15](-[O:16])(=[O:17])-[O:18]-[P:19](-[O:20])(=[O:21])-[O:22]-[CH2:23]-[C@H:24]3-[O:25]-[C@@H:26](-[n:31]4:[cH:32]:[n:33]:[c:34]5:[c:35](-[NH2:36]):[n:37]:[cH:38]:[n:39]:[c:40]:4:5)-[C@H:27](-[OH:28])-[C@@H:29]-3-[OH:30])-[C@@H:41](-[OH:42])-[C@H:43]-2-[OH:44]):[cH:9]:1'
print()
print(rdkit.__version__)
print(s)
m = Chem.MolFromSmiles(s)
sma = Chem.MolToSmarts(m, isomericSmiles=True)
# makes easier comparison to smiles by eye only - aromatic atoms are ignored
print(sma.replace('#6', 'C').replace('#8', 'O').replace('#15', 'P').replace('#7', 'N'))
print(sma)

Produces output:

2018.03.2
[NH2:1]-[C:2](=[O:3])-[c:4]1:[cH:5]:[cH:6]:[cH:7]:[n+:8](-[C@@H:10]2-[O:11]-[C@H:12](-[CH2:13]-[O:14]-[P:15](-[O:16])(=[O:17])-[O:18]-[P:19](-[O:20])(=[O:21])-[O:22]-[CH2:23]-[C@H:24]3-[O:25]-[C@@H:26](-[n:31]4:[cH:32]:[n:33]:[c:34]5:[c:35](-[NH2:36]):[n:37]:[cH:38]:[n:39]:[c:40]:4:5)-[C@H:27](-[OH:28])-[C@@H:29]-3-[OH:30])-[C@@H:41](-[OH:42])-[C@H:43]-2-[OH:44]):[cH:9]:1
[NH2:1]-[C:2](=[O:3])-[C:4]1:[CH:5]:[CH:6]:[CH:7]:[N+:8](-[C@H:10]2-[O:11]-[C@H:12](-[CH2:13]-[O:14]-[P:15](-[O:16])(=[O:17])-[O:18]-[P:19](-[O:20])(=[O:21])-[O:22]-[CH2:23]-[C@@H:24]3-[O:25]-[C@@H:26](-[N:31]4:[CH:32]:[N:33]:[C:34]5:[C:35](-[NH2:36]):[N:37]:[CH:38]:[N:39]:[C:40]:4:5)-[C@H:27](-[OH:28])-[C@H:29]-3-[OH:30])-[C@@H:41](-[OH:42])-[C@@H:43]-2-[OH:44]):[CH:9]:1
[#7H2:1]-[#6:2](=[#8:3])-[#6:4]1:[#6H:5]:[#6H:6]:[#6H:7]:[#7+:8](-[#6@H:10]2-[#8:11]-[#6@H:12](-[#6H2:13]-[#8:14]-[#15:15](-[#8:16])(=[#8:17])-[#8:18]-[#15:19](-[#8:20])(=[#8:21])-[#8:22]-[#6H2:23]-[#6@@H:24]3-[#8:25]-[#6@@H:26](-[#7:31]4:[#6H:32]:[#7:33]:[#6:34]5:[#6:35](-[#7H2:36]):[#7:37]:[#6H:38]:[#7:39]:[#6:40]:4:5)-[#6@H:27](-[#8H:28])-[#6@H:29]-3-[#8H:30])-[#6@@H:41](-[#8H:42])-[#6@@H:43]-2-[#8H:44]):[#6H:9]:1

Unless I'm missing something, this means that atom 24, 29, 10, and 43 change their stereochemistry during the conversion. This looks like a bug to me, unless bond rotation/ring flipping can explain this?

greglandrum commented 6 years ago

Yeah, this looks like a bug in the way the SMARTS is generated. Thanks for reporting it!

greglandrum commented 6 years ago

Here's a more minimal example:

In [19]: s3 = '[NH3+:8](-[C@@H:10]2-[O:11]-[C@H:12](-[CH2:13]-[OH:14])-[C@@H:41](-[OH:42])-[C@H:43]-2-[OH:44])'

In [20]: m3 = Chem.MolFromSmiles(s3)

In [21]: print(Chem.MolToSmiles(m3))
[NH3+:8][C@@H:10]1[O:11][C@H:12]([CH2:13][OH:14])[C@@H:41]([OH:42])[C@H:43]1[OH:44]

In [22]: print(Chem.MolToSmarts(m3))
[#7H3+:8]-[#6@H:10]1-[#8:11]-[#6@H:12](-[#6H2:13]-[#8H:14])-[#6@@H:41](-[#8H:42])-[#6@@H:43]-1-[#8H:44]

and to confirm that it's not connected to the atom map info (which it shouldn't be):

In [31]: for at in m3.GetAtoms(): at.SetAtomMapNum(0)

In [32]: print(Chem.MolToSmiles(m3))
[NH3+][C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O

In [33]: print(Chem.MolToSmarts(m3))
[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]
zeromtmu commented 5 years ago

i face this too

bricehoff commented 5 years ago

There seems to be a related bug. with the example above: "[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]"

if we convert it multiple times with MolFromSmarts and MolToSmarts, the Smarts is modified:

print(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))

print(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))))

print(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))))))

we obtain:

"[#7&H3&+]-[#6@@&*&H1]1-[#8]-[#6@&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&H1](-[#8&H1])-[#6@&*&H1]-1-[#8&H1]"
"[#7&H3&+]-[#6@&*&*&H1]1-[#8]-[#6@&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&H1](-[#8&H1])-[#6@@&*&*&H1]-1-[#8&H1]"
"[#7&H3&+]-[#6@@&*&*&*&H1]1-[#8]-[#6@&*&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&*&H1](-[#8&H1])-[#6@&*&*&*&H1]-1-[#8&H1]"
ricrogz commented 4 years ago

I think this (the chiralities) gets fixed with #2570. The extra "&*" that get added are #2595 :)